
20170329-05 Limin Wang (王利民): Towards efficient end-to-end architectures for...

2017-3-27 13:53 | Posted by: 程一-计算所

Summary: Speaker 2: Limin Wang (ETH Zurich). Time: Wednesday, March 29, 2017, 21:00. Title: Towards efficient end-to-end architectures for action recognition and detection in videos. Host: Yang Meng (深圳大 ...



Talk title: Towards efficient end-to-end architectures for action recognition and detection in videos



Action understanding is becoming an increasingly important topic in video analysis, and much progress has been made in recent years with deep learning methods. In this talk, I will present our three approaches to temporal modeling with deep networks for action recognition and detection in videos. First, I will talk about a conceptually simple, general, and flexible framework for action recognition, known as the temporal segment network (TSN). TSN is an efficient video-level architecture based on a novel segment sampling and aggregation strategy for long-range temporal modeling. It helped us achieve state-of-the-art performance on standard action recognition benchmarks (e.g., HMDB51 and UCF101) and win the ActivityNet Challenge 2016. Then, I will describe our latest temporal action detection framework, the structured segment network (SSN). SSN introduces structural and context modeling into the TSN framework and obtains the best performance on action detection benchmarks such as THUMOS14 and ActivityNet. Finally, I will present UntrimmedNet, which learns directly from untrimmed videos for weakly supervised action recognition and detection. UntrimmedNet extends the TSN framework to a weakly supervised setting by leveraging an attention mechanism. All three approaches share the merits of temporal modeling and efficient processing brought by TSN, which prove crucial for video analysis in real-world scenarios.
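As a rough illustration of the segment sampling and aggregation idea mentioned in the abstract, here is a minimal Python sketch. It is not the authors' implementation. The function names (`sample_segment_indices`, `average_consensus`, `attention_consensus`), the choice of one random frame per equal-length segment, plain averaging as the aggregation function, and softmax attention weights for the weakly supervised variant are all assumptions made for illustration; the per-snippet class scores would come from a network that is not shown here.

```python
import math
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Split [0, num_frames) into equal-length segments and pick one
    random frame index from each (assumed sparse sampling scheme)."""
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need 0 < num_segments <= num_frames")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments        # inclusive
        end = (k + 1) * num_frames // num_segments    # exclusive
        indices.append(rng.randrange(start, end))
    return indices


def average_consensus(snippet_scores):
    """Average per-snippet class scores into one video-level score
    vector (averaging is an assumed aggregation choice)."""
    num_snippets = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [
        sum(scores[c] for scores in snippet_scores) / num_snippets
        for c in range(num_classes)
    ]


def attention_consensus(snippet_scores, attention_logits):
    """Weight per-snippet scores by softmax attention before summing
    (a hypothetical stand-in for attention-based aggregation)."""
    peak = max(attention_logits)
    exps = [math.exp(a - peak) for a in attention_logits]  # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    num_classes = len(snippet_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, snippet_scores))
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    frame_ids = sample_segment_indices(num_frames=300, num_segments=3)
    print("sampled frames:", frame_ids)
    # Made-up scores for 3 snippets and 2 classes, for demonstration only.
    scores = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
    print("average consensus:", average_consensus(scores))
    print("attention consensus:", attention_consensus(scores, [2.0, 0.5, -1.0]))
```

Because only a fixed number of snippets is processed per video regardless of its length, the cost of a forward pass in this sketch does not grow with video duration, which is one way to read the abstract's claim of efficient long-range modeling.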


[1] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, in ECCV, 2016.

[2] L. Wang, Y. Xiong, D. Lin, and L. Van Gool, UntrimmedNets for Weakly Supervised Action Recognition and Detection, in CVPR, 2017.

[3] Y. Xiong, Y. Zhao, L. Wang, D. Lin, and X. Tang, A Pursuit of Temporal Accuracy in General Activity Detection, arXiv:1703.02716, 2017.







