Speaker 2: Limin Wang (王利民), ETH Zurich
Time: Wednesday, March 29, 2017, 21:00
Title: Towards efficient end-to-end architectures for action recognition and detection in videos
Host: Meng Yang (杨猛), Shenzhen University

Abstract:
Action understanding is becoming an increasingly important topic in video analysis, and much progress has been made in recent years with deep learning methods. In this talk, I will present our three approaches to temporal modeling with deep networks for action recognition and detection in videos. First, I will talk about a conceptually simple, general, and flexible framework for action recognition, known as the temporal segment network (TSN). TSN is an efficient video-level architecture based on a novel segment sampling and aggregating strategy for long-range temporal modeling. It helped us achieve state-of-the-art performance on standard action recognition benchmarks (e.g., HMDB51, UCF101) and win the ActivityNet challenge 2016. Then, I will describe our latest temporal action detection framework, the structured segment network (SSN). SSN introduces structural and context modeling into the TSN framework and obtains the best performance on action detection benchmarks such as THUMOS14 and ActivityNet. Finally, I will present UntrimmedNet, which learns directly from untrimmed videos for weakly supervised action recognition and detection. UntrimmedNet extends the TSN framework to a weakly supervised setting by leveraging the attention mechanism. All three approaches share the merits of temporal modeling and efficient processing brought by TSN, which prove crucial for video analysis in real-world scenarios.

References:
[1] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, in ECCV, 2016.
[2] L. Wang, Y. Xiong, D. Lin, and L. Van Gool, UntrimmedNets for Weakly Supervised Action Recognition and Detection, in CVPR, 2017.
[3] Y. Xiong, Y. Zhao, L. Wang, D. Lin, and X. Tang, A Pursuit of Temporal Accuracy in General Activity Detection, arXiv:1703.02716.

Speaker bio:
Limin Wang received his bachelor's degree from Nanjing University in 2011 and his Ph.D. from The Chinese University of Hong Kong in 2015. He is currently a postdoctoral researcher at ETH Zurich. His research interests include computer vision and deep learning, with a focus on human action understanding and scene recognition. He and his collaborators were the runner-up in scene recognition at the 2015 ImageNet challenge and the winner in video classification at the 2016 ActivityNet challenge.

Special thanks to the main organizers of this webinar:
VOOC responsible committee member: Pengfei Zhu (朱鹏飞), Tianjin University
VODB coordinating council members: Wangmeng Zuo (左旺孟), Harbin Institute of Technology; Cheng Deng (邓成), Xidian University