VALSE Webinar 18-19: Post-Event Summary

Cheng Yi (Institute of Computing Technology) · Posted on 2018-7-13 18:30:13

The VALSE Webinar by Dr. Siyu Tang of the Max Planck Institute, Germany, was successfully held on July 4, 2018.

Dr. Siyu Tang is a research group leader in the Department of Perceiving Systems at the Max Planck Institute for Intelligent Systems, Germany.

She was a postdoctoral researcher at the Max Planck Institute for Intelligent Systems, advised by Michael Black. She completed her PhD (summa cum laude) at the Max Planck Institute for Informatics under the supervision of Prof. Bernt Schiele. Before that, she received her Master's degree in Computer Science from RWTH Aachen University, advised by Prof. Bastian Leibe, and her Bachelor's degree from the Department of Computer Science and Technology at Zhejiang University, China. She was also a research intern at the National Institute of Informatics under the supervision of Prof. Helmut Prendinger.

Her research lies at the intersection of computer vision and machine learning, with a focus on holistic visual scene understanding. In particular, she is interested in analyzing and modeling people in complex visual scenes.

The title of her talk is: Graph Decomposition for People Tracking and Pose Estimation.

Understanding people in images and videos is an intensively studied problem in computer vision. While continuous progress has been made, occlusions, cluttered backgrounds, complex poses and large variations in appearance remain challenging, especially in crowded scenes. In this talk, I will explore algorithms and tools that enable computers to interpret people's positions, motions and articulated poses in complex visual scenes. More specifically, I will discuss an optimization problem whose feasible solutions relate one-to-one to the decompositions of a graph, and I will highlight its applications in computer vision, which range from multi-person tracking to motion segmentation. I will also cover an extended optimization problem whose feasible solutions define both a decomposition of a given graph and a labeling of its nodes, with an application to multi-person pose estimation.
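
The one-to-one correspondence between feasible solutions and graph decompositions can be made concrete with a small sketch. It assumes the multicut encoding used in the referenced lifted-multicut work: each edge gets a 0/1 label, with 1 meaning "cut". Such a labeling is feasible exactly when no edge is cut while its endpoints are still connected through uncut edges. The function names are hypothetical and for illustration only. This sketch is not the speaker's code.

```python
def uncut_components(nodes, edges, cut):
    """Label each node with a component id, joining only the endpoints of
    uncut edges (a small union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (u, v) in edges:
        if not cut[(u, v)]:
            parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def is_valid_multicut(nodes, edges, cut):
    """True iff the 0/1 edge labeling encodes a decomposition: an edge is cut
    exactly when its endpoints lie in different components (equivalently, no
    cycle contains exactly one cut edge)."""
    comp = uncut_components(nodes, edges, cut)
    return all(cut[(u, v)] == (comp[u] != comp[v]) for (u, v) in edges)


def labeling_from_partition(edges, cluster):
    """Map a decomposition (node -> cluster id) to its unique edge labeling."""
    return {(u, v): cluster[u] != cluster[v] for (u, v) in edges}
```

Consider a triangle on nodes 0, 1 and 2. Cutting only edge (0, 2) is infeasible, because 0 and 2 remain connected through node 1. Cutting both (0, 2) and (1, 2) is feasible and encodes the decomposition {0, 1}, {2}.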

The references for the talk are:
End-to-end Learning for Graph Decomposition.
Multi-person Tracking with Lifted Multicut and Person Re-ID. CVPR 2017.
Articulated Multi-person Tracking in the Wild. CVPR 2017.

Q&A:

Question 1: You just mentioned using MOT to improve re-ID features. How exactly is this done?

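Answer: In fact, we use strong person re-identification methods to improve multi-target tracking performance. Specifically, we convert the re-ID results into edge weights in the tracking graph. This lets us link the detections of a person who has been occluded for a long time, which improves tracking performance.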
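
Here is a minimal sketch of turning re-ID results into tracking-graph edge weights. It assumes that the re-ID model outputs a probability that two detections show the same person, and that this probability is mapped to a signed multicut edge cost by the logit log(p / (1 - p)). Under that convention, positive costs favor joining two detections and negative costs favor separating them. The functions, the `max_gap` parameter and the detection format are all hypothetical choices made for illustration.

```python
import math


def reid_edge_cost(p_same, eps=1e-6):
    """Map a re-ID probability that two detections show the same person to a
    signed edge cost: > 0 favors joining, < 0 favors cutting (assumed logit)."""
    p = min(max(p_same, eps), 1.0 - eps)  # clamp to avoid log(0)
    return math.log(p / (1.0 - p))


def build_tracking_edges(detections, reid_prob, max_gap):
    """Connect every pair of detections in different frames at most max_gap
    frames apart. detections: list of (det_id, frame); reid_prob(a, b) -> [0, 1].
    Long-range edges (large gaps) are what allow re-linking a person after a
    long occlusion, since no short-range chain of detections exists then."""
    edges = {}
    for i, (a, fa) in enumerate(detections):
        for b, fb in detections[i + 1:]:
            if 0 < abs(fb - fa) <= max_gap:
                edges[(a, b)] = reid_edge_cost(reid_prob(a, b))
    return edges
```

In this sketch, the choice of `max_gap` decides how long an occlusion can be bridged. With a small gap, only neighboring frames are linked. With a large gap, two detections 30 frames apart can still be joined if the re-ID model finds them similar.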

Question 2: Can a graph-cut based method guarantee real-time performance?

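Answer: Our method is not real-time. It performs global optimization and also works online. If ILP is used for the optimization, it is very slow on complex videos. We use a heuristic solver instead, which finds solutions with fairly good tracking results in a short time.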
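
To illustrate what a fast heuristic for this kind of problem can look like, here is a sketch of greedy additive edge contraction (GAEC), a well-known multicut heuristic (Keuper et al., 2015). The answer does not say which heuristic was used, so this is a swapped-in example, not the speaker's solver. It assumes signed edge costs in which positive values favor joining, as in a logit mapping of re-ID scores. The function name and data layout are hypothetical.

```python
def greedy_additive_edge_contraction(nodes, costs):
    """costs: dict {(u, v): c}, c > 0 attractive, c < 0 repulsive.
    Repeatedly contracts the most attractive remaining edge and sums the costs
    of parallel edges; stops when no edge has positive cost.
    Returns a dict node -> cluster representative."""
    # Cluster graph: representative -> {neighbor representative: summed cost}
    adj = {v: {} for v in nodes}
    for (u, v), c in costs.items():
        if u == v:
            continue
        adj[u][v] = adj[u].get(v, 0.0) + c
        adj[v][u] = adj[v].get(u, 0.0) + c
    members = {v: {v} for v in nodes}

    while True:
        best = None
        for u in adj:
            for v, c in adj[u].items():
                if best is None or c > best[2]:
                    best = (u, v, c)
        if best is None or best[2] <= 0:
            break  # every remaining edge prefers to stay cut
        u, v, _ = best
        # Contract v into u: re-attach v's neighbors to u, summing parallel costs.
        for w, c in adj.pop(v).items():
            del adj[w][v]
            if w != u:
                adj[u][w] = adj[u].get(w, 0.0) + c
                adj[w][u] = adj[u][w]
        members[u] |= members.pop(v)

    return {x: rep for rep, ms in members.items() for x in ms}
```

This naive version rescans all edges at every step, which is enough for a sketch. Practical implementations keep the edges in a priority queue instead. Either way, the greedy result is only an approximation of the optimal ILP solution, which is the trade-off the answer describes.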

Question 3: Have you considered using second-order graph matching to improve the accuracy of temporal association?

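Answer: Yes, we have considered such models, and we have used higher-order terms to model temporal association. On toy examples, the improvement in association was fairly large. However, since there was no fast heuristic solver at the time, we could not model such relations in real tasks.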
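
The following toy sketch shows what a second-order association term adds and why solving it exactly is costly. An assignment between objects in two frames is scored by per-match (unary) affinities plus pairwise affinities that reward pairs of matches for preserving their relative displacement. The brute-force search over all n! assignments is only for illustration. The function names, the affinity interface and the geometric pairwise term are assumptions, not the model from the talk.

```python
import math
from itertools import permutations


def second_order_match(n, unary, pairwise):
    """Maximize sum_i unary(i, a[i]) + sum_{i<j} pairwise(i, j, a[i], a[j])
    over all one-to-one assignments a (frame t -> frame t+1) by exhaustive
    search. The n! runtime is fine for toy examples, hopeless for real videos."""
    best, best_score = None, -math.inf
    for a in permutations(range(n)):
        s = sum(unary(i, a[i]) for i in range(n))
        s += sum(pairwise(i, j, a[i], a[j])
                 for i in range(n) for j in range(i + 1, n))
        if s > best_score:
            best, best_score = a, s
    return best, best_score


# Toy example: three people move right by 0.5, and frame t+1 lists them in a
# different order. Appearance is uninformative (unary = 0), so only the
# pairwise geometric consistency can resolve the association.
P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # positions in frame t
Q = [(2.5, 0.0), (0.5, 0.0), (1.5, 0.0)]  # positions in frame t+1


def geometric_consistency(i, j, a, b):
    """Penalize matches whose relative displacement changes between frames."""
    dx = (P[j][0] - P[i][0]) - (Q[b][0] - Q[a][0])
    dy = (P[j][1] - P[i][1]) - (Q[b][1] - Q[a][1])
    return -math.hypot(dx, dy)


match, score = second_order_match(3, lambda i, a: 0.0, geometric_consistency)
```

Here `match` is `(1, 2, 0)`: person 0 in frame t is matched to entry 1 in frame t+1, and so on. This is the only assignment that keeps all pairwise displacements unchanged.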

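Question 4: Your pose estimation and pose tracking methods are both built within the CRF-RNN framework. How exactly are they trained? Do you use a multi-stage approach? Also, recent work on semantic segmentation shows that models without a CRF can even outperform those with one. Have you found anything similar in your research?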

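Answer: We use multi-stage training. We first train the front-end CNN, then add the CRF and train the whole model jointly. In our pose estimation experiments, adding the CRF did improve the final results.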

Watch the recorded video online:

http://www.iqiyi.com/u/2289191062

Download the slides:

http://vision.ouc.edu.cn/valse/slides/20180704/Webinar20180704.pdf

Special thanks to the main organizers of this Webinar:

VOOC committee member in charge: Ruyi Feng (China University of Geosciences)

VODB coordinating director: Wangmeng Zuo (Harbin Institute of Technology)

How to participate:

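1. VALSE Webinars are held on an online live-streaming platform. During the event, the speaker uploads slides or shares the screen. The audience can see the slides, hear the speaker, and interact with the speaker through the chat function;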

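2. To participate, follow the VALSE WeChat official account valse_wechat or join a VALSE QQ group. Groups A, B, C, D, E, F and G are currently full, so apart from speakers and other guests, you can only apply to join VALSE group H (group number: 701662399);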

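*Note: When applying to join a VALSE QQ group, you must provide your name, affiliation and status; all three are required. After joining, please use your real name in the format name-status-affiliation. Status codes: university and research institute staff, T; industry R&D, I; PhD student, D; Master's student, M.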

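3. About 5 minutes before the event starts, the speaker opens the live stream, and the audience can join by clicking the live-stream link. Supported devices include Windows PCs, Macs and mobile phones;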

4. During the event, please refrain from off-topic remarks so as not to disrupt the event;

5. If you cannot hear the audio or see the video during the event, try leaving and rejoining. This usually solves the problem;

6. Please make sure to join over a fast network connection, preferably a wired one;

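7. Every Monday, the VALSE WeChat official account posts the summary and video of the previous week's Webinar (with the speaker's permission). Every Thursday, it publishes the announcement and live-stream link for the next week's Webinar.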