
VALSE Webinar 2023-08-30, Issue 22 (No. 322 overall): Research Advances in Video Segmentation

2023-08-25 10:20 | Published by: Cheng Yi (Institute of Computing Technology, CAS)


Speaker: Zongxin Yang (Zhejiang University)

Talk title: Segment and Track Anything


Speaker: Xiangtai Li (Nanyang Technological University)

Talk title: Towards Unified and Efficient Pixel-wised Video Perception


Speaker: Zongxin Yang (Zhejiang University)

Talk time: August 30, 2023 (Wednesday), 20:00 (Beijing time)

Talk title: Segment and Track Anything


Speaker bio:

Zongxin Yang is a postdoctoral researcher in the College of Computer Science and Technology at Zhejiang University. He received his Ph.D. from the University of Technology Sydney in 2021. His research focuses on computer vision, including video understanding, 3D vision, and visual content generation. He has published over 20 papers in top international computer vision conferences (NeurIPS, CVPR, ICCV, ECCV, ICLR, etc.) and journals (TPAMI, TIP, etc.). He has won more than ten awards in competitions at top international conferences on visual segmentation and tracking, including seven world championships, among them first place in both video tracking and video segmentation at the EPIC-Kitchens 2023 Challenge (CVPR 2023), and first place in two tracks of the VOT 2022 visual object tracking challenge (ECCV 2022).


Homepage:

https://z-x-yang.github.io


Abstract:

Video segmentation is a fundamental task in computer vision; its goal is fine-grained, pixel-level understanding of the spatio-temporal information in videos. This talk will present a series of our works on video segmentation, tracking, and cross-modal perception. It will conclude with Segment and Track Anything, an open-source project that combines the large SAM image segmentation model with a small video object segmentation model, and will showcase its prospective applications.
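
To make the task concrete, here is a minimal toy sketch of the frame-to-frame mask association that underlies this kind of tracking pipeline. This is an illustrative simplification in NumPy, not the actual Segment and Track Anything code (which combines SAM with a learned video object segmentation model); the `mask_iou` and `associate` helpers are invented for illustration.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def associate(prev_masks: dict, curr_masks: list, thresh: float = 0.5) -> dict:
    """Greedily match each current-frame mask to the previous frame's
    object with the highest IoU above `thresh`; unmatched masks get
    fresh IDs (objects newly entering the video)."""
    next_id = max(prev_masks, default=0) + 1
    out, used = {}, set()
    for m in curr_masks:
        best_id, best_iou = None, thresh
        for oid, pm in prev_masks.items():
            if oid in used:
                continue
            iou = mask_iou(pm, m)
            if iou > best_iou:
                best_id, best_iou = oid, iou
        if best_id is None:  # no sufficiently overlapping predecessor
            best_id, next_id = next_id, next_id + 1
        used.add(best_id)
        out[best_id] = m
    return out
```

In a real system, the greedy IoU matching is replaced by learned mask propagation, but the bookkeeping is the same: object identities are carried across frames, and newly appearing objects receive fresh IDs.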


References:

[1] Yang, Zongxin, Yunchao Wei, and Yi Yang. "Collaborative video object segmentation by foreground-background integration." ECCV, 2020.

[2] Yang, Zongxin, Yunchao Wei, and Yi Yang. "Collaborative video object segmentation by multi-scale foreground-background integration." TPAMI, 2021.

[3] Yang, Zongxin, Yunchao Wei, and Yi Yang. "Associating objects with transformers for video object segmentation." NeurIPS, 2021.

[4] Zhu, Feng, et al. "Instance as identity: A generic online paradigm for video instance segmentation." ECCV, 2022.

[5] Yang, Zongxin, and Yi Yang. "Decoupling features in hierarchical propagation for video object segmentation." NeurIPS, 2022.

[6] Li, Kexin, et al. "CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation." ACM MM, 2023.

[7] Xu, Yuanyou, Zongxin Yang, and Yi Yang. "Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation." ICCV, 2023.

[8] Cheng, Yangming, et al. "Segment and track anything." arXiv preprint arXiv:2305.06558 (2023).


Speaker: Xiangtai Li (Nanyang Technological University)

Talk time: August 30, 2023 (Wednesday), 20:30 (Beijing time)

Talk title: Towards Unified and Efficient Pixel-wised Video Perception


Speaker bio:



Dr. Xiangtai Li is a postdoctoral researcher at MMLab, Nanyang Technological University (NTU). He received his Ph.D. in 2022 from the School of Intelligence Science and Technology at Peking University and was honored as an outstanding graduate at both the university and Beijing municipal levels. His main research areas include image segmentation and detection, multimodal learning, and video understanding, with a focus on enabling intelligent machines to truly comprehend complex scene inputs. He has published over 20 papers in top international computer vision conferences (such as CVPR, ICCV, ECCV, ICLR, and NeurIPS) and journals (such as TPAMI, IJCV, and TIP). During his doctoral studies, he worked as a research intern at several companies (SenseTime, JD), won the Peking University President Scholarship and the National Scholarship, and saw some of his research applied in the products of his internship companies.


Homepage:

https://lxtgh.github.io/


Abstract:

Pixel-wise video understanding tasks, including segmentation and detection, are essential for machines to better understand the dynamic world. However, because of varied task definitions and multi-task output requirements, most video perception methods are designed specifically for a single task, which makes them hard to unify and prevents the sub-tasks from benefiting one another.


In this talk, I will introduce several explorations in my research toward building unified and efficient video perception systems. I will start with TransVOD, a unified, high-performance DETR-like video object detector. Then I will introduce PolyphonicFormer, a unified panoptic segmentation and depth prediction framework, which also won the ICCV 2021 BMTT workshop challenge. Next, I will present Video K-Net, an online kernel-based video panoptic segmentation framework. Finally, I will highlight our recent work Tube-Link, a universal near-online approach for video segmentation.


References:

[1] Qianyu Zhou*, Xiangtai Li*, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, Dacheng Tao. TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers, T-PAMI-2023.

[2] Haobo Yuan*, Xiangtai Li*, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, Dacheng Tao. PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation, ECCV-2022.

[3] Xiangtai Li, Wenwei Zhang, Jiangmiao Pang, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy. Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation, CVPR-2022.

[4] Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy. Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation, ICCV-2023.


Host: Wenguan Wang (Zhejiang University)


Host bio:

Wenguan Wang is a Hundred Talents Program researcher and Ph.D. supervisor in the College of Computer Science at Zhejiang University, and a recipient of the National Excellent Young Scientists Fund (Overseas). From 2022 to 2023 he was a lecturer at the University of Technology Sydney; from 2020 to 2022, a postdoctoral researcher at ETH Zurich; and from 2018 to 2019, a researcher and then senior researcher at the Inception Institute of Artificial Intelligence (IIAI). He was a visiting scholar at the University of California, Los Angeles (UCLA) from 2016 to 2018, and received his Ph.D. from Beijing Institute of Technology in 2018. His main research areas are computer vision and artificial intelligence. He has published more than 80 papers in top journals and conferences (such as TPAMI, IJCV, ICLR, ICML, NeurIPS, CVPR, ICCV, ECCV, AAAI, and SIGGRAPH Asia), with over 13,000 Google Scholar citations and an h-index of 60. His honors include the Australian Research Council (ARC) Discovery Early Career Researcher Award (DECRA, 2022), inclusion in Stanford University's list of the world's top 2% scientists (2022), Elsevier Highly Cited Chinese Researcher (2020-2022), the World Artificial Intelligence Conference Outstanding Young Paper Award (2020), the CAAI Outstanding Doctoral Dissertation Award (2019), and the ACM China Doctoral Dissertation Award (2018). He has led teams to seven first places, three second places, and five third places in 15 international academic competitions.


Homepage:

https://sites.google.com/view/wenguanwang/home



Special thanks to the main organizer of this Webinar:

Organizing AC: Wenguan Wang (Zhejiang University)


How to participate

1. VALSE's weekly Webinars are streamed live on Bilibili. Search for VALSE_Webinar on Bilibili and follow us!

Live stream:

https://live.bilibili.com/22300737

Past recordings:

https://space.bilibili.com/562085182/ 


2. VALSE Webinars are usually held on Wednesday evenings at 20:00 (Beijing time), though the schedule is occasionally adjusted for speakers in other time zones. To stay up to date, follow the VALSE WeChat official account (valse_wechat) or join VALSE QQ group S (group number: 317920537).


*Note: when applying to join the VALSE QQ group, you must provide your name, affiliation, and identity code; all three are required. After joining, please set your group nickname to your real name, identity code, and affiliation. Identity codes: T for university and research institute staff, I for industry R&D, D for Ph.D. students, M for Master's students.


3. The VALSE WeChat official account generally announces the following week's Webinar on Thursdays.


4. You can also find Webinar information directly on the VALSE homepage: http://valser.org/. Slides for each talk (with the speaker's permission) are posted at the bottom of the corresponding announcement on the VALSE website.


Zongxin Yang [slides]

Xiangtai Li [slides]
