VALSE Webinar 2022-09-07, Session 23 of 2022 (No. 290 overall): Autonomous Driving Perception

2022-9-1 16:42 | Posted by: 程一 (Institute of Computing Technology)

Time	Wednesday, September 7, 2022, 20:00 (Beijing Time)
Topic	Autonomous Driving Perception: Scene Understanding in Autonomous Driving
Host	Chao Ma (Shanghai Jiao Tong University)
Livestream	https://live.bilibili.com/22300737

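Speaker: Hongsheng Li (The Chinese University of Hong Kong)

Talk title: MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection

Speaker: Fisher Yu (ETH Zürich)

Talk title: High-Quality 4D Scene Understanding in Autonomous Driving

Panelists:

Hongsheng Li (The Chinese University of Hong Kong), Fisher Yu (ETH Zürich), Chunhua Shen (Zhejiang University), Naiyan Wang (TuSimple), Chunjing Xu (Huawei Technologies Co., Ltd.)

Panel topics: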

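1. In autonomous driving applications, how far is perception from being a solved problem? What shortcomings of current SOTA algorithms still need deeper work? How do academia, industry, and users differ in how they judge the performance of autonomous driving perception?

2. For real-world applications, academia can only design algorithms on public autonomous driving datasets, yet these datasets differ greatly from one another. How do these differences affect real-world deployment? Is this a data problem or an algorithm problem?

3. What kind of new benchmark does academia need? How should the few-shot (long-tail) problem in autonomous driving scene perception be defined? And how should the open-set problem be defined?

4. How can current research hotspots in computer vision, such as interpretability and adversarial learning, help autonomous driving perception? How can the reliability and safety of existing autonomous driving perception systems be improved?

5. How should we view the contribution of perception methods within end-to-end autonomous driving systems?

6. Drawing on your own areas of expertise, what specific advice on research directions would you give students who want to work on autonomous driving perception?

*You are welcome to post topic-related questions in the comments below. The host and panelists will select several of the most popular questions to add to the panel topics!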

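Speaker: Hongsheng Li (The Chinese University of Hong Kong)

Time: Wednesday, September 7, 2022, 20:00 (Beijing Time)

Talk title: MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection

Speaker bio:

Hongsheng Li is an Associate Professor at the Multimedia Laboratory of The Chinese University of Hong Kong and a "Huashan Scholar" Chair Professor at Xidian University. His research interests include computer vision, deep learning, and medical image processing. He has published more than 100 papers at top conferences in computer vision and medical image processing (CVPR/ICCV/ECCV/NeurIPS/MICCAI), with over 21,000 Google Scholar citations. As team leader, he took part in the ImageNet 2016 challenge and won first place in video object detection. His awards include the 2020 IEEE Circuits and Systems Society Outstanding Young Author Award, a 2022 AI 2000 Most Influential Scholar Honorable Mention in computer vision, and the 2021 CUHK Young Researcher Award. He has served as an Area Chair for NeurIPS 2021 and 2022 and as an Associate Editor of Neurocomputing.

Homepage:

https://www.ee.cuhk.edu.hk/~hsli/

Abstract: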

Accurate and reliable 3D detection is vital for many applications, including autonomous driving vehicles and service robots. We present a flexible and high-performance 3D detection framework, named MPPNet, for 3D temporal object detection with point cloud sequences. We propose a novel three-hierarchy framework with proxy points for multi-frame feature encoding and interactions to achieve better detection. The three hierarchies conduct per-frame feature encoding, short-clip feature fusion, and whole-sequence feature aggregation, respectively. To enable processing long-sequence point clouds with reasonable computational resources, intra-group feature mixing and inter-group feature attention are proposed to form the second and third feature encoding hierarchies, which are recurrently applied for aggregating multi-frame trajectory features. The proxy points not only act as consistent object representations for each frame, but also serve as the courier to facilitate feature interaction between frames. Experiments on the large-scale Waymo Open Dataset show that our approach outperforms state-of-the-art methods by large margins when applied to both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences. Specifically, MPPNet achieves 74.21%, 74.62% and 73.31% for the vehicle, pedestrian and cyclist classes on the LEVEL 2 mAPH metric with 16-frame input.

References:

[1] X. Chen, S. Shi, B. Zhu, K. C. Cheung, H. Xu, H. Li. "MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection," European Conference on Computer Vision (ECCV), 2022.

Speaker: Fisher Yu (ETH Zürich)

Time: Wednesday, September 7, 2022, 20:30 (Beijing Time)

Talk title: High-Quality 4D Scene Understanding in Autonomous Driving

Speaker bio:

Fisher Yu is an Assistant Professor at ETH Zürich in Switzerland. He obtained his Ph.D. degree from Princeton University and became a postdoctoral researcher at UC Berkeley. He is now leading the Visual Intelligence and Systems (VIS) group at ETH Zürich. His goal is to build perceptual systems capable of performing complex tasks in complex environments. His research is at the junction of machine learning, computer vision and robotics. He currently works on closing the loop between vision and action. His works on image representation learning and large-scale datasets, especially dilated convolutions and the BDD100K dataset, have become essential parts of computer vision research.

Homepage:

https://www.yf.io

Abstract:

Understanding semantics and motion in dynamic 3D scenes is foundational for autonomous driving. The recent availability of large-scale driving video datasets creates new research possibilities in this broad area. In this talk, I will illustrate the trend through the lens of object tracking, the essential building block for dynamic scene understanding. I will start with our recent findings in multiple object tracking (MOT), after briefly reviewing the current works and trends on the topic. Then, I will introduce our new tracking method based on Quasi-Dense Similarity Learning. Our method is conceptually more straightforward yet more effective than previous work. It improves accuracy by almost ten percent on the Waymo MOT dataset. I will also talk about how to use the 2D tracking method for monocular 3D object tracking and video instance segmentation. Our quasi-dense 3D tracking pipeline achieves impressive improvements on the nuScenes 3D tracking benchmark, and we are still making fast progress. On the video segmentation side, we find that our algorithms can even beat manual labeling accuracy.

References:

[1] J. Pang, L. Qiu, X. Li, H. Chen, Q. Li, T. Darrell, F. Yu, Quasi-Dense Similarity Learning for Multiple Object Tracking, CVPR 2021

[2] H.-N. Hu, Q.-Z. Cai, D. Wang, J. Lin, M. Sun, P. Krähenbühl, T. Darrell, F. Yu, Joint Monocular 3D Vehicle Detection and Tracking, ICCV 2019

[3] H.-N. Hu, Y.-H. Yang, T. Fischer, T. Darrell, F. Yu, M. Sun, Monocular Quasi-Dense 3D Object Tracking, TPAMI 2022

[4] L. Ke, X. Li, M. Danelljan, Y.-W. Tai, C.-K. Tang, F. Yu, Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation, NeurIPS 2021

[5] L. Ke, M. Danelljan, X. Li, Y.-W. Tai, C.-K. Tang, F. Yu, Mask Transfiner for High-Quality Instance Segmentation, CVPR 2022

[6] L. Ke, H. Ding, M. Danelljan, Y.-W. Tai, C.-K. Tang, F. Yu, Video Mask Transfiner for High-Quality Video Instance Segmentation, ECCV 2022

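Panelist: Chunhua Shen (Zhejiang University)

Bio:

Chunhua Shen is with the College of Computer Science and the State Key Laboratory of CAD&CG at Zhejiang University, where he is a Qiushi Chair Professor. From 2011 to 2021, he taught and conducted research at the School of Computer Science of the University of Adelaide, the Australian Institute for Machine Learning, and the Australian Research Council Centre of Excellence for Robotic Vision. Before that, he worked for nearly six years at the Canberra laboratory of National ICT Australia and at the Australian National University. Over the past 15 years of teaching, he has supervised 29 Ph.D. students to graduation and hosted more than 30 visiting Ph.D. students. He studied as an undergraduate in the Department of Intensive Instruction at Nanjing University, received his master's degree from the Department of Electronics at Nanjing University, and received his Ph.D. from the University of Adelaide. His research focuses on several fundamental computer vision tasks, including object detection, semantic segmentation, instance segmentation, monocular depth estimation, and 3D scene reconstruction.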

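Panelist: Naiyan Wang (TuSimple)

Bio:

Naiyan Wang is a partner and Chief Scientist at TuSimple, where he leads the research and development of autonomous truck technology. He received his Ph.D. from the Hong Kong University of Science and Technology. He is a core developer of MXNet, a 2014 Google PhD Fellowship recipient (one of only four in China), and the first person in the world to apply deep learning to object tracking. He has repeatedly ranked among the top in international data mining (KDD Cup/Kaggle Challenge) and computer vision (ImageNet LSVRC) competitions, and has published more than 50 papers at top conferences and journals in computer vision and machine learning, with more than 12,000 citations.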

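Panelist: Chunjing Xu (Huawei Technologies Co., Ltd.)

Bio:

Chunjing Xu received his bachelor's degree in mathematics from Wuhan University in 1999, his master's degree in mathematics from Peking University in 2002, and his Ph.D. from The Chinese University of Hong Kong in 2009. From 2009 to 2012, he worked at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, first as an Assistant Researcher and then as an Associate Researcher. He joined Huawei in 2012, where he has served as Senior Engineer and Principal Engineer at the Media Lab of the Central Research Institute, and as Director of the Computer Vision Lab at Noah's Ark Lab. His main research interests are machine learning and computer vision, and he has published more than 40 papers at top conferences and journals in these fields, such as TPAMI/CVPR/ICCV/IJCAI/NeurIPS/AAAI/ICML.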

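Host: Chao Ma (Shanghai Jiao Tong University)

Bio:

Chao Ma is a tenure-track Associate Professor and Ph.D. advisor at the Artificial Intelligence Institute of Shanghai Jiao Tong University. He received his Ph.D. through a joint program between Shanghai Jiao Tong University and the University of California, Merced. From 2016 to 2018, he was a postdoctoral researcher at the Australian Centre for Robotic Vision (University of Adelaide). His honors include the Outstanding Doctoral Dissertation Award of the China Society of Image and Graphics, the Shanghai Pujiang Talent Program, and selection as a young scholar of the Microsoft Research Asia "StarTrack" program (铸星计划). His research focuses on computer vision and machine learning, and his work has appeared repeatedly in top computer vision journals (TPAMI/IJCV) and conferences (ICCV/CVPR/ECCV/NeurIPS). He has more than 7,600 Google Scholar citations and was named an Elsevier Highly Cited Chinese Researcher for two consecutive years (2020-2021). His research results have been applied to Huawei's Da Vinci chips and its MDC autonomous driving platform, for which he received Huawei's 2021 Outstanding Technical Award in the area of technical cooperation.

Homepage: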

https://vision.sjtu.edu.cn/

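Special thanks to the main organizers of this Webinar:

Organizing AC: Chao Ma (Shanghai Jiao Tong University)

How to participate

1. The weekly VALSE Webinars are streamed on the Bilibili live platform. Search for VALSE_Webinar on Bilibili and follow us!

Livestream:

https://live.bilibili.com/22300737

Past videos:

https://space.bilibili.com/562085182/

2. VALSE Webinars usually take place on Wednesdays at 20:00, with occasional adjustments to accommodate speakers' time zones. To make it easier to join, please follow the VALSE WeChat official account (valse_wechat) or join VALSE QQ Group R (group number: 137634472).

*Note: Applications to join the VALSE QQ group must include your name, affiliation, and role; all three are required. After joining, please set your group nickname to your real name, role, and affiliation. Role codes: T for university and research institute staff; I for industry R&D; D for Ph.D. students; M for master's students.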