
20201202-29 Pictures with Words: Letting Vision and Language Complement Each Other

2020-11-26 18:26 | Publisher: Cheng Yi (Institute of Computing Technology) | Views: 3521 | Comments: 0


Time

Wednesday, December 2, 2020, 20:00 (Beijing Time)

Topic

Pictures with Words: Letting Vision and Language Complement Each Other

Host

Meng Yang (Sun Yat-sen University)


Speaker: Peng Wang (Northwestern Polytechnical University)

Talk title: Richer and Deeper


Speaker: Kaihua Tang (Nanyang Technological University)

Talk title: Unbiased Scene Graph Generation



Panelists:

Si Liu (Beihang University), Peng Wang (Northwestern Polytechnical University), Guanbin Li (Sun Yat-sen University), Yalong Bai (JD.com), Kaihua Tang (Nanyang Technological University)


Panel topics:

1. The combination of vision and natural language is one of the hot research areas in academia today. Which subtopics in this direction deserve attention, and what are the mainstream research paradigms for each?

2. Is there research on how humans jointly process vision and language? How have current machine approaches to vision-language learning been inspired by the human joint learning mechanism, and how do they in turn help reveal the brain's processing mechanisms?

3. Will unified vision-language pre-trained models become the standard approach in vision-language understanding? What other approaches to bridging the semantic gap between vision and language have the potential to compete with them?

4. How does VQA differ from general image understanding and reading-comprehension QA? Is designing a VQA method simply a matter of stacking image-understanding and reading-comprehension methods, or are there aspects that need special attention?

5. On scene graphs: do the datasets themselves limit progress toward deep image understanding? Is it necessary to establish a unified data standard and build an entirely new dataset specifically for the task of understanding object relationships in images?

6. What are the killer applications of deep image-and-language understanding in industry? How far are the related technologies from large-scale deployment?

7. What advantages do graphs and graph neural network methods offer for vision-language understanding? What research directions are open for graph-based vision and language understanding in the future?

8. How can we achieve robust and interpretable vision-language interaction? Does vision-and-language research have a chance to become a breakthrough point for the development of cognitive intelligence?


*Feel free to post topic-related questions in the comments below; the host and panelists will select several popular ones to add to the panel discussion!


Speaker: Peng Wang (Northwestern Polytechnical University)

Time: Wednesday, December 2, 2020, 20:00 (Beijing Time)

Talk title: Richer and Deeper


Speaker bio:

Peng Wang is a professor at the School of Computer Science, Northwestern Polytechnical University. He received his B.S. and Ph.D. degrees from Beihang University in 2004 and 2011, respectively. From 2012 to 2016 he was a researcher at the University of Adelaide, Australia, and in 2017 he joined the School of Computer Science at Northwestern Polytechnical University as a professor and doctoral advisor. His research focuses on computer vision and machine learning. He has published more than 20 papers in CCF-A journals and conferences, including TPAMI, IJCV, TIP, CVPR, ICCV, AAAI, IJCAI, and ACM MM.


Homepage:

https://teacher.nwpu.edu.cn/m/2017010053.html


Abstract:

In this talk, I will introduce two recent works on vision and language understanding. The first is a question-conditioned graph attention network for TextVQA, which is capable of reasoning over a heterogeneous graph with text and object nodes. The second is a dataset and pipeline for referring expression comprehension using external commonsense knowledge. By incorporating more visual and non-visual information, we observe an increasingly comprehensive visual reasoning ability.
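
The core idea of question-conditioned attention over graph nodes can be sketched as follows. This is a minimal illustration only, not the authors' actual SMA architecture [2]; the function name, the toy features, and the use of a plain dot-product score are all assumptions made for the sketch.

```python
import numpy as np

def question_conditioned_attention(q, nodes):
    """Score each graph node (e.g., an OCR-token or object feature) against
    the question embedding and return the attention-pooled context vector.
    q: (d,) question embedding; nodes: (n, d) node embeddings."""
    scores = nodes @ q                    # relevance of each node to the question
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over the n nodes
    return weights @ nodes                # weighted sum = attended context

# Toy example: 3 nodes in a text/object graph, 4-d features.
q = np.array([1.0, 0.0, 0.0, 0.0])
nodes = np.array([[0.9, 0.1, 0.0, 0.0],   # node best aligned with the question
                  [0.0, 1.0, 0.0, 0.0],
                  [0.1, 0.0, 1.0, 0.0]])
ctx = question_conditioned_attention(q, nodes)
```

Because the first node aligns best with the question vector, it receives the largest attention weight and dominates the pooled context; a real model would learn the projections that produce these scores.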


References:

[1] Peng Wang, Dongyang Liu, Hui Li, Qi Wu. Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge. ACM MM 2020.

[2] Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton van den Hengel, Qi Wu. Structured Multimodal Attentions for TextVQA. arXiv:2006.00753.


Speaker: Kaihua Tang (Nanyang Technological University)

Time: Wednesday, December 2, 2020, 20:30 (Beijing Time)

Talk title: Unbiased Scene Graph Generation


Speaker bio:

Kaihua Tang is a third-year Ph.D. student at Nanyang Technological University, supervised by Hanwang Zhang. Before that, he received his Bachelor's degree from the IEEE Honor Class of Shanghai Jiao Tong University and a dual Master's degree from the joint program of SJTU and Waseda University. His research interests include scene understanding, long-tailed recognition, visual reasoning, and causal inference in computer vision. He recently published two CVPR oral papers as first author on unbiased scene graph generation, one of which was among the CVPR 2019 Best Paper Finalists (45/5160).


Homepage:

https://kaihuatang.github.io/


Abstract:

Today's scene graph generation (SGG) task is still far from practical, mainly due to severe training bias, e.g., collapsing the diverse "human walk on / sit on / lay on beach" into "human on beach". Given such scene graphs, downstream tasks such as VQA can hardly infer better scene structures than a mere bag of objects. However, debiasing in SGG is not trivial, because traditional debiasing methods cannot distinguish between good and bad biases: good context priors (e.g., "person read book" rather than "eat") versus bad long-tailed bias (e.g., "near" dominating "behind / in front of"). In this talk, we will introduce an unbiased network structure, dubbed VCTree, together with an even bolder causal inference method, called TDE, to tackle this problem.
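
The TDE idea can be illustrated with a minimal numeric sketch: subtract the logits obtained from a counterfactual input (visual content wiped out, context kept) from the factual logits, so that the context-only bias cancels. The toy model, the predicate classes, and all numbers below are invented for illustration and are not the paper's actual implementation.

```python
def tde_logits(model, visual, context, blank):
    """Total Direct Effect: factual logits minus counterfactual logits,
    where the counterfactual replaces the visual feature with a blank."""
    factual = model(visual, context)        # prediction from the real input
    counterfactual = model(blank, context)  # context-only (biased) prediction
    return [f - c for f, c in zip(factual, counterfactual)]

# Hypothetical additive model over predicate classes ["on", "sit on", "lay on"].
def toy_model(visual, context):
    return [v + c for v, c in zip(visual, context)]

context_prior = [3.0, 0.5, 0.2]    # head predicate "on" dominates the prior
visual_evidence = [0.1, 2.0, 0.3]  # the image actually supports "sit on"
blank = [0.0, 0.0, 0.0]

biased = toy_model(visual_evidence, context_prior)
unbiased = tde_logits(toy_model, visual_evidence, context_prior, blank)
```

In this sketch the biased logits pick the head predicate "on", while the TDE-corrected logits recover "sit on", which is the behavior the debiasing method aims for.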


References:

[1] Kaihua Tang, Hanwang Zhang, Baoyuan Wu, Wenhan Luo, Wei Liu. Learning to Compose Dynamic Tree Structures for Visual Contexts. CVPR 2019.

[2] Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, Hanwang Zhang. Unbiased Scene Graph Generation from Biased Training. CVPR 2020.


Panelist: Si Liu (Beihang University)


Bio:

Si Liu is currently an associate professor at Beihang University. She received her Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, and was a Research Assistant and Postdoc at the National University of Singapore. Her research interests include computer vision and multimedia analysis. She has published over 40 cutting-edge papers on human-related analysis, including human parsing, face editing, and image retrieval. She received the Best Paper Award at ACM MM 2013 and the Best Demo Award at ACM MM 2012. She was the champion of the CVPR 2017 Look Into Person Challenge and an organizer of the ECCV 2018 and ICCV 2019 Person in Context Challenges.


Homepage:

http://colalab.org/


Panelist: Guanbin Li (Sun Yat-sen University)


Bio:

Guanbin Li is an associate professor at the School of Data and Computer Science, Sun Yat-sen University. He received his Ph.D. from the University of Hong Kong in 2016. His main research areas are computer vision and machine learning. He has published more than 90 papers, 52 of them in CCF-A / CAS Tier-1 venues, including top journals such as IEEE TPAMI, TIP, TNNLS, and TCYB and top conferences such as CVPR, ICCV, ECCV, ICML, AAAI, and IJCAI, with over 2,900 Google Scholar citations. His honors include an ICCV 2019 Best Paper Nomination, a First Prize of the Science and Technology Award of the China Society of Image and Graphics (third contributor), and an ACM China Rising Star Award nomination. He has led more than ten research projects, including a Guangdong Distinguished Young Scholar grant, NSFC General Program and Young Scientists Fund projects, and a CCF-Tencent Rhino-Bird Research Fund. He serves on the editorial board of The Visual Computer, as a reviewer for TPAMI, TIP, TNNLS, TMM, TCYB, and TOG, and as a program committee member for CVPR, ICCV, NeurIPS, and AAAI.


Homepage:

http://guanbinli.com


Panelist: Yalong Bai (JD.com)


Bio:

Yalong Bai is a researcher at JD Cloud & AI. He received his Ph.D. through the joint Ph.D. program of Harbin Institute of Technology and Microsoft Research Asia in 2018. His research interests include representation learning, image recognition, multimodal retrieval, and vision & language. He has published 17 academic papers at venues including CVPR, ICCV, ECCV, and ICLR. He has won first place in several international challenges, including AliProducts Recognition at CVPR 2020, FGVC at CVPR 2019, MSR Image Recognition at ICME 2016, and MSR-Bing Image Retrieval at MM 2014.


Homepage:

http://ylbai.me


Host: Meng Yang (Sun Yat-sen University)


Host bio:

Meng Yang is an associate professor and doctoral advisor at Sun Yat-sen University. His main research areas are computer vision, natural language processing, and machine learning. He has published more than 90 papers, in journals including IJCV, TIP, TNNLS, TIFS, and PR and at conferences including CVPR, ICCV, ICML, ECCV, AAAI, IJCAI, and EMNLP Findings, with more than 8,000 Google Scholar citations. His work received a First Prize of the Natural Science Award of the Ministry of Education's Higher Education Outstanding Scientific Research Output Awards in 2017 (ranked second).


Homepage:

www.smartllv.com




How to join VALSE online seminar 20-29:

Long-press or scan the QR code below to follow the "VALSE" WeChat official account (valse_wechat), then reply "29期" to receive the live-stream link.

Special thanks to the main organizers of this Webinar:

Organizing AC: Meng Yang (Sun Yat-sen University)

Co-organizing ACs: Guanbin Li (Sun Yat-sen University), Yalong Bai (JD.com)

Responsible SAC: Si Liu (Beihang University)



How to participate

1. VALSE Webinars are hosted on a live-streaming platform. During the event the speaker uploads slides or shares a screen; the audience can view the slides, hear the speaker, and interact through the chat function.


2. To participate, follow the VALSE WeChat official account (valse_wechat) or join a VALSE QQ group (groups A through N are currently full; apart from speakers and other guests, new members can only apply to join VALSE group P, group number 1085466722).

*Note: Applications to join a VALSE QQ group must include your name, affiliation, and status, all three of which are required. After joining, please set your group nickname to your real name, status, and affiliation. Status codes: university or research institute staff, T; industry R&D, I; Ph.D. student, D; Master's student, M.


3. About five minutes before the event starts, the speaker opens the live stream; click the stream link to join. Windows PCs, Macs, mobile phones, and other devices are supported.


4. During the event, please refrain from off-topic messages so as not to disrupt the session.


5. If you cannot hear the audio or see the video during the event, leaving and rejoining usually resolves the problem.


6. A fast network connection is strongly recommended, preferably wired.


7. The VALSE WeChat official account publishes the next week's Webinar announcement and stream link every Thursday.


8. Webinar slides (with the speaker's permission) are posted as [slides] at the bottom of each announcement on the VALSE website.


9. Webinar videos (with the speaker's permission) are uploaded to VALSE's channels on Bilibili and Xigua Video; search for "VALSE Webinar" to watch.


Kaihua Tang [slides]
