20200422-10 Vision-Language: Reasoning VS Pre-training?

2020-4-16 15:11 | Posted by: 程一-计算所

Time: Wednesday, April 22, 2020, 13:00 (Beijing time)

Topic: Vision-Language: Reasoning VS Pre-training?

Host: Qi Wu (吴琦), University of Adelaide, Australia

Speaker: Linchao Zhu (朱霖潮), University of Technology Sydney

Talk title: The Role of Graph Models and Relation Modeling in Vision-Language Reasoning

Speaker: Jiasen Lu (卢家森), Allen Institute of AI

Talk title: Visiolinguistic Pretraining and Multi-Task Vision and Language Representation Learning

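Panel topics:

1. In Vision-Language, which matters more: visual understanding or language understanding?

2. What is true reasoning? Can an end-to-end deep model be considered to reason?

3. What will be the mainstream approach to Vision-Language in the future: reasoning-based models, or is pre-training enough?

4. How should we view causal reasoning, counterfactuals, and bias?

5. With limited data, how can data augmentation be done in vision-language?

6. Pre-training requires massive computing resources. How should we view this kind of "unfair" research?

7. How well is vision and language being deployed in real-world production?

8. What are the challenges for vision and language in generalizing across domains?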

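Panelists:

Linchao Zhu (朱霖潮), University of Technology Sydney; Jiasen Lu (卢家森), Allen Institute of AI; Hanwang Zhang (张含望), Nanyang Technological University, Singapore; Si Liu (刘偲), Beihang University

*You are welcome to leave comments below with questions related to the topic. The host and panelists will select some of the most popular questions to add to the panel topics!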

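Speaker: Linchao Zhu (朱霖潮), University of Technology Sydney

Talk time: Wednesday, April 22, 2020, 13:00 (Beijing time)

Talk title: The Role of Graph Models and Relation Modeling in Vision-Language Reasoning

Speaker bio:

Linchao Zhu is a Lecturer at the University of Technology Sydney. He received his bachelor's degree from Zhejiang University in 2015, was a visiting student at Carnegie Mellon University in 2015 and 2016, and received his Ph.D. from the University of Technology Sydney in 2019. His long-term research interests include unsupervised and self-supervised semantic feature learning, sequence data modeling, and model transfer. He won first place in the NIST TRECVID LOC 2016 competition and in the EPIC-Kitchens 2019 and THUMOS 2015 action recognition challenges.

Homepage:

http://ffmpbgrnn.github.io/

Abstract:

Multimodal learning is an important problem in deep learning, and a large gap remains between perception and semantic understanding. In this talk, we will review how deep learning techniques for multimodal learning have developed in recent years and explain why reasoning and relation modeling matter in multimodal learning. We will also discuss which practical applications of multimodal learning are likely to be meaningful in the future.

References: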

[1] Zhu et al., ActBERT: Learning Global-Local Video-Text Representations, CVPR 2020.

[2] Li et al., Entangled Transformer for Image Captioning, ICCV 2019.

[3] Wu et al., Connective Cognition Network for Directional Visual Commonsense Reasoning, NeurIPS 2019.

[4] Wu et al., Decoupled Novel Object Captioner, ACM MM 2018.

[5] Zhu et al., Uncovering Temporal Context for Video Question Answering, IJCV 2017.

Speaker: Jiasen Lu (卢家森), Allen Institute of AI

Talk time: Wednesday, April 22, 2020, 13:30 (Beijing time)

Talk title: Visiolinguistic Pretraining and Multi-Task Vision and Language Representation Learning

Speaker bio:

Jiasen Lu is a Research Scientist at Allen Institute of AI. He obtained his Ph.D. in the School of Interactive Computing at Georgia Tech, advised by Prof. Devi Parikh. His research is in computer vision, focusing particularly on the intersection between vision and language, including tasks such as visual question answering (VQA), image captioning and visual dialog. He has published at major computer vision (CVPR, ICCV, ECCV), machine learning (NeurIPS, ICLR) and robotics (CORL) conferences, and is a co-organizer of the first and second VQA workshops at CVPR.

Homepage:

https://www.cc.gatech.edu/~jlu347/

Abstract:

In this talk, I will present our latest work on comprehending visually grounded language. First, I will discuss the challenging task of learning visual grounding of language. I will then describe our recent work on pretraining task-agnostic visiolinguistic representations and on multi-task learning in vision and language domains.

References:

[1] Lu J, Batra D, Parikh D, Lee S. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In Advances in Neural Information Processing Systems 2019 (pp. 13-23).

[2] Lu, Jiasen, et al. "12-in-1: Multi-Task Vision and Language Representation Learning." CVPR 2020.

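Panelist: Hanwang Zhang (张含望), Nanyang Technological University, Singapore

Panelist bio:

Dr. Hanwang Zhang is a recipient of the "Nanyang" scholar funding award. He received his bachelor's degree from Zhejiang University in 2009 and his Ph.D. from the National University of Singapore in 2014, and then conducted research at the National University of Singapore and Columbia University in the United States. His main research areas are computer vision and machine reasoning in multimodal settings. He received the Best Student Paper award at ACM MM 2013, a Best Paper nomination at ACM SIGIR 2016, and the TOMM 2018 Best Paper award. His team was runner-up in the Visual Dialog Challenge 2018 and won the challenge in 2019.

Homepage:

https://mreallab.github.io/

Panelist: Si Liu (刘偲), Beihang University

Panelist bio:

Si Liu is an Associate Professor and Ph.D. supervisor at the School of Computer Science, Beihang University. Her research focuses on cross-media intelligent analysis, and her work has received 5000+ citations on Google Scholar. She received the Best Paper award at ACM MM 2013 and the Best Technical Demo award at ACM MM 2012. She won the CVPR 2017 Look Into Person Challenge and the ICCV 2019 Large-scale Video Object Segmentation Challenge, and organized the ECCV 2018 and ICCV 2019 "Person in Context" workshop/challenge (http://picdataset.com/challenge/index/).

Homepage:

http://colalab.org/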

Host: Qi Wu (吴琦), University of Adelaide, Australia

Host bio:

Dr Qi Wu is a Senior Lecturer (Assistant Professor) at the University of Adelaide and the ARC Discovery Early Career Researcher Award (DECRA) Fellow from 2019 to 2021. He was awarded a J G Russell Award by the Australian Academy of Science. He first joined the Australian Centre for Robotic Vision as a Research Fellow before becoming an Associate Investigator in 2018. He obtained his PhD degree in 2015 and MSc degree in 2011, both in Computer Science from the University of Bath, United Kingdom. His research interests are mainly in computer vision and machine learning. He currently works on vision-language problems, with particular expertise in image captioning and visual question answering (VQA). His attributes-based image captioning model took first place on the MS COCO Image Captioning Challenge Leader Board in October 2015. He has published several papers in prestigious conferences and journals, such as TPAMI, CVPR, ICCV, ECCV, IJCAI and AAAI.

Homepage:

http://qi-wu.me/

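How to join VALSE online talk 20-10:

Follow the "VALSE" WeChat official account (valse_wechat) and reply "10期" to the account to get the livestream link.

Special thanks to the main organizers of this Webinar:

Organizing AC: Qi Wu (吴琦), University of Adelaide, Australia

Co-organizing AC: Linchao Zhu (朱霖潮), University of Technology Sydney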

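VALSE Webinar format change:

Since January 2019, the VALSE Webinar has changed its format from one speaker per event to one of two possible formats:

1) Webinar panel session: each event has one discussion topic. Two outstanding speakers on the topic are first invited to give talks (30 minutes each), after which 2 to 3 additional guests join a discussion of the topic (30 minutes).

2) Webinar invited talk: each event invites one senior expert to give a systematic, in-depth introduction to the research in their field of expertise. The talk lasts 50 minutes, followed by 10 minutes of interaction between the host and the speaker and 10 minutes of open Q&A.

How to participate:

1. VALSE Webinar events run on an online livestreaming platform. During the event the speaker uploads slides or shares their screen; attendees can see the slides, hear the speaker, and interact with the speaker through the chat function;

2. To participate, follow the VALSE WeChat official account (valse_wechat) or join a VALSE QQ group (groups A, B, C, D, E, F, G, H, I, J, and K are currently full; apart from speakers and other guests, applicants can only join VALSE group M, group number: 531846386);

*Note: When applying to join a VALSE QQ group, you must provide your name, affiliation, and role for verification; all three are required. After joining, use your real name, in the form name, role, affiliation. Role codes: university and research institute staff T; industry R&D I; Ph.D. student D; master's student M.

3. About 5 minutes before the event starts, the speaker will start the livestream. Attendees can join by clicking the livestream link. Windows PCs, Macs, phones, and other devices are supported;

4. During the event, please do not post irrelevant remarks, so as not to disrupt the event;

5. If you cannot hear the audio or see the video during the event, try leaving and rejoining, which usually solves the problem;

6. We strongly recommend joining over a fast network, preferably a wired connection;

7. The VALSE WeChat official account publishes the announcement and livestream link for the following week's Webinar every Thursday.

8. Webinar slides (with the speaker's permission) are posted as [slides] at the bottom of each talk announcement on the VALSE website.

9. Webinar videos (with the speaker's permission) are posted on the VALSE iQIYI channel; follow Valse Webinar on iQIYI to watch them.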

Linchao Zhu (朱霖潮) [slides]

Jiasen Lu (卢家森) [slides]
