20191218-31 Deep Reasoning Based on Vision and Common Sense

2019-12-12 17:42 | Posted by: Cheng Yi (程一), Institute of Computing Technology


Time: Wednesday, December 18, 2019, 20:00 (Beijing time)

Topic: Deep Reasoning Based on Vision and Common Sense

Host: Si Liu (刘偲), Beihang University

Speaker: Peng Wang (王鹏), University of Wollongong

Talk title: Two New Datasets and Tasks on Abstract Visual Reasoning and Compositional Referring Expression Comprehension

Speaker: Guanbin Li (李冠彬), Sun Yat-sen University

Talk title: Language-Driven Visual Reasoning for Referring Expression Comprehension

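Panel topics:

1. The two existing ways of storing knowledge are knowledge graphs and pre-trained models. What problems does each of them have? Could we one day build an all-encompassing, ultra-large-scale knowledge base?

2. How can the interpretability problem of neural networks be solved?

3. How can the reasoning complexity of symbolic methods be addressed? Is there a way to obtain intuitive reasoning ability?

4. Combining symbolic and connectionist methods has long been an active research topic. Are there better ways to combine them?

5. Are the current evaluation methods for reasoning tasks sound? How can we measure a machine's reasoning ability more objectively and accurately?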

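Panelists:

Peng Wang (王鹏, University of Wollongong), Guanbin Li (李冠彬, Sun Yat-sen University), Hanwang Zhang (张含望, Nanyang Technological University), Peng Wang (王鹏, Northwestern Polytechnical University)

*You are welcome to leave topic-related questions in the comments below. The host and panelists will select several of the most popular questions and add them to the panel topics.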

Speaker: Peng Wang (王鹏), University of Wollongong

Time: Wednesday, December 18, 2019, 20:00 (Beijing time)

Talk title: Two New Datasets and Tasks on Abstract Visual Reasoning and Compositional Referring Expression Comprehension

Speaker bio:

Peng Wang is a lecturer (assistant professor) with the School of Computing and Information Technology, University of Wollongong, Australia. Before joining UOW, he was a research fellow with the Australian Institute of Machine Learning (AIML), University of Adelaide. He obtained his PhD degree from the University of Queensland in 2017. His research interests lie in computer vision and deep learning, with a focus on low-shot classification, long-tail classification and visual reasoning. His research has been published in major computer vision journals and conferences, such as TPAMI, IJCV, TIP, CVPR and AAAI.

Homepage:

https://wp8619.github.io/

Abstract:

One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced. This is a critical concern because generalisation enables robust reasoning over unseen data. In this talk, I will report on our recent efforts in visual reasoning. In the first work, we introduce a large-scale benchmark of visual questions that involve operations fundamental to many high-level vision tasks, such as comparisons of counts and logical operations on complex visual properties. The benchmark directly measures a method's ability to infer high-level relationships and to generalise them over image-based concepts. It includes multiple training/test splits that require controlled levels of generalisation. In the second work, we study visual reasoning in the context of referring expression comprehension, which requires joint reasoning over the textual and visual domains. To this end, we propose a new dataset with two main features. First, we design a novel expression engine that renders various reasoning logics, which can be flexibly combined with rich visual properties to generate expressions with varying compositionality. Second, to better exploit the full reasoning chain embodied in an expression, we propose a new test setting that adds distracting images containing objects that share similar properties with the referent. We evaluate a range of deep learning architectures on both datasets, and the results show that substantial room for improvement remains.

References:

[1] Damien Teney*, Peng Wang*, Jiewei Cao, Lingqiao Liu, Chunhua Shen, and Anton van den Hengel, "V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices", AAAI 2020.

[2] Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, and Anton van den Hengel, "Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks", CVPR 2019.

[3] Drew A. Hudson and Christopher D. Manning, "GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering", CVPR 2019.

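Speaker: Guanbin Li (李冠彬), Sun Yat-sen University

Time: Wednesday, December 18, 2019, 20:30 (Beijing time)

Talk title: Language-Driven Visual Reasoning for Referring Expression Comprehension

Speaker bio:

Dr. Guanbin Li is an associate professor at the School of Data and Computer Science, Sun Yat-sen University. He received his PhD from the University of Hong Kong in 2016. His main research areas include visual saliency detection, visual reasoning and transfer learning. He has published more than 50 papers in top journals and conferences in computer vision and artificial intelligence, including IEEE TPAMI, IEEE TIP, CVPR, ICCV, ICML and AAAI. He has received an ICCV 2019 Best Paper nomination, the First Prize of the Science and Technology Award of the China Society of Image and Graphics (as third contributor), and the CCF-Tencent Rhino-Bird Fund Excellence Award. He is a long-standing reviewer for journals such as TPAMI, TIP, TMM and TOG, and a program committee member for international conferences such as CVPR, ICCV and AAAI.

Homepage:

http://guanbinli.com

Abstract: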

Grounding referring expressions is a fundamental yet challenging task facilitating human-machine communication in the physical world. It aims to locate the object instance described by a natural language referring expression in an image. This task is compositional and inherently requires visual reasoning on top of the relationships among the objects in the image. In this talk, I will briefly introduce the research progress of this topic and then mainly focus on two of our recent works (CVPR2019 and ICCV2019) from the perspective of multi-order relationship embedded feature representation and language-driven visual reasoning.

References:

[1] Sibei Yang, Guanbin Li, and Yizhou Yu, "Dynamic Graph Attention for Referring Expression Comprehension", ICCV 2019.

[2] Sibei Yang, Guanbin Li, and Yizhou Yu, "Cross-Modal Relationship Inference for Grounding Referring Expressions", CVPR 2019.

[3] Qingxing Cao, Xiaodan Liang, Bailin Li, Guanbin Li, and Liang Lin, "Visual Question Reasoning on General Dependency Tree", CVPR 2018.

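Panelist: Hanwang Zhang (张含望), Nanyang Technological University, Singapore

Panelist bio:

Dr. Hanwang Zhang is a recipient of "Nanyang" scholar funding. He received his bachelor's degree from Zhejiang University in 2009 and his PhD from the National University of Singapore in 2014, and then carried out research at the National University of Singapore and Columbia University in the United States. His main research areas are computer vision and machine reasoning in multimodal settings. He received the Best Student Paper award at ACM MM 2013, a Best Paper nomination at ACM SIGIR 2016, and the TOMM 2018 Best Paper award. His team was runner-up in the Visual Dialog Challenge 2018 and winner in 2019.

Homepage:

https://mreallab.github.io/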

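Panelist: Peng Wang (王鹏), Northwestern Polytechnical University

Panelist bio:

Peng Wang studied at the School of Automation Science and Electrical Engineering, Beihang University, from 2000 to 2011, where he received his bachelor's and PhD degrees. After completing his PhD, he carried out research at the Fujitsu R&D Center in Beijing (about one year) and at the School of Computer Science, University of Adelaide (about four years). In 2017 he was selected for the "Young Thousand Talents" program of the Organization Department of the CPC Central Committee, and in the same year he joined the School of Computer Science at Northwestern Polytechnical University as a professor and doctoral supervisor. His research focuses on computer vision, machine learning and artificial intelligence. He has published a series of papers in journals and conferences in computer vision and pattern recognition, including TPAMI, IJCV, CVPR, ICCV, ECCV and AAAI.

Homepage:

https://wangpengnorman.github.io/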

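Host: Si Liu (刘偲), Beihang University

Host bio:

Si Liu is an associate professor and doctoral supervisor at the School of Computer Science, Beihang University. Her research direction is cross-modal multimedia intelligent analysis, and her work has been published in TPAMI, IJCV, CVPR, ICCV and ACM MM, among others, with 4000+ Google Scholar citations. In 2017 she was selected for the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology, and in 2018 she received the Wu Wenjun Artificial Intelligence Outstanding Youth Award. She won the CVPR 2017 LIP Challenge Human Parsing Track and the ICCV 2019 Youtube-Video Object Segmentation 2019 challenge. She organized the "Person in Context" workshops at ECCV 2018 and ICCV 2019. She serves as an area chair for ICCV 2019, CVPR 2020 and ECCV 2020, and as an SPC member for AAAI 2019, IJCAI 2019 and IJCAI 2020.

Homepage:

http://colalab.org/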

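How to join VALSE online seminar No. 19-31:

Long-press or scan the QR code to follow the "VALSE" WeChat official account (valse_wechat), then reply "31期" to the account to receive the livestream address.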

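Special thanks to the main organizers of this Webinar:

Organizing AC: Si Liu (刘偲), Beihang University

Co-organizing ACs: Qi Wu (吴琦), University of Adelaide; Hang Su (苏航), Tsinghua University

Responsible AC: Le Dong (董乐), University of Electronic Science and Technology of China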

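Note on the revised VALSE Webinar format:

Since January 2019, VALSE Webinar has changed its format from one speaker per session to one of two possible forms:

1) Webinar topical seminar: each session has a discussion topic. Two outstanding speakers working on the topic are first invited to give talks (30 minutes each), and then 2 to 3 additional guests join them to discuss the topic (30 minutes).

2) Webinar invited talk: each session invites one senior expert to give a systematic, in-depth introduction to the research in his or her own field. The talk lasts 50 minutes, followed by 10 minutes of interaction between the host and the speaker and 10 minutes of open Q&A.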

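How to participate:

1. VALSE Webinar sessions are held on an online livestream platform. During a session the speaker uploads slides or shares the screen; the audience can see the slides, hear the speaker, and interact with the speaker through the chat function.

2. To take part, please follow the VALSE WeChat official account (valse_wechat) or join a VALSE QQ group. Groups A through K are currently full; apart from speakers and other guests, new members can only apply to join VALSE group L, group number 641069169.

*Note: when applying to join a VALSE QQ group, you must provide your name, affiliation and role; all three are required. After joining, please use your real name in the form name, role, affiliation. Role codes: T for university and research institute staff; I for industry R&D; D for PhD students; M for master's students.

3. The speaker starts the livestream about 5 minutes before the session begins. Click the livestream link to join. Windows PCs, Macs, mobile phones and other devices are supported.

4. During the session, please do not post unrelated messages, so that the session can proceed normally.

5. If you cannot hear the audio or see the video during the session, try leaving and re-entering; this usually solves the problem.

6. We strongly recommend joining over a fast network, preferably a wired connection.

7. Every Thursday the VALSE WeChat official account publishes the announcement and livestream link for the following week's Webinar.

8. Webinar slides (with the speaker's permission) are posted as [slides] at the very bottom of each session's announcement on the VALSE website.

9. Webinar videos (with the speaker's permission) are posted to the VALSE channel on iQIYI; follow "Valse Webinar" on iQIYI to watch them.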

Peng Wang (王鹏) [slides]

Guanbin Li (李冠彬) [slides]
