
20180509-12 Hanwang Zhang: Towards X Visual Reasoning

2018-5-3 21:51 | Posted by: Cheng Yi (Institute of Computing Technology) | Views: 3190 | Comments: 0

Speaker: Hanwang Zhang (Nanyang Technological University)

Time: Wednesday, May 9, 2018, 20:00 (Beijing Time)

Title: Towards X Visual Reasoning

Host: Yang Yang (University of Electronic Science and Technology of China)



Dr. Hanwang Zhang is Nanyang Assistant Professor at Nanyang Technological University, Singapore. He was a research scientist at the Department of Computer Science, Columbia University, USA, and a senior research fellow at the School of Computing, National University of Singapore. He received the B.Eng. (Hons.) degree in computer science from Zhejiang University, Hangzhou, China, in 2009, and the Ph.D. degree in computer science from the National University of Singapore in 2014. His research interests include computer vision, multimedia, and social media. Dr. Zhang is the recipient of the Best Demo Runner-Up Award at ACM MM 2012, the Best Student Paper Award at ACM MM 2013, and the Best Paper Honorable Mention at ACM SIGIR 2016. He is also the winner of the Best Ph.D. Thesis Award of the School of Computing, National University of Singapore, in 2014.



References:

1. Visual Translation Embedding Network for Visual Relation Detection. Zhang et al. CVPR'17

2. PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN. Zhang et al. ICCV'17

3. Grounding Referring Expressions in Images by Variational Context. Zhang et al. CVPR'18

4. Neural Motifs: Scene Graph Parsing with Global Context. Zellers et al. CVPR'18

5. Self-critical Sequence Training for Image Captioning. Rennie et al. CVPR'17

6. Counterfactual Multi-Agent Policy Gradients. Foerster et al. AAAI'18


For decades, we have been interested in detecting objects and classifying them into a fixed vocabulary. With the maturity of these low-level vision solutions, we hunger for a higher-level representation of visual data, so as to extract visual knowledge rather than merely bags of visual entities, allowing machines to reason about human-level decision-making. In particular, we wish for "X" reasoning, where X stands for eXplainable and eXplicit. In this talk, we will first explore three existing topics: 1) visual relationship detection, a fundamental technique for visual knowledge extraction; 2) referring expression grounding, a comprehensive task for object localization; and 3) sequence-level image captioning, a reinforcement learning based image captioning framework with a context-aware policy network that can reason about where to look. Then, we will look ahead to ongoing research on design-free module networks for VQA and scene dynamics for scene graph generation.
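The third topic builds on self-critical sequence training (SCST, reference 5 above), a policy-gradient method that uses the reward of the model's own greedy decode as a baseline. The following is a minimal, illustrative sketch of that loss; the function name and the toy numbers are assumptions for illustration, not code from the talk.

```python
# Sketch of the self-critical sequence training (SCST) policy-gradient loss
# (Rennie et al., CVPR'17). Names and example values are illustrative only.

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """Policy-gradient loss with the greedy decode's reward as baseline.

    sample_logprobs: per-token log-probabilities of a caption sampled
                     from the current policy.
    sample_reward:   sentence-level metric (e.g. CIDEr) of that sample.
    greedy_reward:   metric of the greedy (test-time) decode, used as a
                     baseline so only better-than-greedy samples are
                     reinforced.
    """
    advantage = sample_reward - greedy_reward
    # REINFORCE: minimize -(advantage) * log p(sampled caption)
    return -advantage * sum(sample_logprobs)

# A sample scoring above the greedy baseline yields a negative loss,
# i.e. gradient descent pushes up the probabilities of its tokens.
loss = scst_loss([-0.5, -1.2, -0.3], sample_reward=1.1, greedy_reward=0.8)
```

Because the baseline is the model's own inference-time output, no learned value function is needed, and training directly optimizes the non-differentiable evaluation metric.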





1. VALSE Webinars are held on a live streaming platform. The speaker uploads slides or shares a screen; attendees can view the slides, hear the speaker's voice, and interact with the speaker through the chat function.

2. To participate, follow the VALSE WeChat public account valse_wechat or join a VALSE QQ group (groups A, B, C, D, E, F, and G are currently full; apart from speakers and other invited guests, applicants may only join VALSE group H, group number: 701662399). The live stream link is posted on the day of each talk (every Wednesday) via the VALSE WeChat account and the VALSE QQ groups.

*Note: applications to join a VALSE QQ group must state your name, affiliation, and status; all three are required. After joining, please set your group nickname to your real name, status, and affiliation. Status codes: university or research institute staff, T; industry R&D, I; Ph.D. student, D; Master's student, M.







