Speaker: Hanwang Zhang (Nanyang Technological University)
Time: Wednesday, May 9, 2018, 20:00 (Beijing Time)
Title: Towards X Visual Reasoning
Host: Yang Yang (University of Electronic Science and Technology of China)

Speaker bio:
Dr. Hanwang Zhang is a Nanyang Assistant Professor at Nanyang Technological University, Singapore. He was a research scientist at the Department of Computer Science, Columbia University, USA, and a senior research fellow at the School of Computing, National University of Singapore. He received the B.Eng. (Hons.) degree in computer science from Zhejiang University, Hangzhou, China, in 2009, and the Ph.D. degree in computer science from the National University of Singapore in 2014. His research interests include computer vision, multimedia, and social media. Dr. Zhang received the Best Demo Runner-Up Award at ACM MM 2012, the Best Student Paper Award at ACM MM 2013, and the Best Paper Honorable Mention at ACM SIGIR 2016. He also won the Best Ph.D. Thesis Award of the School of Computing, National University of Singapore, in 2014.

Speaker homepage:
http://www.ntu.edu.sg/home/hanwangzhang/

Related papers:
1. Visual Translation Embedding Network for Visual Relation Detection. Zhang et al. CVPR'17
2. PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN. Zhang et al. ICCV'17
3. Grounding Referring Expressions in Images by Variational Context. Zhang et al. CVPR'18
4. Neural Motifs: Scene Graph Parsing with Global Context. Zellers et al. CVPR'18
5. Self-critical Sequence Training for Image Captioning. Rennie et al. CVPR'17
6. Counterfactual Multi-Agent Policy Gradients. Foerster et al. AAAI'18

Abstract:
For decades, we have been interested in detecting objects and classifying them into a fixed lexicon. With the maturity of these low-level vision solutions, we hunger for a higher-level representation of visual data, so as to extract visual knowledge rather than merely bags of visual entities, allowing machines to reason about human-level decision-making.
In particular, we wish for "X" reasoning, where X stands for eXplainable and eXplicit. In this talk, we will first explore three existing topics: 1) visual relationship detection, a fundamental technique for visual knowledge extraction; 2) referring expression grounding, a comprehensive task for object localization; and 3) sequence-level image captioning, a reinforcement-learning-based image captioning framework with a context-aware policy network that can reason about where to look. Then, we will look ahead to ongoing research on design-free module networks for VQA and on scene dynamics for scene graph generation.

Special thanks to the main organizers of this Webinar:
VOOC Committee Member: Shanshan Zhang (Nanjing University of Science and Technology)
VODB Coordinating Director: Zhaoxiang Zhang (Institute of Automation, Chinese Academy of Sciences)

How to participate:
1. VALSE Webinars are held on an online live-streaming platform. The speaker uploads slides or shares the screen; the audience can view the slides, hear the speaker, and interact via the chat function.
2. To participate, follow the VALSE WeChat public account (valse_wechat) or join a VALSE QQ group (groups A, B, C, D, E, F, and G are currently full; apart from speakers and other invited guests, new members may only apply to VALSE group H, group number 701662399). The live-stream link will be posted on the day of the talk (every Wednesday) via the VALSE WeChat account and the VALSE QQ groups.
   Note: applications to join a VALSE QQ group must include your name, affiliation, and role; all three are required. After joining, please set your group name to your real name, role, and affiliation. Roles: university or research-institute staff (T); industry R&D (I); Ph.D. student (D); Master's student (M).
3. About 10 minutes before the event, the speaker will start the live stream; click the link to join. Windows PCs, Macs, mobile phones, and other devices are supported.
4. During the event, please do not send virtual gifts or tips, and avoid off-topic messages, so as not to disrupt the session.
5. If you cannot hear the audio or see the video during the event, leaving and rejoining the stream usually resolves the problem.
6. A fast network connection is strongly recommended; prefer a wired connection where possible.
7. The VALSE WeChat account posts a summary and (with the speaker's permission) the video of the previous week's Webinar every Monday, and the announcement of the next Webinar every Thursday.