
20160601-17 Qi Wu: Image Captioning and Visual Question Answering

Posted 2016-5-30 15:02 | Publisher: 程一 (Institute of Computing Technology) | Views: 8611 | Comments: 0


Speaker 1: Qi Wu (University of Adelaide, Australia)
Time: 20:00 (Beijing time), Wednesday, June 1, 2016
Title: Image Captioning and Visual Question Answering
Host: Xi Peng (A*STAR)

Abstract: The fields of natural language processing (NLP) and computer vision (CV) have seen great advances in their respective goals of analysing and generating text, and of understanding images and videos. While both fields share a similar set of methods rooted in artificial intelligence and machine learning, they have historically developed separately. Recent years, however, have seen an upsurge of interest in problems that require combining linguistic and visual information. Image Captioning and Visual Question Answering (VQA) are two important research topics in this area.

In this talk I will first outline some of the most recent progress, present theories and techniques for these two Vision-to-Language tasks, and then discuss two of our recent works. In these works, we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement over the state of the art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Our final model achieves the best reported results on both image captioning and visual question answering on several benchmark datasets.
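For readers unfamiliar with the CNN-RNN captioning setup mentioned in the abstract, the sketch below illustrates one way the "high-level concepts" idea can be wired in: a vector of predicted concept/attribute probabilities is fed to the LSTM caption decoder as its first input. This is a minimal illustrative sketch in PyTorch under assumed names, dimensions, and dummy inputs; it is not the speaker's released implementation.

```python
# Minimal sketch: feeding an explicit high-level concept vector into a
# CNN-RNN caption decoder. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class ConceptCaptioner(nn.Module):
    def __init__(self, num_concepts=256, vocab_size=10000, embed_dim=512, hidden_dim=512):
        super().__init__()
        # In this setting, a multi-label CNN would predict a vector of
        # concept/attribute probabilities from the image; here we assume
        # that vector is already given.
        self.concept_proj = nn.Linear(num_concepts, embed_dim)  # map concepts into the word-embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, concept_probs, captions):
        # concept_probs: (B, num_concepts) multi-label probabilities in [0, 1]
        # captions:      (B, T) token ids of the ground-truth caption (teacher forcing)
        start = self.concept_proj(concept_probs).unsqueeze(1)   # (B, 1, embed_dim)
        words = self.embed(captions[:, :-1])                    # (B, T-1, embed_dim)
        inputs = torch.cat([start, words], dim=1)               # concept vector acts as the first "word"
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                                 # (B, T, vocab_size) next-word logits

# Toy usage with random tensors, just to show the shapes.
model = ConceptCaptioner()
concepts = torch.rand(2, 256)                  # e.g. sigmoid outputs of an attribute CNN
captions = torch.randint(0, 10000, (2, 12))
logits = model(concepts, captions)
print(logits.shape)                            # torch.Size([2, 12, 10000])
```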

References:
[1] Qi Wu, Peng Wang, Chunhua Shen, Anton van den Hengel, Anthony Dick. Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), 2016.
[2] Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony Dick. What Value Do Explicit High Level Concepts Have in Vision to Language Problems? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), 2016.

Speaker bio: Dr Qi Wu is currently a Senior Research Associate in the Australian Centre for Visual Technologies (ACVT) at the University of Adelaide, Australia. He received an MSc in Global Computing and Media Technology and a PhD in Computer Science from the University of Bath (United Kingdom), in 2011 and 2015, respectively. His research interests include cross-depictive-style object modelling, object detection, and Vision-to-Language problems. He is especially interested in the problems of Image Captioning and Visual Question Answering. His image captioning model produced the best result in last year's Microsoft COCO Image Captioning Challenge, and his VQA model is the current state of the art in the area. His work has been published in prestigious conferences such as CVPR, ICCV and ECCV.
