Speaker 1: Qi Wu (University of Adelaide, Australia)
Time: 8:00 PM (Beijing Time), Wednesday, June 1, 2016
Title: Image Captioning and Visual Question Answering
Host: Xi Peng (A*STAR)

Abstract: The fields of natural language processing (NLP) and computer vision (CV) have seen great advances in their respective goals of analysing and generating text, and of understanding images and videos. While both fields share a similar set of methods rooted in artificial intelligence and machine learning, they have historically developed separately. Recent years, however, have seen an upsurge of interest in problems that require a combination of linguistic and visual information. Image Captioning and Visual Question Answering (VQA), for example, are two important research topics in this area. In this talk I will first outline some of the most recent progress, present theories and techniques for these two Vision-to-Language tasks, and then discuss two of our recent works. In these works, we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement over the state of the art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Our final model achieves the best reported results on both image captioning and visual question answering on several benchmark datasets.

References:
[1] Qi Wu, Peng Wang, Chunhua Shen, Anton van den Hengel, Anthony Dick. Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), 2016.
[2] Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony Dick. What Value Do Explicit High Level Concepts Have in Vision to Language Problems? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), 2016.

Speaker Bio: Dr Qi Wu is currently a Senior Research Associate in the Australian Centre for Visual Technologies (ACVT) at the University of Adelaide, Australia. He received an MSc in Global Computing and Media Technology and a PhD in Computer Science from the University of Bath (United Kingdom), in 2011 and 2015, respectively. His research interests include cross-depictive style object modelling, object detection, and Vision-to-Language problems. He is especially interested in the problems of Image Captioning and Visual Question Answering. His image captioning model produced the best result in the Microsoft COCO Image Captioning Challenge last year, and his VQA model is the current state of the art in the area. His work has been published in prestigious conferences such as CVPR, ICCV, and ECCV.
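The abstract above describes injecting predicted high-level concepts (attributes) into a CNN-RNN captioning pipeline. The sketch below is a minimal, illustrative PyTorch version of that general idea: a vector of per-image concept probabilities seeds an LSTM language model, which then predicts caption words. The class name, layer sizes, and the exact conditioning scheme are assumptions for illustration only, not the architecture from the CVPR 2016 papers listed above.

# Illustrative sketch only: an attribute-conditioned captioner, loosely inspired by
# the "high-level concepts into CNN-RNN" idea in the abstract. All names and sizes
# here are assumed for the example, not taken from the cited papers.
import torch
import torch.nn as nn

class ConceptCaptioner(nn.Module):
    def __init__(self, num_concepts=256, vocab_size=10000, embed_dim=512, hidden_dim=512):
        super().__init__()
        # Map the predicted concept (attribute) probabilities to the LSTM's initial states.
        self.init_h = nn.Linear(num_concepts, hidden_dim)
        self.init_c = nn.Linear(num_concepts, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, concept_probs, captions):
        # concept_probs: (B, num_concepts) attribute scores from a CNN-based predictor
        # captions:      (B, T) token ids of the teacher-forced caption prefix
        h0 = torch.tanh(self.init_h(concept_probs)).unsqueeze(0)  # (1, B, H)
        c0 = torch.tanh(self.init_c(concept_probs)).unsqueeze(0)  # (1, B, H)
        emb = self.embed(captions)                                # (B, T, E)
        hidden, _ = self.lstm(emb, (h0, c0))                      # (B, T, H)
        return self.out(hidden)                                   # (B, T, vocab) word logits

# Usage example with random tensors standing in for CNN attribute predictions.
model = ConceptCaptioner()
concepts = torch.sigmoid(torch.randn(4, 256))   # pretend per-image concept scores
tokens = torch.randint(0, 10000, (4, 12))       # pretend caption prefixes
logits = model(concepts, tokens)
print(logits.shape)  # torch.Size([4, 12, 10000])

Seeding the LSTM state is only one common way to condition a language model on image-level information; feeding the concept vector at every time step is another equally standard choice.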