
20180606-16 Lin Ma: Image/Video Captioning

Posted 2018-05-31 18:02 by Cheng Yi (Institute of Computing Technology, CAS)


Speaker: Lin Ma (Tencent)

Time: 20:00 (Beijing time), Wednesday, June 6, 2018

Title: Image/Video Captioning

Host: Yanli Ji (UESTC)


Speaker bio:

Lin Ma is a Principal Researcher with Tencent AI Lab, Shenzhen, China. Previously, he was a Researcher with Huawei Noah's Ark Lab, Hong Kong, from Aug. 2013 to Sep. 2016. He received his Ph.D. degree from the Department of Electronic Engineering at the Chinese University of Hong Kong (CUHK) in 2013, and the B.E. and M.E. degrees from Harbin Institute of Technology, Harbin, China, in 2006 and 2008, respectively, both in computer science. His current research interests lie in deep learning and computer vision, especially multimodal deep learning between vision and language.


Homepage:

http://www.ee.cuhk.edu.hk/~lma/


Abstract:

Multimodal learning between vision and language, especially image/video captioning, has become a hot research topic. With the associated language information, a deeper understanding of the image/video can be achieved. I will give a brief introduction to our progress on image/video captioning. For image captioning, we propose to learn to guide the decoding process. For video captioning, we propose an encoder-decoder-reconstructor framework that exploits the bidirectional information, namely video-to-text and text-to-video, which in turn boosts captioning performance. Beyond video captioning, a novel task, dense video captioning, involves not only localizing events in a video but also generating a caption for each localized segment. We build a new end-to-end neural network that fully couples localization and captioning.
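The encoder-decoder-reconstructor idea can be sketched at a toy scale as follows. The encoder and decoder handle the video-to-text direction, while the reconstructor maps the decoder's hidden states back toward the video features, closing the text-to-video loop. All shapes, the mean-pooling choices, and the 0.1 loss weight are illustrative assumptions, not the actual architecture or hyperparameters from the papers below:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_v, d_h, L = 8, 16, 32, 5          # frames, feature dim, hidden dim, caption length

video = rng.normal(size=(T, d_v))      # stand-in for per-frame CNN features
W_enc = 0.1 * rng.normal(size=(d_v, d_h))
W_dec = 0.1 * rng.normal(size=(d_h, d_h))
W_rec = 0.1 * rng.normal(size=(d_h, d_v))

# Encoder: project each frame feature into a hidden state, pool to a context.
enc_states = np.tanh(video @ W_enc)    # (T, d_h)
context = enc_states.mean(axis=0)      # global video context

# Decoder: unroll L steps from the context (a stand-in for word generation).
h, dec_states = context, []
for _ in range(L):
    h = np.tanh(h @ W_dec)
    dec_states.append(h)
dec_states = np.stack(dec_states)      # (L, d_h)

# Reconstructor: rebuild a global video feature from the decoder states,
# so the caption side is forced to retain video information.
recon = dec_states.mean(axis=0) @ W_rec              # (d_v,)
recon_loss = np.mean((recon - video.mean(axis=0)) ** 2)

caption_loss = 0.0                     # placeholder for the usual word-level loss
total_loss = caption_loss + 0.1 * recon_loss         # joint training objective
```

Training would minimize `total_loss` end to end, so gradients from the reconstruction term flow back through the decoder and encoder; the real models replace the linear maps with LSTMs/attention and the placeholder with a cross-entropy caption loss.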


References:

[1]  Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present, X. Chen, L. Ma, W. Jiang, J. Yao, and W. Liu, CVPR 2018.

[2]  Reconstruction Network for Video Captioning, B. Wang, L. Ma, W. Zhang, and W. Liu, CVPR 2018.

[3]  Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning, J. Wang, W. Jiang, L. Ma, W. Liu, and Y. Xu, CVPR 2018.


How to join VALSE Webinar 18-16:


Long-press or scan the QR code below to follow the VALSE WeChat official account (valse_wechat), then reply 16 to the account to receive the live-stream link.



Special thanks to the main organizers of this Webinar:

VOOC responsible committee member: Fumin Shen (UESTC)

VODB coordinating director: Liang Lin (Sun Yat-sen University)


How to participate:

1. VALSE Webinar events are held on an online live-streaming platform. During the event, the speaker uploads slides or shares a screen; the audience can see the slides, hear the speaker's voice, and interact with the speaker via the chat function.

2. To participate, follow the VALSE WeChat official account (valse_wechat) or join a VALSE QQ group (groups A through G are currently full; apart from speakers and other invited guests, applicants may only join VALSE group H, group number 701662399).

*Note: When applying to join a VALSE QQ group, you must provide your name, affiliation, and role; all three are required. After joining, please set your group nickname to your real name, role, and affiliation. Roles: university or research-institute staff, T; industry R&D, I; Ph.D. student, D; Master's student, M.

3. About 5 minutes before the event starts, the speaker opens the live stream; the audience can join by clicking the stream link. Windows PCs, Macs, mobile phones, and other devices are supported.

4. During the event, please refrain from off-topic messages so as not to disrupt the proceedings.

5. If you cannot hear the audio or see the video during the event, leaving and rejoining the stream usually resolves the problem.

6. A fast network connection is strongly recommended; prefer a wired connection where possible.

7. Every Monday, the VALSE WeChat official account posts a summary and (with the speaker's permission) the video of the previous week's Webinar; every Thursday, it posts the announcement and stream link for the next Webinar.
