VALSE Paper Digest, Issue 144: AutoAD: Movie Description in Context

2023-10-29 14:36 | Posted by: 程一 (Institute of Computing Technology)

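To help practitioners in vision and learning keep up with the latest developments and frontier techniques in the field, VALSE has launched the Paper Digest column. Each week it releases recorded videos for one or two papers from top conferences and journals, with each video giving a detailed walkthrough of a single piece of frontier work. This issue features work from the VGG group at the University of Oxford on automatically generating Audio Description (AD) for movies. The work was supervised by Prof. Andrew Zisserman, and the video was recorded by the paper's first author, Tengda Han (韩腾达).

Paper title: AutoAD: Movie Description in Context

Authors: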

Tengda Han (VGG, University of Oxford), Max Bain (VGG, University of Oxford), Arsha Nagrani (VGG, University of Oxford), Gül Varol (École des Ponts, Univ Gustave Eiffel, CNRS), Weidi Xie (Shanghai Jiao Tong University), Andrew Zisserman (VGG, University of Oxford).

Watch on Bilibili:

https://www.bilibili.com/video/BV1DB4y1R78P/

Abstract:

The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form. Generating high-quality movie AD is challenging due to the dependency of the descriptions on context, and the limited amount of training data available. In this work, we leverage the power of pretrained foundation models, such as GPT and CLIP, and only train a mapping network that bridges the two models for visually-conditioned text generation. In order to obtain high-quality AD, we make the following four contributions: (i) we incorporate context from the movie clip, AD from previous clips, as well as the subtitles of the current shot; (ii) we address the lack of training data by pretraining on large scale datasets, where visual or contextual information is unavailable, e.g. text-only AD without movies or visual captioning datasets without context; (iii) we improve on the currently available AD datasets, by removing label noise in the MAD dataset, and adding character naming information; and (iv) we obtain strong results on the movie AD task compared with previous methods.
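As a rough illustration of contribution (i), the sketch below gathers the textual context for one clip: the AD of the preceding clips and the subtitles that overlap the current clip in time. It is a minimal sketch and not the authors' implementation. The `Clip` and `Subtitle` schemas, the function name `build_text_context`, and the `num_prev_ads` parameter are all hypothetical, and the visual branch (CLIP features passed through the trained mapping network) is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Clip:
    """One movie clip (hypothetical schema, times in seconds)."""
    start: float
    end: float
    ad: str  # AD text associated with this clip


@dataclass
class Subtitle:
    """One subtitle line (hypothetical schema, times in seconds)."""
    start: float
    end: float
    text: str


def build_text_context(
    clips: List[Clip],
    subtitles: List[Subtitle],
    index: int,
    num_prev_ads: int = 2,
) -> Dict[str, List[str]]:
    """Collect previous ADs and overlapping subtitles for clips[index].

    Assumes `clips` is sorted by start time. The choice of how many
    previous ADs to keep is an illustrative default, not a value taken
    from the paper.
    """
    current = clips[index]

    # AD from up to `num_prev_ads` clips immediately before the current one.
    first = max(0, index - num_prev_ads)
    previous_ads = [clip.ad for clip in clips[first:index]]

    # Subtitles whose time span overlaps the current clip.
    current_subs = [
        sub.text
        for sub in subtitles
        if sub.start < current.end and sub.end > current.start
    ]

    return {"previous_ads": previous_ads, "subtitles": current_subs}
```

In a full system this text context would be combined with the visual features of the current clip before being passed to the language model; how that combination is done is described in the paper and the video, not in this sketch.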

Paper information:

[1] Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman. “AutoAD: Movie Description in Context”, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), June 2023.

About the speaker:

Tengda Han is a post-doctoral research fellow at the Visual Geometry Group at the University of Oxford. He obtained his PhD from the same group in 2022 supervised by Andrew Zisserman. His current research focuses on self-supervised learning, efficient learning and video understanding.

Homepage:

https://tengdahan.github.io/

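Special thanks to the main organizers of this Paper Digest:

Monthly rotating AC: Qin Jie (秦杰, Nanjing University of Aeronautics and Astronautics)

Quarterly rotating AC: Ye Mang (叶茫, Wuhan University)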

How to participate

1. VALSE's weekly Webinar is hosted on Bilibili's live-streaming platform. Search for VALSE_Webinar on Bilibili to follow us!

Live stream:

https://live.bilibili.com/22300737

Past videos:

https://space.bilibili.com/562085182/

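2. The VALSE Webinar usually takes place at 20:00 on Wednesday evenings, though the time is occasionally adjusted to accommodate the speaker's time zone. To make it easier to attend, follow the VALSE WeChat official account (valse_wechat) or join the VALSE QQ S group (group number: 317920537).

*Note: A request to join the VALSE QQ group must state your name, affiliation, and role; all three are required. After joining, please use your real name in the format: name, role, affiliation. Role codes: T for staff at universities and research institutes; I for industry R&D; D for PhD; M for master's.

3. The VALSE WeChat official account usually posts the announcement for the following week's Webinar talk every Thursday.

4. You can also check Webinar information directly on the VALSE homepage: http://valser.org/. With the speaker's permission, the slides for each Webinar talk are posted at the bottom of that talk's announcement on the VALSE website.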
