
VALSE Paper Review, Issue 145: ConZIC: Controllable Zero-shot Image Captioning

2023-11-01 14:41 | Published by: Cheng Yi (Institute of Computing Technology)


To help researchers and practitioners in vision and learning keep up with the latest developments and frontier techniques in the field, VALSE has launched the Paper Review series, which releases one or two recorded videos per week, each giving a detailed walkthrough of a single recent paper from a top conference or journal. This issue of the VALSE Paper Review features work on zero-shot image captioning from Xidian University. The work was supervised by Prof. Bo Chen and Associate Prof. Hao Zhang, and the video was recorded by the first author, Zequn Zeng.


Paper title: ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

Authors:

Zequn Zeng (Xidian University), Hao Zhang (Xidian University), Zhengjue Wang (Xidian University), Ruiying Lu (Xidian University), Dongsheng Wang (Xidian University), Bo Chen (Xidian University)


Bilibili video:

https://www.bilibili.com/video/BV19c411d7u9/



Paper abstract:

Zero-shot capability has been considered as a new revolution of deep learning, letting machines work on tasks without curated training data. As a good start and the only existing outcome of zero-shot image captioning (IC), ZeroCap abandons supervised training and sequentially searches every word in the caption using the knowledge of large-scale pre-trained models. Though effective, its autoregressive generation and gradient-directed searching mechanism limit the diversity of captions and inference speed, respectively. Moreover, ZeroCap does not consider the controllability issue of zero-shot IC. To move forward, we propose a framework for Controllable Zero-shot IC, named ConZIC. The core of ConZIC is a novel sampling-based non-autoregressive language model named GibbsBERT, which can generate and continuously polish every word. Extensive quantitative and qualitative results demonstrate the superior performance of our proposed ConZIC for both zero-shot IC and controllable zero-shot IC. Especially, ConZIC achieves about 5× faster generation speed than ZeroCap, and about 1.5× higher diversity scores, with accurate generation given different control signals.
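The abstract's key idea, generating a full caption and then repeatedly polishing every word with a masked language model under image guidance, can be illustrated with a toy sketch. The Python snippet below is not the authors' GibbsBERT implementation (see the code link below for that); it is a simplified illustration that uses an off-the-shelf BERT masked LM to propose candidate words and CLIP to rescore them against the image, with greedy selection instead of the paper's sampling. The model names, fixed caption length, number of polishing rounds, and top-k size are all illustrative assumptions.

import torch
from PIL import Image
from transformers import BertForMaskedLM, BertTokenizer, CLIPModel, CLIPProcessor

bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def polish_caption(image, length=8, rounds=3, top_k=20):
    # Start from an all-[MASK] caption of a fixed length (an assumption of this sketch).
    ids = torch.tensor([[bert_tok.cls_token_id]
                        + [bert_tok.mask_token_id] * length
                        + [bert_tok.sep_token_id]])
    for _ in range(rounds):                  # several polishing sweeps over the caption
        for pos in range(1, length + 1):     # revisit every word position
            masked = ids.clone()
            masked[0, pos] = bert_tok.mask_token_id
            # 1) The masked LM proposes top-k fluent candidates for this slot.
            cand_ids = bert(masked).logits[0, pos].topk(top_k).indices
            # 2) CLIP rescores each candidate sentence against the image.
            texts = []
            for cid in cand_ids:
                tmp = masked.clone()
                tmp[0, pos] = cid
                texts.append(bert_tok.decode(tmp[0, 1:length + 1]))
            batch = clip_proc(text=texts, images=image, return_tensors="pt", padding=True)
            sim = clip(**batch).logits_per_image[0]
            # 3) Greedily keep the word that best matches the image (the paper
            #    samples instead, which is what yields diverse captions).
            ids[0, pos] = cand_ids[sim.argmax()]
    return bert_tok.decode(ids[0, 1:length + 1])

# Example usage (hypothetical image path):
# print(polish_caption(Image.open("example.jpg")))

Because the language model is non-autoregressive, every position can be revisited in later sweeps, which is the "continuously polish" behavior the abstract contrasts with ZeroCap's left-to-right search.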


Paper reference:

[1] Zequn Zeng, Hao Zhang, Zhengjue Wang, Ruiying Lu, Dongsheng Wang, Bo Chen, “ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing,” CVPR 2023.


Paper link:

https://arxiv.org/abs/2303.02437


Code link:

https://github.com/joeyz0z/ConZIC


Speaker bio:

Zequn Zeng is a Ph.D. student at Xidian University in Prof. Bo Chen's group. His research focuses on multimodal learning, and he has published in top computer vision venues including CVPR and IJCV.


Homepage:

https://joeyz0z.github.io/



Special thanks to the main organizers of this Paper Review issue:

Monthly rotating AC: Yutong Xie (The University of Adelaide)

Quarterly rotating AC: Lei Zhang (Chongqing University)


How to participate

1. VALSE's weekly Webinar is live-streamed on the Bilibili platform. Search for VALSE_Webinar on Bilibili and follow us!

Live stream:

https://live.bilibili.com/22300737

Past videos:

https://space.bilibili.com/562085182/ 


2. VALSE Webinars are usually held on Wednesday evenings at 20:00 (UTC+8), though the time may occasionally shift to accommodate speakers in other time zones. To stay informed, follow the VALSE WeChat official account (valse_wechat) or join the VALSE QQ S group (group number: 317920537).


*Note: When applying to join the VALSE QQ group, you must provide your name, affiliation, and role; all three are required. After joining, please set your group nickname to your real name, role, and affiliation. Role codes: university or research institute staff (T); industry R&D (I); Ph.D. student (D); Master's student (M).


3. The VALSE WeChat official account usually announces the following week's Webinar on Thursday.


4. You can also check Webinar information directly on the VALSE homepage: http://valser.org/. Speakers' slides (with their permission) are posted at the bottom of each Webinar announcement on the VALSE website.
