VALSE Paper Digest No. 91: DF-GAN: A Simple and Effective Generative Adversarial Network for Text-to-Image Synthesis ...

2022-8-16 18:18 | Posted by: 程一-计算所

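To help practitioners in vision and learning keep up with the latest developments and frontier techniques in the field, VALSE has launched the Paper Digest (《论文速览》) column. Each week it releases recorded videos for one or two papers from top conferences and journals, each giving a detailed walkthrough of a single cutting-edge work. This issue of the VALSE Paper Digest features work on text-to-image generation from Nanjing University of Posts and Telecommunications. The video was recorded by the paper's first author, Ming Tao (陶明).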

Paper title: DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

Authors: Ming Tao (Nanjing University of Posts and Telecommunications), Hao Tang (ETH Zurich), Fei Wu (Nanjing University of Posts and Telecommunications), Xiaoyuan Jing (Wuhan University), Bing-Kun Bao (Nanjing University of Posts and Telecommunications), Changsheng Xu (NLPR, Institute of Automation, CAS)

Watch on Bilibili:

https://www.bilibili.com/video/BV1ad4y1D76D/

Abstract:

Synthesizing high-quality realistic images from text descriptions is a challenging task. Existing text-to-image Generative Adversarial Networks generally employ a stacked architecture as the backbone, yet three flaws remain. First, the stacked architecture introduces entanglements between generators of different image scales. Second, existing studies prefer to apply and fix extra networks in adversarial learning for text-image semantic consistency, which limits the supervision capability of these networks. Third, the cross-modal attention-based text-image fusion that is widely adopted by previous works is limited to a few specific image scales because of its computational cost. To address these issues, we propose a simpler but more effective Deep Fusion Generative Adversarial Network (DF-GAN).

To be specific, we propose:

(i) a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators,

(ii) a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output, which enhances the text-image semantic consistency without introducing extra networks,

(iii) a novel deep text-image fusion block, which deepens the fusion process to make a full fusion between text and visual features.

Compared with current state-of-the-art methods, our proposed DF-GAN is simpler but more efficient to synthesize realistic and text-matching images and achieves better performance on widely used datasets.
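
To make point (iii) more concrete, the following is a minimal, illustrative sketch of what "deepening" text-image fusion can look like: the same sentence vector modulates the image features at every stacked layer instead of at a single point. It is not the authors' implementation. The channel-wise scale-and-shift form, the ReLU between layers, and all names (`make_linear`, `affine_fuse`, `deep_fusion`) are assumptions introduced here for illustration; see the official code link below for the actual model.

```python
import random


def make_linear(n_out, n_in, rng):
    """Hypothetical helper: a small dense layer as (weights, bias)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, vec):
    """Apply a dense layer to a vector; one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def affine_fuse(features, text_vec, scale_layer, shift_layer):
    """Modulate each channel with a scale and shift predicted from the text.

    features: list of channels, each a flat list of values.
    Assumption: one (scale, shift) pair per channel, predicted by dense layers.
    """
    gamma = linear(*scale_layer, text_vec)
    beta = linear(*shift_layer, text_vec)
    return [
        [(1.0 + g) * x + b for x in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


def deep_fusion(features, text_vec, layers):
    """Stack several text-conditioned fusion steps, with a ReLU after each."""
    for scale_layer, shift_layer in layers:
        features = affine_fuse(features, text_vec, scale_layer, shift_layer)
        features = [[max(0.0, x) for x in channel] for channel in features]
    return features


if __name__ == "__main__":
    rng = random.Random(0)
    n_channels, text_dim, depth = 3, 4, 2
    layers = [
        (make_linear(n_channels, text_dim, rng), make_linear(n_channels, text_dim, rng))
        for _ in range(depth)
    ]
    image_features = [[1.0, -2.0], [0.5, 0.5], [-1.0, 3.0]]
    sentence_vec = [0.3, -0.7, 0.1, 0.9]
    print(deep_fusion(image_features, sentence_vec, layers))
```

In this sketch the text enters at every layer, so increasing `depth` adds more points where the text can influence the visual features; that is the only property of the paper's fusion block the example is meant to illustrate.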

Paper information:

[1] Ming Tao, Hao Tang, Fei Wu, Xiaoyuan Jing, Bing-Kun Bao and Changsheng Xu. DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis. CVPR 2022 (Oral)

Paper link:

[https://arxiv.org/abs/2008.05865]

Code link:

[https://github.com/tobran/DF-GAN]

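About the speaker:

Ming Tao (陶明) is a PhD student at Nanjing University of Posts and Telecommunications whose main research interests are deep learning and multimodal generation.

Special thanks to the main organizers of this Paper Digest:

Monthly rotating AC: Zhihui Wang (王智慧, Dalian University of Technology), Xu Yang (杨旭, Xidian University)

Quarterly responsible AC: Xiushen Wei (魏秀参, Nanjing University of Science and Technology)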

How to participate

1. VALSE's weekly Webinars are hosted on Bilibili's live-streaming platform. Search for VALSE_Webinar on Bilibili to follow us!

Live stream:

https://live.bilibili.com/22300737

Past videos:

https://space.bilibili.com/562085182/

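2. VALSE Webinars are usually held on Wednesdays at 20:00, with occasional adjustments to accommodate speakers' time zones. To make it easier to attend, follow the VALSE WeChat official account (valse_wechat) or join VALSE QQ group R (group number: 137634472).

*Note: When applying to join the VALSE QQ group, you must provide your name, affiliation, and role for verification; all three are required. After joining, please set your group name to your real name, role, and affiliation. Role codes: university and research institute staff T; industry R&D I; PhD D; Master's M.

3. The VALSE WeChat official account usually announces the following week's Webinar talks every Thursday.

4. You can also visit the VALSE homepage, http://valser.org/, to view Webinar information directly. Slides from Webinar talks (when the speaker permits) are posted at the bottom of each talk's announcement on the VALSE website.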
