VALSE Paper Quick Look No. 223: Generalizable Complex Image Generation via Scene Graph Disentanglement and Composition ...

2025-9-11 21:00 | Posted by: 程一-计算所

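To help practitioners in vision and learning keep up with the field's latest developments and frontier techniques, VALSE has launched the "Paper Quick Look" column. Each week it releases recorded videos for one or two papers from top conferences and journals, with each video giving a detailed walkthrough of a single piece of frontier work.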

Paper title:

Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation

Authors:

Yunnan Wang, Ziqiang Li, Zequn Zhang, Wenyao Zhang, Baao Xie, Xihui Liu, Wenjun Zeng, Xin Jin

Watch on Bilibili:

https://www.bilibili.com/video/BV1q6o7YZEJ3/

Abstract:

There has been exciting progress in generating images from natural language or layout conditions. However, these methods struggle to faithfully reproduce complex scenes due to the insufficient modeling of multiple objects and their relationships. To address this issue, we leverage the scene graph, a powerful structured representation, for complex image generation. Different from the previous works that directly use scene graphs for generation, we employ the generative capabilities of variational autoencoders and diffusion models in a generalizable manner, compositing diverse disentangled visual clues from scene graphs. Specifically, we first propose a Semantics-Layout Variational Autoencoder (SL-VAE) to jointly derive (layouts, semantics) from the input scene graph, which allows a more diverse and reasonable generation in a one-to-many mapping. We then develop a Compositional Masked Attention (CMA) integrated with a diffusion model, incorporating (layouts, semantics) with fine-grained attributes as generation guidance. To further achieve graph manipulation while keeping the visual content consistent, we introduce a Multi-Layered Sampler (MLS) for an “isolated” image editing effect. Extensive experiments demonstrate that our method outperforms recent competitors based on text, layout, or scene graph, in terms of generation rationality and controllability.
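As a rough illustration of the ideas in the abstract, the sketch below represents a scene graph as (subject, predicate, object) triples, rasterizes a per-object layout box into a mask, and restricts a softmax to the objects whose box covers a given grid cell. It is a toy under stated assumptions, not the paper's SL-VAE, CMA, or MLS: the names `Triple`, `nodes_of`, `box_mask`, and `masked_softmax`, the example graph, the hand-written boxes, and the scores are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One scene-graph edge: subject --predicate--> obj (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def nodes_of(triples):
    """Return the unique object names in first-seen order."""
    seen = []
    for t in triples:
        for name in (t.subject, t.obj):
            if name not in seen:
                seen.append(name)
    return seen


def box_mask(box, height, width):
    """Rasterize a normalized (x0, y0, x1, y1) box into a flat 0/1 mask
    over a height x width grid, testing each cell center."""
    x0, y0, x1, y1 = box
    mask = []
    for row in range(height):
        cy = (row + 0.5) / height
        for col in range(width):
            cx = (col + 0.5) / width
            mask.append(1 if (x0 <= cx < x1 and y0 <= cy < y1) else 0)
    return mask


def masked_softmax(scores, mask):
    """Softmax over positions where mask is 1; masked positions get weight 0."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        return [0.0] * len(scores)
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example graph and hand-written layout (assumptions, not paper data).
triples = [
    Triple("sheep", "standing on", "grass"),
    Triple("sky", "above", "grass"),
]
layout = {
    "sheep": (0.0, 0.5, 0.5, 1.0),
    "grass": (0.0, 0.5, 1.0, 1.0),
    "sky": (0.0, 0.0, 1.0, 0.5),
}

names = nodes_of(triples)
masks = {name: box_mask(layout[name], 2, 2) for name in names}

# For the bottom-left cell (flat index 2), attend only to objects covering it.
cell = 2
cell_mask = [masks[name][cell] for name in names]
weights = masked_softmax([1.0, 1.0, 5.0], cell_mask)
```

In this toy, the high score assigned to "sky" has no effect on the bottom-left cell because its box does not cover that cell. This is only meant to convey the general idea of using layout to gate which object semantics influence which image region.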

Reference:

[1] Yunnan Wang, Ziqiang Li, Zequn Zhang, Wenyao Zhang, Baao Xie, Xihui Liu, Wenjun Zeng, Xin Jin, “Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation,” Advances in Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada, December 2024.

Paper link:

https://arxiv.org/abs/2410.00447

About the speaker:

Yunnan Wang is a Ph.D. student in the joint program between Shanghai Jiao Tong University and Eastern Institute of Technology, Ningbo. He received the B.E. degree in Detection, Guidance, and Control Technology from Northwestern Polytechnical University in 2020, and the M.S. degree in Control Science and Engineering from Shanghai Jiao Tong University in 2023. His research interests include computer vision and multimodal representation learning.

Homepage:

https://wangyunnan.github.io/

Special thanks to the main organizer of this Paper Quick Look:

Monthly rotating AC: 于茜 (Beihang University)
