To help practitioners in vision and learning keep up with the field's latest developments and frontier techniques, VALSE has launched the Paper Digest (《论文速览》) column. Each week it releases recorded videos on one or two papers from top conferences and journals, with each video giving a detailed walkthrough of a single cutting-edge work.

Paper title: Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation

Authors: Yunnan Wang, Ziqiang Li, Zequn Zhang, Wenyao Zhang, Baao Xie, Xihui Liu, Wenjun Zeng, Xin Jin

Bilibili video link:

Abstract: There has been exciting progress in generating images from natural language or layout conditions. However, these methods struggle to faithfully reproduce complex scenes due to the insufficient modeling of multiple objects and their relationships. To address this issue, we leverage the scene graph, a powerful structured representation, for complex image generation. Different from the previous works that directly use scene graphs for generation, we employ the generative capabilities of variational autoencoders and diffusion models in a generalizable manner, compositing diverse disentangled visual clues from scene graphs. Specifically, we first propose a Semantics-Layout Variational Autoencoder (SL-VAE) to jointly derive (layouts, semantics) from the input scene graph, which allows a more diverse and reasonable generation in a one-to-many mapping. We then develop a Compositional Masked Attention (CMA) integrated with a diffusion model, incorporating (layouts, semantics) with fine-grained attributes as generation guidance. To further achieve graph manipulation while keeping the visual content consistent, we introduce a Multi-Layered Sampler (MLS) for an “isolated” image editing effect. Extensive experiments demonstrate that our method outperforms recent competitors based on text, layout, or scene graph, in terms of generation rationality and controllability.

References:
[1] Yunnan Wang, Ziqiang Li, Zequn Zhang, Wenyao Zhang, Baao Xie, Xihui Liu, Wenjun Zeng, Xin Jin, “Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation,” Advances in Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada, December 2024.

Paper link: https://arxiv.org/abs/2410.00447

About the speaker: Yunnan Wang is a Ph.D. student in the joint program between Shanghai Jiao Tong University and Eastern Institute of Technology, Ningbo. He received the B.E. degree in Detection, Guidance, and Control Technology from Northwestern Polytechnical University in 2020, and the M.S. degree in Control Science and Engineering from Shanghai Jiao Tong University in 2023. His research interests include computer vision and multimodal representation learning.

Homepage: https://wangyunnan.github.io/

Special thanks to the main organizer of this Paper Digest session:
Monthly rotating AC: 于茜 (Beihang University)