To help practitioners in the vision and learning community stay up to date with the latest developments and frontier techniques in the field, VALSE has launched the "Paper Preview" column, which releases one or two recorded videos per week, each giving a detailed walkthrough of a single recent top-conference or top-journal paper. This installment of the VALSE Paper Preview features a work on a benchmark for text-to-video generation (Benchmark for Text-to-Video Generation) from Peking University, University of Rochester, Shanghai Jiao Tong University, National University of Singapore, and UC Santa Cruz. The work was supervised by Prof. Li Yuan, and the video was recorded by the paper's first author, Shenghai Yuan, a first-year master's student. Paper title: ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation Authors: Shenghai Yuan (Peking University), Jinfa Huang (University of Rochester), Yongqi Xu (Peking University), Yaoyang Liu (Peking University), Shaofeng Zhang (Shanghai Jiao Tong University), Yujun Shi (National University of Singapore), Ruijie Zhu (UC Santa Cruz), Xinhua Cheng (Peking University), Jiebo Luo (University of Rochester), Li Yuan (Peking University) Bilibili video link: Abstract: We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to evaluate the temporal and metamorphic knowledge skills in time-lapse video generation of T2V models (e.g., Sora and Lumiere). Compared to existing benchmarks that focus on visual quality and text relevance of generated videos, ChronoMagic-Bench focuses on the models' ability to generate time-lapse videos with significant metamorphic amplitude and temporal coherence. The benchmark probes T2V models for their physics, biology, and chemistry capabilities under free-form text control. For these purposes, ChronoMagic-Bench introduces 1,649 prompts and real-world videos as references, categorized into four major types of time-lapse videos: biological, human creation, meteorological, and physical phenomena, which are further divided into 75 subcategories. This categorization ensures a comprehensive evaluation of the models' capacity to handle diverse and complex transformations. To accurately align with human preference on the benchmark, we introduce two new automatic metrics, MTScore and CHScore, to evaluate the videos' metamorphic attributes and temporal coherence. MTScore measures the metamorphic amplitude, reflecting the degree of change over time, while CHScore assesses the temporal coherence, ensuring the generated videos maintain logical progression and continuity.
Based on ChronoMagic-Bench, we conduct comprehensive manual evaluations of eighteen representative T2V models, revealing their strengths and weaknesses across different categories of prompts and providing a thorough evaluation framework that addresses current gaps in video generation research. More encouragingly, we create the large-scale ChronoMagic-Pro dataset, containing 460k high-quality pairs of 720p time-lapse videos and detailed captions. Each pair ensures high physical content and large metamorphic amplitude, which will have a far-reaching impact on the video generation community. References: [1] Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan, "ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation," in Neural Information Processing Systems (NeurIPS D&B Spotlight 2024). Paper link: https://arxiv.org/abs/2406.18522 Code link: https://github.com/PKU-YuanGroup/ChronoMagic-Bench Speaker bio: Shenghai Yuan is a master's student (class of 2024) at Peking University, working on video generation and multimodal understanding. He has published several papers at high-level international conferences and journals such as NeurIPS and ACM MM; his representative works include Open-Sora Plan and MagicTime, which together have accumulated over 16k GitHub stars. Open-Sora Plan reached #1 on GitHub trending, and MagicTime has 1.3k+ GitHub stars. Homepage: https://shyuanbest.github.io/ Special thanks to the main organizer of this Paper Preview: Monthly rotating AC: Shuai Yang (Nanyang Technological University)
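The abstract describes MTScore and CHScore only at a high level. As a rough illustration of the intuition behind the two quantities, the toy sketch below scores a clip from per-frame feature vectors: coherence as the mean similarity of consecutive frames, and metamorphic amplitude as the dissimilarity between the first and last frames. This is an assumption-laden stand-in, not the paper's actual implementation, and `frame_feats` is a hypothetical input representing features from any visual encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def coherence_score(frame_feats):
    """Toy stand-in for CHScore: mean similarity of consecutive frames.
    High values indicate a smooth, temporally coherent progression."""
    sims = [cosine(f1, f2) for f1, f2 in zip(frame_feats, frame_feats[1:])]
    return sum(sims) / len(sims)

def metamorphic_amplitude(frame_feats):
    """Toy stand-in for MTScore: dissimilarity between the first and
    last frames, i.e. how much the content changes over the clip."""
    return 1.0 - cosine(frame_feats[0], frame_feats[-1])
```

Under this toy scheme, a good time-lapse video scores high on both counts: each frame resembles its neighbor (coherence) while the first and last frames differ substantially (amplitude). The metrics in the paper are computed quite differently, from learned video and multimodal representations; the sketch only conveys why the two axes are complementary.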