To help practitioners in the vision and learning community keep up with the latest developments and frontier advances in the field, VALSE has launched the "Paper Express" (论文速览) column, which releases one or two recorded videos each week, each giving a detailed walkthrough of a single paper from a top conference or journal. This issue of the VALSE Paper Express features work from the University of Electronic Science and Technology of China on pruning for text-to-video synthesis (Pruning on Text-to-Video Synthesis). The work was supervised by Prof. Lianli Gao and Prof. Jingkuan Song, and the video was recorded by the paper's first author, Sitong Su.

Paper title: F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis

Authors: Sitong Su (University of Electronic Science and Technology of China), Jianzhi Liu (University of Electronic Science and Technology of China), Lianli Gao (University of Electronic Science and Technology of China), Jingkuan Song (University of Electronic Science and Technology of China)

Bilibili video: https://www.bilibili.com/video/BV11GxPerEDf/?spm_id_from=333.999.0.0&vd_source=60f8dec9131b788976f17e2922918e04
Copy the link into your browser, or click "Read the original article," to jump to the viewing page.

Abstract: Recently, Text-to-Video (T2V) synthesis has undergone a breakthrough by training transformers or diffusion models on large-scale datasets. Nevertheless, inference with such large models incurs huge costs. Previous inference-acceleration works either require costly retraining or are model-specific. To address this issue, instead of retraining, we explore the inference process of two mainstream T2V models built on transformers and diffusion models. The exploration reveals redundancy in the temporal attention modules of both models, which are commonly used to establish temporal relations among frames. Consequently, we propose a training-free and generalized pruning strategy called F3-Pruning to prune redundant temporal attention weights. Specifically, when aggregate temporal attention values are ranked below a certain ratio, the corresponding weights are pruned. Extensive experiments on three datasets using a classic transformer-based model, CogVideo, and a typical diffusion-based model, Tune-A-Video, verify the effectiveness of F3-Pruning in inference acceleration, quality assurance, and broad applicability.

Reference:
[1] Sitong Su*, Jianzhi Liu*, Lianli Gao, Jingkuan Song. F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis. The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), Vancouver, Canada, 2024.

Paper link: https://arxiv.org/abs/2312.03459
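The core pruning rule from the abstract — drop the temporal attention weights whose aggregate attention values rank below a certain ratio — can be sketched as follows. This is not the authors' released implementation; it is a minimal illustration of the ranking-and-prune idea, where the aggregate values per attention-weight group (e.g. per head) and the pruning ratio are assumed inputs:

```python
import numpy as np

def f3_prune_mask(agg_attention, prune_ratio):
    """Return a boolean keep-mask over temporal attention weight groups.

    agg_attention: 1-D array of aggregate temporal attention values,
                   one per weight group (e.g. one per attention head).
    prune_ratio:   fraction of the lowest-ranked groups to prune.
    """
    n = agg_attention.size
    n_prune = int(n * prune_ratio)
    # Rank groups by aggregate attention; the n_prune smallest are pruned.
    order = np.argsort(agg_attention)
    mask = np.ones(n, dtype=bool)
    mask[order[:n_prune]] = False
    return mask

# Example: 8 hypothetical attention heads, pruning the bottom 50% by rank.
agg = np.array([0.9, 0.1, 0.4, 0.8, 0.05, 0.6, 0.3, 0.7])
mask = f3_prune_mask(agg, 0.5)
```

In a real model, the mask would then zero out (or skip computing) the corresponding temporal attention weights at inference time, which is what makes the strategy training-free.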
Speaker bio: Sitong Su is working toward the Ph.D. degree in the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China. Her research interests mainly focus on computer vision, cross-modal visual synthesis and editing, and diffusion models.

Special thanks to the main organizer of this Paper Express session:
Monthly rotating AC: Yifan Wang (Dalian University of Technology)

How to participate:
1. VALSE's weekly Webinar is streamed live on Bilibili; search for VALSE_Webinar on Bilibili and follow us! Live stream: https://live.bilibili.com/22300737; past videos: https://space.bilibili.com/562085182/
2. VALSE Webinars are usually held on Wednesday evenings at 20:00 (GMT+8), though the time may occasionally shift to accommodate speakers' time zones. To stay informed, follow the VALSE WeChat official account (valse_wechat) or join the VALSE QQ T group (group number: 863867505). Note: when applying to join the VALSE QQ group, you must provide your name, affiliation, and status — all three are required. After joining, please set your display name to your real name, status, and affiliation. Status codes: T for university and research-institute staff; I for industry R&D; D for Ph.D. students; M for master's students.
3. The VALSE WeChat official account usually announces the following week's Webinar on Thursdays.
4. You can also find Webinar information on the VALSE homepage: http://valser.org/. Speaker slides (with the speaker's permission) are posted at the bottom of each Webinar announcement on the VALSE website.