To help practitioners in vision and learning keep up with the latest developments and frontier techniques in the field, VALSE has launched the Paper Quick Look (论文速览) column. Each week it releases recorded videos for one or two papers from top conferences and journals, each explaining a single frontier work in detail. This issue features a work on fine-grained imbalanced multimodal learning from Renmin University of China. The work was supervised by Associate Professor Di Hu (胡迪) and Associate Professor Zihe Wang (王子贺), and the video was recorded by the first author, Yake Wei (卫雅珂).

Paper title: Enhancing Multimodal Cooperation via Sample-level Modality Valuation

Authors: Yake Wei (Renmin University of China), Ruoxuan Feng (Renmin University of China), Zihe Wang (Renmin University of China), Di Hu (Renmin University of China)

Watch on Bilibili: https://www.bilibili.com/video/BV1FrKweLEEp/
Copy the link into your browser, or click "Read the original", to go to the viewing page.

Abstract: One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities. However, most models often suffer from unsatisfactory multimodal cooperation, which cannot jointly utilize all modalities well. Some methods are proposed to identify and enhance the worse learnt modality, but they are often hard to provide the fine-grained observation of multimodal cooperation at sample-level with theoretical support. Hence, it is essential to reasonably observe and improve the fine-grained cooperation between modalities, especially when facing realistic scenarios where the modality discrepancy could vary across different samples. To this end, we introduce a sample-level modality valuation metric to evaluate the contribution of each modality for each sample. Via modality valuation, we observe that modality discrepancy indeed could be different at sample-level, beyond the global contribution discrepancy at dataset-level. We further analyze this issue and improve cooperation between modalities at sample-level by enhancing the discriminative ability of low-contributing modalities in a targeted manner. Overall, our methods reasonably observe the fine-grained uni-modal contribution and achieve considerable improvement.

Reference:
[1] Y. Wei, R. Feng, Z. Wang, and D. Hu, “Enhancing multimodal cooperation via sample-level modality valuation,” in CVPR, 2024.
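Paper link: https://arxiv.org/abs/2309.06255

Code link: https://github.com/GeWu-Lab/Valuate-and-Enhance-Multimodal-Cooperation

About the speaker: Yake Wei (卫雅珂) is a PhD student at the Gaoling School of Artificial Intelligence, Renmin University of China. Her main research interest is the mechanisms of multimodal learning. She has published multiple papers in top journals and conferences, including TPAMI, CVPR, and ICML.

Homepage: https://echo0409.github.io/

Special thanks to the main organizer of this Paper Quick Look issue:
Monthly rotating AC: Bo Han (韩波, Hong Kong Baptist University)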