VALSE Webinar 25-03 (No. 374 overall): Has Multimodal Learning Achieved 1+1>2?

2025-2-15 19:25 | Posted by: Cheng Yi (程一), Institute of Computing Technology

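Speaker: Nan Wu (吴楠), Harvey AI Research Scientist

Title: Taming the Greedy Learner: Overcoming Imbalanced Learning in Deep Multimodal Networks

Speaker: Yang Yang (杨杨), Nanjing University of Science and Technology

Title: Robust Representation Learning for Modality-Imbalanced Data: Research and Applications

Speaker: Siwei Wang (王思为), Intelligent Game and Decision Laboratory

Title: Multi-View Learning from a Complexity Perspective: Cooperation and Adaptive Selection

Panelists:

Nan Wu (Harvey AI Research Scientist), Yang Yang (Nanjing University of Science and Technology), Siwei Wang (Intelligent Game and Decision Laboratory), Xinwang Liu (刘新旺, National University of Defense Technology), Jianlong Fu (傅建龙, Microsoft Research Asia)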

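Panel topics:

1. Should a multimodal learning system be a hierarchical learning system (for example, progressing from perception to semantics to interaction), or a single unified learning system?

2. Balanced multimodal learning is attracting growing attention. How should a relative balance be sought? Should imbalance be "deliberately created" depending on the task?

3. If multimodal systems exhibit "emergence", what exactly would emerge, and how can such emergent capabilities be achieved?

4. How does incomplete multimodal data (such as missing modalities or noisy correspondences) affect the training of multimodal foundation models and of conventional deep multimodal models, and how should it be handled?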

Speaker: Nan Wu (吴楠), Harvey AI Research Scientist

Time: Wednesday, February 19, 2025, 20:00 (Beijing Time)

Title: Taming the Greedy Learner: Overcoming Imbalanced Learning in Deep Multimodal Networks

Speaker bio:

Dr. Nan Wu is a Research Scientist at Harvey AI, where she leads research on generative AI for legal applications and collaborates with OpenAI, Anthropic, and Mistral AI. Previously, she was a Senior Scientist at Merck & Co., applying AI to pharmaceutical research and digital pathology.

Nan earned her Ph.D. in Data Science from NYU, where she worked on advancing multimodal deep learning with applications to breast cancer screening. During her Ph.D., she conducted research at Google Research and AWS AI Lab, working on foundation models for speech-text and image-text integration. Before NYU, she studied Statistics and Business Administration at the University of Science and Technology of China, School of Gifted Young.

She has published in top AI and medical venues (ICML, NeurIPS, IEEE TMI, Medical Image Analysis) and received honors including the Google Ph.D. Fellowship (2020-2023), recognition as one of the Top 80 Global Young Chinese Women AI Scholars (Baidu, 2023), and a Best Paper Award at ICML.

Homepage:

https://wooginawunan.github.io/

Abstract:

Despite the promise of using multiple mammographic views in deep learning–based breast cancer screening, training multiview networks to fully leverage complementary information remains challenging. Through extensive experiments, we show that naïvely combining multiple views often leads to imbalanced learning, with models fixating on a single view. To address this, we evaluate a wide range of algorithms and regularizers, identifying methods that balance learning across both views, thereby capturing richer representations and substantially improving performance.

We further generalize these findings to multimodal datasets and tasks, introducing the Greedy Learner Hypothesis to explain the prevalent phenomenon of imbalanced learning between modalities. Building on this hypothesis, we propose strategies to encourage more balanced utilization of modality-specific information, resulting in enhanced generalization and classification performance in diverse real-world settings.

References:

[1] Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening. Nan Wu, et al. IEEE Transactions on Medical Imaging 2019.

[2] Improving the Ability of Deep Neural Networks to Use Information from Multiple Views in Breast Cancer Screening. Nan Wu, Stanisław Jastrzębski, Jungkyu Park, Linda Moy, Kyunghyun Cho, and Krzysztof J Geras. MIDL 2020.

[3] Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks. Nan Wu, Stanisław Jastrzębski, Kyunghyun Cho, and Krzysztof J Geras. ICML 2022.

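Speaker: Yang Yang (杨杨), Nanjing University of Science and Technology

Time: Wednesday, February 19, 2025, 20:30 (Beijing Time)

Title: Robust Representation Learning for Modality-Imbalanced Data: Research and Applications

Speaker bio:

Yang Yang is a professor and doctoral supervisor at the School of Computer Science and Engineering, Nanjing University of Science and Technology, whose main research area is data mining in open environments. Yang has been selected for a national-level young talent program and the China Association for Science and Technology (CAST) Young Elite Scientists Sponsorship Program, and leads projects including a Young Scientist project under the Ministry of Science and Technology's National Key R&D Program and a Jiangsu Province Distinguished Young Scholars Fund project. Yang has published 22 first-author papers in CCF-A journals and conferences such as IEEE TPAMI, TKDE, NeurIPS, and KDD, and received the Best Paper Award at the international conference ACML'17. Building on this research, Yang has won 20 first-place finishes in competitions at home and abroad, including those held at CVPR and ICCV.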

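Abstract:

This talk covers research on, and applications of, robust representation learning for modality-imbalanced data in open environments. It focuses on the challenges that multimodal data in open environments bring to the fore, namely missing modalities, modality noise, and modality competition, and presents reliable multimodal representation learning methods that address them. These methods improve the performance of downstream classification and retrieval and have been applied in practice.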

References:

[1] Yang Yang, Wenjuan Xi, Luping Zhou, and Jinhui Tang. Rebalanced Vision-Language Retrieval Considering Structure-Aware Distillation. IEEE Transactions on Image Processing (IEEE TIP), 2024.

[2] Yang Yang, Fengqiang Wan, Qing-Yuan Jiang, Yi Xu. Facilitating Multimodal Classification via Dynamically Learning Modality Gap. In: Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS'24), Vancouver, Canada, 2024.

[3] Xiangyu Wu, Qingyuan Jiang, Yifeng Wu, Qingguo Chen, Yang Yang*, and Jianfeng Lu. TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt. In: Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI'24), Jeju Island, South Korea, 2024.

[4] Yang Yang, Jingshuai Zhang, Fan Gao, Xiaoru Gao, Hengshu Zhu*. DOMFN: A Divergence-Orientated Multi-Modal Fusion Network for Resume Assessment. In: Proceedings of the 30th ACM International Conference on Multimedia (ACM MM'22), Lisbon, Portugal, 2022.

[5] Fengqiang Wan, Xiangyu Wu, Zhihao Guan, Yang Yang*. CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Relations for Vision-Language Retrieval. In: IEEE International Conference on Multimedia and Expo (ICME'24), Niagara Falls, Canada, 2024.

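Speaker: Siwei Wang (王思为), Intelligent Game and Decision Laboratory

Time: Wednesday, February 19, 2025, 21:00 (Beijing Time)

Title: Multi-View Learning from a Complexity Perspective: Cooperation and Adaptive Selection

Speaker bio:

Siwei Wang is currently an assistant researcher at the Intelligent Game and Decision Laboratory. His main research interests include large-scale multimodal data analysis and multi-agent systems built on large models. He has published more than 20 papers in top AI conferences and journals, including NeurIPS, ICML, CVPR, ICCV, IEEE TPAMI, TIP, and TKDE, with over 4,000 citations and 4 ESI Highly Cited Papers. He serves as an area chair for the CCF-A conferences NeurIPS, ICML, ICLR, CVPR, ICCV, AAAI, IJCAI, and ACM MM, and as an editorial board member of the top-tier (Zone 1) journal Pattern Recognition. He has led or participated in multiple projects funded by the Science and Technology Commission, the Ministry of Science and Technology, and the National Natural Science Foundation of China.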

Homepage:

https://wangsiwei2010.github.io/

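Abstract:

Multi-view learning is now widely and successfully applied in machine learning, data mining, and related fields. Compared with single-view models, multi-view learning aims to achieve emergent gains in task performance through cooperation among views. However, most existing methods focus on model construction and overlook the complexity of the correlations and interactions among views. Drawing on the intersection of complexity science and multi-view learning, this talk reports our group's explorations over the past few years into learning in complex multi-view systems, covering multi-view cooperation, correlation learning, and adaptive selection, and closes with a brief outlook on related directions.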

References:

[1] Fangdi Wang, Jiaqi Jin, Jingtao Hu, Suyuan Liu, Xihong Yang, Siwei Wang*, Xinwang Liu*, En Zhu, "Evaluate then Cooperate: Shapley-based View Cooperation Enhancement for Multi-view Clustering," The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS'2024).

[2] Suyuan Liu, Ke Liang, Zhibin Dong, Siwei Wang, Xihong Yang, Sihang Zhou, En Zhu, Xinwang Liu, "Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'2024), Seattle, USA, June, 2024.

[3] Zhibin Dong, Siwei Wang, Jiaqi Jin, Xinwang Liu*, En Zhu*, "Cross-view topology based consistent and complementary information for deep multi-view clustering," in Proc. IEEE International Conf. on Computer Vision (ICCV'2023), Paris, France, October, 2023.

[4] Fengqiang Wan, Xiangyu Wu, Zhihao Guan, Yang Yang*. CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Relations for Vision-Language Retrieval. In: IEEE International Conference on Multimedia and Expo (ICME'24), Niagara Falls, Canada, 2024.

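Panelist: Xinwang Liu (刘新旺), National University of Defense Technology

Bio:

Xinwang Liu is a professor and doctoral supervisor at the College of Computer Science, National University of Defense Technology, and a recipient of both the National Science Fund for Distinguished Young Scholars and the Excellent Young Scientists Fund. His main research interests include machine learning and data mining. Over the past five years he has published more than 110 papers as first or corresponding author in CCF-A journals and conferences, including 15 IEEE TPAMI papers, 3 of them sole-authored. He has 12 ESI Highly Cited Papers and more than 20,000 citations on Google Scholar, and was named to the World's Top 2% Scientists list for 2022-2024. He serves as an associate editor for journals including IEEE TNNLS, IEEE TCYB, and Information Fusion, and as a senior program committee member or area chair for top conferences including ICML and NeurIPS. His research has twice received the First Prize of the Hunan Provincial Natural Science Award (ranked 2/6 and 6/6), as well as the First Prize of the Wu Wenjun AI Natural Science Award (1/5), the Third Prize of the Hubei Provincial Natural Science Award (2/4), and the Second Prize of the China Society of Image and Graphics Natural Science Award (2/5).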

Homepage:

https://xinwangliu.github.io/

Panelist: Jianlong Fu (傅建龙), Microsoft Research Asia

Bio:

Dr. Jianlong Fu is a Senior Research Manager in the Vision and Multimodal Computing Group at MSRA. His research interests include computer vision, multimedia analysis, and robot learning. He specializes in multimodal generative AI, focusing on multimedia content creation and perceptual computing for images, videos, and embodied agents. He was honored as one of the 2023 China Intelligent Computing Innovators by MIT Technology Review, was recognized as a 2024 IEEE Distinguished Lecturer, and has received four Best Paper Awards at ACM Multimedia and related conferences. He serves on the editorial boards of ACM TOMM, IEEE TMM, and IEEE CEM, and was a guest editor for IEEE T-PAMI. His innovations have contributed to core technologies in Windows, Office, Azure, Bing, and Edge.

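Host: Di Hu (胡迪), Renmin University of China

Host bio:

Di Hu is an associate professor and doctoral supervisor at the Gaoling School of Artificial Intelligence, Renmin University of China. His main research interests are multimodal machine perception, interaction, and learning. He has published more than 40 papers as a main author in top AI journals and conferences such as TPAMI, ICML, CVPR, and CoRL. Representative work includes audio-visual referring segmentation and question answering; theory, mechanisms, and methods for balanced multimodal learning; and dynamic multimodal interaction algorithms for object manipulation. He has published one undergraduate textbook as associate editor-in-chief. His honors include selection for the CVPR Doctoral Consortium, the 2020 Chinese Association for Artificial Intelligence (CAAI) Outstanding Doctoral Dissertation Award, the 2022 Wu Wenjun AI Outstanding Youth Award, and selection for the 7th CAST Young Elite Scientists Sponsorship Program. He serves as a Senior PC member for AAAI and IJCAI, among others, and has organized or co-organized multimodal learning tutorials at several top international conferences.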

Lab homepage:

https://gewu-lab.github.io/

Special thanks to the main organizer of this webinar:

Organizing AC: Di Hu (Renmin University of China)
