
VALSE Webinar 20220629, Session 16 (No. 283 overall): Comparing Like with Like: Contrastive Representation Learning



Time

Wednesday, June 29, 2022

20:00 (Beijing time)

Topic

Comparing Like with Like: Contrastive Representation Learning

Host

Yu Liu (Dalian University of Technology)

Live stream

https://live.bilibili.com/22300737


Speaker: Mingming Gong (University of Melbourne)

Title: CRIS: CLIP-Driven Referring Image Segmentation


Speaker: Bing Su (Renmin University of China)

Title: What to contrast?




Panelists:

Mingming Gong (University of Melbourne), Bing Su (Renmin University of China), Yue Cao (Microsoft Research Asia), Tongliang Liu (University of Sydney), Peng Hu (Sichuan University)


Panel topics:

1. The success of contrastive learning involves many technical details, including data augmentation, negative samples, momentum encoders, projection heads, and so on. How can these components be simplified while still achieving good results? (A minimal sketch of the standard components follows this list.)

2. Contrastive learning is currently used mostly for image-level classification. Will it be applied more to pixel-level downstream tasks in the future, and what advantages might that bring?

3. Can contrastive learning make use of labels? What are the main differences from metric learning?

4. What conditions should a good contrastive learning system satisfy? In other words, how can the effectiveness of contrastive learning be evaluated more reliably?

5. In the coming years, what trends do you expect in combining contrastive learning with other methods?
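
To make the components named in topic 1 concrete, here is a minimal, hedged sketch of a SimCLR-style contrastive objective in PyTorch: two augmented views of the same batch are encoded, passed through a projection head, and contrasted with an in-batch InfoNCE loss. The dimensions, the placeholder features, and the temperature are illustrative assumptions and not any speaker's method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    # The "projection head" from topic 1: a small MLP on top of backbone features.
    def __init__(self, in_dim=2048, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, in_dim), nn.ReLU(), nn.Linear(in_dim, out_dim))

    def forward(self, h):
        return self.net(h)

def nt_xent(z1, z2, temperature=0.5):
    # In-batch contrast: the matching view is the positive,
    # every other sample in the batch acts as a negative.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    # Symmetrised so that each view predicts its counterpart.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))

# Usage sketch: h1 and h2 stand in for backbone features of two random
# augmentations (crop, flip, color jitter, ...) of the same image batch.
projector = ProjectionHead()
h1, h2 = torch.randn(64, 2048), torch.randn(64, 2048)
loss = nt_xent(projector(h1), projector(h2))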


*You are welcome to post questions related to the topic in the comments below; the host and the panelists will select several of the most popular ones and add them to the panel discussion!


Speaker: Mingming Gong (University of Melbourne)

Time: Wednesday, June 29, 2022, 20:00 (Beijing time)

Title: CRIS: CLIP-Driven Referring Image Segmentation


Speaker bio:

Mingming Gong is a lecturer and PhD supervisor at the School of Mathematics and Statistics, University of Melbourne, Australia, and a principal investigator at the Melbourne Centre for Data Science. He received his PhD from the University of Technology Sydney in 2017 and then did postdoctoral research at the University of Pittsburgh and Carnegie Mellon University. His research interests include causal machine learning, weakly supervised/self-supervised learning, transfer learning, generative models, and 3D vision. He has published more than 50 papers in top artificial intelligence conferences and journals such as NeurIPS, ICML, and CVPR. He received the Australian Research Council Discovery Early Career Researcher Award in 2021 and serves as an area chair for top machine learning conferences such as NeurIPS, ICML, and ICLR.


Homepage:

https://mingming-gong.github.io/


Abstract:

Referring image segmentation aims to segment a referent via a natural linguistic expression. Due to the distinct data properties of text and images, it is challenging for a network to align text and pixel-level features well. Existing approaches use pretrained models to facilitate learning, yet they transfer the language and vision knowledge from pretrained models separately, ignoring the multi-modal correspondence information. Inspired by the recent advances in Contrastive Language-Image Pretraining (CLIP), in this paper we propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS). To transfer the multi-modal knowledge effectively, CRIS resorts to vision-language decoding and contrastive learning to achieve text-to-pixel alignment. More specifically, we design a vision-language decoder to propagate fine-grained semantic information from textual representations to each pixel-level activation, which promotes consistency between the two modalities. In addition, we present text-to-pixel contrastive learning to explicitly enforce the text feature to be similar to the related pixel-level features and dissimilar to the irrelevant ones. Experimental results on three benchmark datasets demonstrate that our proposed framework significantly outperforms state-of-the-art methods without any post-processing.
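
As a rough illustration of the text-to-pixel contrast described above, the sketch below scores every pixel feature against the sentence embedding and pulls the pixels inside the ground-truth mask toward it while pushing the others away. This is a hedged reading of the abstract, not the actual CRIS implementation; the tensor shapes, the binary-cross-entropy formulation, and the temperature are assumptions.

import torch
import torch.nn.functional as F

def text_to_pixel_contrast(pixel_feats, text_feat, gt_mask, temperature=0.07):
    # pixel_feats: (B, C, H, W) pixel-level features from the vision-language decoder
    # text_feat:   (B, C)       sentence-level feature of the referring expression
    # gt_mask:     (B, H, W)    binary ground-truth mask of the referred region
    pixel_feats = F.normalize(pixel_feats, dim=1)  # cosine similarity via L2-normalised dot product
    text_feat = F.normalize(text_feat, dim=1)

    # Similarity of every pixel to the sentence embedding: (B, H, W).
    sim = torch.einsum("bchw,bc->bhw", pixel_feats, text_feat) / temperature

    # Pixels of the referent are treated as positives, all other pixels as negatives.
    return F.binary_cross_entropy_with_logits(sim, gt_mask.float())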


Speaker: Bing Su (Renmin University of China)

Time: Wednesday, June 29, 2022, 20:30 (Beijing time)

Title: What to contrast?


Speaker bio:

Bing Su is a tenure-track associate professor at the Gaoling School of Artificial Intelligence, Renmin University of China. He received his PhD from Tsinghua University in 2016 and his bachelor's degree from Beijing Institute of Technology in 2010, and worked at the Institute of Software, Chinese Academy of Sciences from 2016 to 2020. His research interests are computer vision, machine learning, and pattern recognition. He has published more than ten first-author papers in CCF Rank-A journals and conferences such as TPAMI, TIP, ICML, CVPR, and ICCV.


Homepage:

http://ai.ruc.edu.cn/student/tutorGroup/0eed2b8a0011426ba6658b7a0e80901d.htm


Abstract:

What matters for contrastive learning (CL)? We believe that what to contrast is crucial. At the pixel level, we observe a long-overlooked phenomenon: backgrounds in images may interfere with the model's learning of semantic information. To tackle this issue, we model the background as a confounder and build a Structural Causal Model to perform causal intervention, which generates a weight matrix that eliminates the influence of the background and forces the model to focus on the foreground when contrasting.

At the feature level, CL relies heavily on informative, or "hard", features. Random augmentations cannot always add useful information. We propose to augment features directly in the latent space, thereby learning discriminative representations without a large amount of input data. We use a meta-learning technique to learn augmented features for contrast, with a new margin-injected regularization added to avoid collapse. Finally, we show that CL can also be applied to contrast sequences, using a learnable alignment-based distance to discover discriminative element pairs.
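
The latent-space augmentation idea can be pictured with the hedged sketch below: instead of (or in addition to) augmenting the input, a small learnable module perturbs the feature itself, and the perturbed feature serves as the view to contrast against. The residual MLP, the dimensions, and the plain InfoNCE loss are illustrative assumptions; the meta-learning of the augmenter and the margin-injected regularizer mentioned in the abstract are omitted here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentAugmenter(nn.Module):
    # Hypothetical module that produces an augmented feature directly in latent space.
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z):
        return z + self.net(z)  # residual perturbation of the original feature

def infonce(anchor, positive, temperature=0.1):
    # Standard InfoNCE: the i-th positive is the match, all other rows are negatives.
    anchor, positive = F.normalize(anchor, dim=1), F.normalize(positive, dim=1)
    logits = anchor @ positive.T / temperature
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)

# Usage sketch: contrast each feature with its latent-space augmentation.
augmenter = LatentAugmenter(128)
z = torch.randn(32, 128)  # features from an encoder plus projection head
loss = infonce(z, augmenter(z))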


References:

[1] Wenwen Qiang*, Jiangmeng Li*, Changwen Zheng, Bing Su, and Hui Xiong. “Interventional Contrastive Learning with Meta Semantic Regularizer”, International Conference on Machine Learning (ICML), 2022.

[2] Jiangmeng Li*, Wenwen Qiang*, Changwen Zheng, Bing Su, and Hui Xiong. “MetAug: Contrastive Learning via Meta Feature Augmentation”, International Conference on Machine Learning (ICML), 2022.

[3] Bing Su and Ji-Rong Wen, “Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification”, International Conference on Learning Representations (ICLR), 2022.


Panelist: Yue Cao (Microsoft Research Asia)


Panelist bio:

Yue Cao is a principal researcher in the Visual Computing Group at Microsoft Research Asia. He received his bachelor's degree in 2014 and his PhD in 2019, both from the School of Software, Tsinghua University. His representative works include Swin Transformer, GCNet, and VL-BERT. He received the Microsoft Research Fellowship in 2017, the Tsinghua University Special Scholarship and the Lin Feng Counselor Award in 2018, and the ICCV 2021 Best Paper Award (Marr Prize). He has published more than 30 papers in top international conferences and journals such as CVPR, ICCV, ICLR, ICML, and NeurIPS, four of which appear on the PaperDigest Most Influential Papers lists, with over 9,000 Google Scholar citations. His current research interests are self-supervised learning, multimodal learning, and Transformer modeling.

Homepage:

http://yue-cao.me/


Panelist: Tongliang Liu (University of Sydney)


Panelist bio:

Tongliang Liu is the director of the Sydney AI Centre at the University of Sydney, Australia. His research focuses on trustworthy machine learning and its interdisciplinary applications, in particular learning with noisy labels, adversarial learning, transfer learning, and statistical deep learning theory. He has published more than 100 papers in venues such as ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, AAAI, IJCAI, KDD, IEEE T-PAMI, T-NNLS, and T-IP. He serves as an area chair for top conferences including ICML, NeurIPS, ICLR, UAI, AAAI, IJCAI, and KDD, and as an editor of TMLR and MLJ. His honors include the Australian Research Council Discovery Early Career Researcher Award, the University of Sydney Faculty of Engineering award for outstanding research by an early-career academic, and an early-achiever award in engineering and computer science in Australia.

Homepage:

https://tongliang-liu.github.io/


Panelist: Peng Hu (Sichuan University)


Panelist bio:

Peng Hu is an associate research professor at Sichuan University. He received his PhD from Sichuan University in 2019 and was a researcher at the Institute for Infocomm Research (I2R), A*STAR, Singapore, from 2019 to 2020. His research focuses on the foundations of representation learning and its applications in multimedia computing, visual computing, and image processing. He has published more than 20 papers in international journals and conferences such as IEEE TPAMI, IEEE TIP, CVPR, NeurIPS, AAAI, SIGIR, and ACM MM.

Homepage:

https://penghu-cs.github.io/



Host: Yu Liu (Dalian University of Technology)


Host bio:

Yu Liu is an associate professor at the International School of Information Science and Engineering, Dalian University of Technology. He received his PhD from Leiden University, the Netherlands, in 2018, and was a postdoctoral researcher at KU Leuven, Belgium, from 2019 to 2020. His main research interests are multimodal learning, continual learning, and semi-supervised transfer learning. He has published more than 20 papers in major computer vision journals and conferences and has co-organized four workshops at CVPR, ICCV, and ECCV. He received the Best Paper Award at the International Conference on Multimedia Modeling (MMM) 2017, was an ICCV 2021 Outstanding Reviewer, and was recognized as a newly recruited high-level young talent of Dalian.

Homepage:

https://liuyudut.github.io/



Special thanks to the main organizers of this Webinar:

Organizing AC: Yu Liu (Dalian University of Technology)

Co-organizing AC: Tongliang Liu (University of Sydney)


How to participate

1. The weekly VALSE Webinar is streamed on the Bilibili platform. Search for VALSE_Webinar on Bilibili and follow us!

Live stream:

https://live.bilibili.com/22300737

Past recordings:

https://space.bilibili.com/562085182/


2. VALSE Webinars usually take place on Wednesday evenings at 20:00 (Beijing time), though the time occasionally shifts because of speakers' time zones. To stay informed, follow the VALSE WeChat official account (valse_wechat) or join VALSE QQ group R (group number: 137634472).


*Note: When applying to join the VALSE QQ group, you must provide your name, affiliation, and identity; all three are required. After joining, please set your group nickname to your real name, identity, and affiliation. Identity codes: T for university and research institute staff, I for industry R&D, D for PhD students, M for master's students.


3. The VALSE WeChat official account usually announces the following week's Webinar on Thursdays.


4. You can also view Webinar information directly on the VALSE homepage: http://valser.org/. Slides of Webinar talks (with the speakers' permission) are posted at the bottom of each announcement on the VALSE website.


Mingming Gong [slides]

Bing Su [slides]

