20200916-23 Fresh Water from the Source: A Discussion of Self-Supervised and Unsupervised Learning

2020-9-11 18:22 | Posted by: Yi Cheng (Institute of Computing Technology)


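Time	Wednesday, September 16, 2020, 20:00 (Beijing Time)
Theme	Fresh Water from the Source: A Discussion of Self-Supervised and Unsupervised Learning
Host	Yanli Ji (University of Electronic Science and Technology of China)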

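Speaker: Weidi Xie (VGG, University of Oxford)

Talk title: Self-supervised Visual Representation Learning from Videos

Speaker: Guo-Jun Qi (Huawei US Research Institute)

Talk title: The Central Role of the Transformation Equivariance and Transformation Invariance Principles in Feature Learning: A Unified Perspective on Unsupervised Deep Learning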

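Panelists:

Weidi Xie (VGG, University of Oxford), Guo-Jun Qi (Huawei US Research Institute), Chen Gong (Nanjing University of Science and Technology), Zhirong Wu (Microsoft Research Asia), Wangmeng Zuo (Harbin Institute of Technology)

Panel topics: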

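1. Drawing on your own experience, could you offer some advice on how to design appropriate pretext (upstream) tasks for specific downstream tasks in self-supervised learning research?

2. What are the differences and connections between self-supervised learning and earlier unsupervised methods such as clustering and generative adversarial networks? Is it possible to combine them?

3. Which research angles and directions in self-supervised and unsupervised learning currently deserve the most attention?

4. Are image-based self-supervised learning methods constrained by their inherent limitations? Do 3D, video, and multimodal (e.g., audio-visual, vision-language) self-supervised learning have greater advantages?

5. Given the enormous success of self-supervised learning in natural language processing, how can visual self-supervised learning achieve bigger breakthroughs?

6. How should benchmarks for self-supervised learning be defined and established?

7. What are the current pain points in self-supervised and unsupervised learning research, and what measures might address them?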

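*Everyone is welcome to leave topic-related questions in the comments below. The host and panelists will select several of the most popular questions and add them to the panel topics!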

Speaker: Weidi Xie (VGG, University of Oxford)

Time: Wednesday, September 16, 2020, 20:00 (Beijing Time)

Talk title: Self-supervised Visual Representation Learning from Videos

Speaker bio:

Weidi Xie is a research fellow at the Visual Geometry Group (VGG), University of Oxford, where he mainly works on computer vision and biomedical image analysis, for example, ultrasound image analysis, cardiac MRI, face recognition, speaker recognition, object counting, video self-supervised learning, and image retrieval. In 2018, he completed his DPhil at VGG, advised by Professor Andrew Zisserman and Professor Alison Noble. He received the Magdalen Award from the China Oxford Scholarship Fund (COSF), an Oxford-Google DeepMind Graduate Scholarship, and the Excellence Award from the University of Oxford.

Homepage:

https://weidixie.github.io/weidi-personal-webpage/

Google Scholar: https://scholar.google.co.uk/citations?user=Vtrqj4gAAAAJ&hl=en

Abstract:

Recent self-supervised learning methods have made remarkable progress and can now build features that are competitive with those learned through supervised learning. However, research has focused on learning transferable representations from i.i.d. data, e.g., images. To be truly applicable, these networks still need to be fine-tuned with manual annotations on downstream tasks, which is never fully satisfactory.

In this talk, I will cover self-supervised visual representation learning from videos and explain why I think videos are the perfect data source for self-supervised learning. Specifically, I will present our recent efforts in visual representation learning from videos that benefit semantic downstream tasks by exploiting the rich information in videos, e.g., temporal information, motion, audio, narration, and spatio-temporal coherence. Beyond transferability, the representations learned from videos can generalize directly to downstream tasks with zero annotations!

In conclusion, I will summarize the shortcomings of our work and share some preliminary thoughts on how they may be addressed to push the community forward.
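One common way to turn the coherence of video into a training signal is a contrastive objective of the InfoNCE type, the loss used by CPC [3] and MoCo [9]. The minimal sketch below is generic and is not the exact loss of any cited paper. It assumes that embeddings of two clips from the same video (produced by some video encoder not shown here) form a positive pair, and that the other videos in the batch supply the negatives; the helper names and the temperature of 0.1 are illustrative choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    Assumption: anchors[i] and positives[i] embed two clips of the same video
    (the positive pair); positives[j] for j != i act as negatives for anchors[i].
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need equally sized, non-empty batches")
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate clip, sharpened by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching clip as the correct "class".
        total += log_sum_exp - logits[i]
    return total / len(a)


if __name__ == "__main__":
    clips_a = [[1.0, 0.0], [0.0, 1.0]]
    clips_b = [[0.9, 0.1], [0.1, 0.9]]
    print(info_nce(clips_a, clips_b))        # small: each clip matches its partner
    print(info_nce(clips_a, clips_b[::-1]))  # large: the pairs are mismatched
```

Minimizing this loss pulls the two views of one video together while pushing apart clips from different videos, so no manual labels are needed.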

References:

[1] Relja Arandjelović, Andrew Zisserman, "Look, listen and learn". In Proc. ICCV, 2017.

[2] Relja Arandjelović, Andrew Zisserman, "Objects that sound". In Proc. ECCV, 2018.

[3] Aaron van den Oord, Yazhe Li, Oriol Vinyals, "Representation Learning with Contrastive Predictive Coding". arXiv:1807.03748, 2018.

[4] Tengda Han, Weidi Xie, Andrew Zisserman, "Video Representation Learning by Dense Predictive Coding". In Proc. ICCV Workshops, 2019.

[5] Zihang Lai, Weidi Xie, "Self-supervised Learning for Video Correspondence Flow". In Proc. BMVC, 2019.

[6] Tengda Han, Weidi Xie, Andrew Zisserman, "Memory-augmented Dense Predictive Coding for Video Representation Learning". In Proc. ECCV, 2020.

[7] Zihang Lai, Erika Lu, Weidi Xie, "MAST: A Memory-augmented Self-supervised Tracker". In Proc. CVPR, 2020.

[8] Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman, "VGG-Sound: A Large-scale Audio-Visual Dataset". In Proc. ICASSP, 2020.

[9] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick, "Momentum Contrast for Unsupervised Visual Representation Learning". In Proc. CVPR, 2020.

[10] Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton, "A Simple Framework for Contrastive Learning of Visual Representations". In Proc. ICML, 2020.

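Speaker: Guo-Jun Qi (Huawei US Research Institute)

Time: Wednesday, September 16, 2020, 20:40 (Beijing Time)

Talk title: The Central Role of the Transformation Equivariance and Transformation Invariance Principles in Feature Learning: A Unified Perspective on Unsupervised Deep Learning

Speaker bio: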

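Dr. Guo-Jun Qi is currently Chief Scientist at the Huawei US Research Institute, where he leads R&D on the EI agent, including cloud services for smart cities, satellite remote-sensing map services, autonomous driving, and visual computing. Previously, Dr. Qi was with the Department of Computer Science at the University of Central Florida, where he directed the MAPLE Lab. His research covers machine learning, multimodal perception, and knowledge discovery, applied to building intelligent and reliable decision-making systems. Dr. Qi has published more than 100 papers at top academic conferences and received the ACM Multimedia Best Paper Award (2007) and a Best Paper Candidate Award (2015). His papers have been cited nearly 10,000 times to date. He serves as program chair of ACM Multimedia 2020, ICIMCS 2018, and MMM 2016, and as an area chair or senior program committee member for conferences including ICCV, ICPR, ICIP, ACM SIGKDD, ACM CIKM, AAAI, IJCAI, and ACM Multimedia. He is also an associate editor of several leading international journals, including IEEE Transactions on Image Processing (T-IP), IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), IEEE Transactions on Multimedia (T-MM), ACM Transactions on Knowledge Discovery from Data (T-KDD), and Elsevier's Pattern Recognition (PR).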

Homepage:

http://www.cs.ucf.edu/~gqi/

Abstract:

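In this talk, I will introduce a new theory, methodology, and way of thinking for designing deep learning models. First, I will present a complete family of feature learning algorithms based on the principle of transformation equivariance. It substantially extends the translation equivariance theory that underpins the success of classic convolutional neural networks: the Auto-Encoding Transformations (AET) model [1] yields equivariant models and algorithms that apply under general transformations. We further derive a probabilistic model and a corresponding variational method [2] that maximize the mutual information between the feature representation and the transformation, and we show that the resulting transformation-equivariant networks generalize the theory of linear transformation group representations, so the learned features can better model complex nonlinear visual structures. We also extend this representation model to the semi-supervised setting and apply it to image classification and detection [3], and even to training Graph Neural Networks. The experimental results demonstrate the substantial advantages of feature learning methods based on the transformation equivariance principle. Finally, I will discuss the prospect of unifying the principles of transformation equivariance and transformation invariance (e.g., contrastive learning), and explore how to develop the corresponding theories and methods for future deep learning models [3, 5]. We have open-sourced the related code at [6].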
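The training signal behind AET [1] can be illustrated with a toy sketch: sample a transformation t, encode both the input x and t(x), decode the transformation parameters from the two feature sets, and penalize the squared error against the sampled parameters. Every concrete choice below is a hypothetical simplification rather than the method from the talk: the input is a 2D point set instead of an image, the transformation family is planar rotation (AET itself considers more general transformations), and instead of a jointly trained neural encoder and decoder the sketch uses two hand-written encoders and a closed-form least-squares rotation decoder.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Encoder = Callable[[List[Point]], List[Point]]


def rotate(points: List[Point], angle: float) -> List[Point]:
    """Apply the transformation t (here, hypothetically, a 2D rotation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def decode_angle(feat_x: List[Point], feat_tx: List[Point]) -> float:
    """Hypothetical decoder: least-squares rotation taking feat_x to feat_tx."""
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(feat_x, feat_tx))
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(feat_x, feat_tx))
    return math.atan2(cross, dot)


def aet_loss(encoder: Encoder, points: List[Point], angles: List[float]) -> float:
    """AET-style objective: mean squared error between each sampled
    transformation parameter and the one decoded from (E(x), E(t(x)))."""
    feat_x = encoder(points)
    total = 0.0
    for theta in angles:
        feat_tx = encoder(rotate(points, theta))
        err = decode_angle(feat_x, feat_tx) - theta
        err = math.atan2(math.sin(err), math.cos(err))  # wrap error to [-pi, pi]
        total += err * err
    return total / len(angles)


def equivariant_encoder(points: List[Point]) -> List[Point]:
    """Identity features: rotating the input rotates the features the same way."""
    return list(points)


def invariant_encoder(points: List[Point]) -> List[Point]:
    """Radius-only features: unchanged by rotation, so the angle is lost."""
    return [(math.hypot(x, y), 0.0) for x, y in points]


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(16)]
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(100)]
    print(aet_loss(equivariant_encoder, pts, angles))  # near 0: t is recoverable
    print(aet_loss(invariant_encoder, pts, angles))    # large: t is discarded
```

The two encoders mirror the distinction drawn in the abstract: invariant features, the goal of contrastive learning, deliberately discard the transformation, whereas the AET objective rewards features from which the transformation can be decoded.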

References:

[1] Liheng Zhang, Guo-Jun Qi*, Liqiang Wang, Jiebo Luo. AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, June 16-20, 2019.

[2] Guo-Jun Qi*, Liheng Zhang, Chang Wen Chen, Qi Tian. AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations. In Proceedings of the International Conference on Computer Vision (ICCV 2019), Seoul, Korea, Oct. 27-Nov. 2, 2019.

[3] Guo-Jun Qi, Liheng Zhang, Xiao Wang. Learning generalized transformation equivariant representations via autoencoding transformations. arXiv preprint arXiv:1906.08628, 2019.

[4] Xiang Gao, Wei Hu, Guo-Jun Qi. GraphTER: Unsupervised Learning of Graph Transformation Equivariant Representations via Auto-Encoding Node-wise Transformations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, June 14-19, 2020.

[5] Guo-Jun Qi, Jiebo Luo. Small data challenges in big data era: A survey of recent progress on unsupervised and semi-supervised methods. arXiv preprint arXiv:1903.11260, 2019.

[6] https://github.com/maple-research-lab

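Panelist: Chen Gong (Nanjing University of Science and Technology)

Panelist bio:

Chen Gong is a Professor and PhD supervisor at the School of Computer Science and Engineering, Nanjing University of Science and Technology. He received dual PhD degrees from Shanghai Jiao Tong University and the University of Technology Sydney. His research mainly focuses on weakly supervised machine learning and its applications. He has published more than 90 papers in leading international journals and conferences, including IEEE T-PAMI, IEEE T-NNLS, IEEE T-IP, ICML, NeurIPS, CVPR, AAAI, and IJCAI. He serves as a reviewer for more than 20 leading international journals, including AIJ, JMLR, IEEE T-PAMI, IEEE T-NNLS, IEEE T-IP, and IEEE T-KDE, and has been invited to serve as a (S)PC member for many international conferences such as ICML, NeurIPS, IJCAI, AAAI, and ICDM. His honors include the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology, the Wu Wenjun AI Outstanding Youth Award, the Excellent Doctoral Dissertation Award of the Chinese Association for Artificial Intelligence, the Excellent Doctoral Dissertation Award of Shanghai Jiao Tong University, Jiangsu Province's "Six Talent Peaks" program, and Jiangsu Province's "Innovation and Entrepreneurship Doctor" program.

Homepage: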

https://gcatnjust.github.io/ChenGong/index.html