
VALSE Webinar, October 25, 2023 (Session 29 of the year, No. 329 overall): Zero-Shot Learning for Vision

Posted 2023-10-19 17:28 by Cheng Yi (Institute of Computing Technology, CAS)


Speaker: Jingjing Li (University of Electronic Science and Technology of China)

Talk Title: Zero-Shot Visual Recognition Based on Generative Models


Speaker: Kaiyang Zhou (Hong Kong Baptist University)

Talk Title: Prompting—The New API for Interactive Visual Intelligence


Panel Topics:

1. Classic zero-shot learning methods and zero-shot learning methods built on large-scale vision-language models (e.g., CLIP) each have their own strengths and weaknesses (a minimal inference sketch of the latter follows this list). Can future work combine the advantages of both? Where is zero-shot learning for vision headed?

2. LLMs have strong reasoning abilities and rich knowledge. Can LLMs advance zero-shot learning for vision? What potential research value and directions are there?

3. Zero-shot learning remains a highly challenging research topic and is still some distance from industrial deployment. Should zero-shot learning research now be advanced in combination with real-world industrial application scenarios?

4. With the rise of large models, how should academia respond to a landscape where industry wins through data and computational resources? Beyond data and compute, what else can drive a field forward?
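
For readers unfamiliar with the second family of methods in topic 1, below is a minimal sketch of how a vision-language model such as CLIP performs zero-shot classification: candidate classes are supplied only as text prompts at test time, with no training on those classes. It assumes OpenAI's open-source clip package and a local image file named example.jpg; the class names are placeholders.

```python
# Minimal sketch of CLIP-style zero-shot classification, as contrasted with
# classic attribute-based ZSL in panel topic 1. Assumes OpenAI's open-source
# `clip` package (github.com/openai/CLIP) and Pillow; classes are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "Zero-shot" here means the classes are specified only as text at test time.
classnames = ["cat", "dog", "zebra"]
text = clip.tokenize([f"a photo of a {c}" for c in classnames]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    # Cosine similarity between the image and each class prompt -> probabilities.
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print({c: float(p) for c, p in zip(classnames, probs[0])})
```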


Panelists:

Jingjing Li (University of Electronic Science and Technology of China), Kaiyang Zhou (Hong Kong Baptist University), Yongqin Xian (Google), Zhong Ji (Tianjin University), Shiming Chen (Carnegie Mellon University / Mohamed bin Zayed University of Artificial Intelligence)


Speaker: Jingjing Li (University of Electronic Science and Technology of China)

Talk Time: October 25, 2023 (Wednesday), 20:00 Beijing Time

Talk Title: Zero-Shot Visual Recognition Based on Generative Models


Speaker Bio:

Dr. Jingjing Li is a professor and doctoral supervisor at the University of Electronic Science and Technology of China (UESTC) and a postdoctoral fellow under the Ministry of Human Resources and Social Security's Postdoctoral Innovative Talent Support Program. He received the Outstanding Doctoral Dissertation Award of the Chinese Institute of Electronics, the 2019 UESTC Academic Newcomer Award, and was selected for UESTC's Hundred Talents Program in 2020. His main research interests are artificial intelligence algorithms and their applications. He has published more than 70 full papers in JCR Q1 journals and CCF-A conferences such as TPAMI, TIP, TKDE, TOIS, and CVPR, and holds ten granted patents. He serves as a reviewer, area chair, senior program committee member, or program committee member for journals and conferences including TPAMI, TIP, TCYB, TNNLS, TKDE, CVPR, AAAI, and ACM MM. His work has been selected as ESI highly cited and hot papers and among China's 100 Most Influential International Academic Papers. His research has been deployed at companies such as EVE Energy and Tencent, generating economic value in the hundreds of millions of yuan. He has won the First Prize of the Sichuan Province Science and Technology Progress Award and the Wu Wenjun Outstanding Youth Award in Artificial Intelligence.


Homepage:

https://faculty.uestc.edu.cn/jjl


Abstract:

Neural networks can usually recognize only the categories predefined in their training data. Zero-shot learning aims to break this limitation and recognize categories unseen during training. This talk gives a comprehensive overview of approaches that use generative models for zero-shot visual recognition. It first introduces the goal of zero-shot learning and how semantic information is used to build a mapping between the semantic space and the visual space, enabling recognition of unseen categories. It then analyzes several key challenges in zero-shot recognition, such as under-constrained generation, lack of diversity in generated samples, the varying importance of semantic components, feature entanglement, and the quality of generated features. To address these problems, the talk presents several methods: cycle-consistent generative adversarial networks; a "soul sample" regularization mechanism that increases generation diversity; ensemble models that assign different weights to different semantic components; semantic disentanglement learning that separates relevant from irrelevant semantic features; and adjusting the generation span to maintain semantic consistency and class diversity. Finally, the talk describes augmenting data with adversarial samples to improve the model's adversarial robustness.
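
To make the overall pipeline concrete, here is a minimal sketch of feature-generation-based zero-shot learning as described above: a conditional generator learns, on seen classes, to map class semantics to visual features, then hallucinates features for unseen classes so that an ordinary supervised classifier can be trained on them. The dimensions, the plain MLP generator, and the random attribute vectors are illustrative assumptions, not the speaker's exact models.

```python
# Minimal sketch of feature-generation-based zero-shot learning (ZSL).
# Assumptions (not from the talk itself): visual features are pre-extracted
# (e.g., 2048-d ResNet features), each class has an attribute vector, and a
# plain conditional MLP stands in for the GAN variants discussed above.
import torch
import torch.nn as nn

FEAT_DIM, ATTR_DIM, NOISE_DIM = 2048, 85, 64  # e.g., AWA2-style dimensions

class ConditionalGenerator(nn.Module):
    """Maps (class attributes, noise) -> a synthetic visual feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ATTR_DIM + NOISE_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, FEAT_DIM), nn.ReLU(),  # CNN features are non-negative
        )

    def forward(self, attrs, noise):
        return self.net(torch.cat([attrs, noise], dim=1))

def synthesize_unseen_features(gen, unseen_attrs, per_class=100):
    """After training on seen classes, hallucinate features for unseen ones."""
    feats, labels = [], []
    with torch.no_grad():
        for label, attr in enumerate(unseen_attrs):
            attrs = attr.unsqueeze(0).expand(per_class, -1)
            noise = torch.randn(per_class, NOISE_DIM)
            feats.append(gen(attrs, noise))
            labels.append(torch.full((per_class,), label))
    return torch.cat(feats), torch.cat(labels)

# Usage: train `gen` (e.g., adversarially) on seen-class (feature, attribute)
# pairs, then fit any supervised classifier on the synthesized features.
gen = ConditionalGenerator()
unseen_attrs = torch.rand(10, ATTR_DIM)      # placeholder attribute vectors
fake_feats, fake_labels = synthesize_unseen_features(gen, unseen_attrs)
classifier = nn.Linear(FEAT_DIM, 10)         # to be trained on fake_feats
```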


References:

[1] Z. Chen, P. Zhang, J. Li, S. Wang, Z. Huang. Zero-Shot Learning by Harnessing Adversarial Samples. In Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023.

[2] Z. Chen, Y. Luo, R. Qiu, S. Wang, J. Li, Z. Huang. Semantics Disentangling for Generalized Zero-Shot Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

[3] M. B. Sariyildiz, R. G. Cinbis. Gradient Matching Generative Networks for Zero-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[4] S. Chen, Z. Hong, G.-S. Xie, et al. MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[5] Z. Liu, S. Guo, X. Lu, et al. (ML)²P-Encoder: On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[6] W. Xu, Y. Xian, J. Wang, et al. VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[7] M. G. Z. A. Khan, M. F. Naeem, L. Van Gool, et al. Learning Attention Propagation for Compositional Zero-Shot Learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023.

[8] J. Li, M. Jing, K. Lu, Z. Ding, L. Zhu, Z. Huang. Leveraging the Invariant Side of Generative Zero-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.


Speaker: Kaiyang Zhou (Hong Kong Baptist University)

Talk Time: October 25, 2023 (Wednesday), 20:40 Beijing Time

Talk Title: Prompting—The New API for Interactive Visual Intelligence


Speaker Bio:

Dr. Kaiyang Zhou is an Assistant Professor in the Department of Computer Science, Hong Kong Baptist University. He was previously a postdoc at Nanyang Technological University, Singapore, and received his PhD in computer science from the University of Surrey, UK. His research lies at the intersection of computer vision and machine learning and has been published in top-tier journals and conferences such as TPAMI, IJCV, and CVPR. He is an associate editor of the International Journal of Computer Vision (IJCV) and the creator of the popular open-source person re-identification software Torchreid.


Homepage:

https://kaiyangzhou.github.io/


Abstract:

Originating from natural language processing, the new paradigm of prompting has recently swept through the computer vision community, bringing disruptive changes to various computer vision applications, such as image recognition and image generation. In comparison to the traditional fixed-once-learned architecture, like a linear classifier trained to recognize a specific set of categories, prompting offers greater flexibility and more opportunities for novel applications. This is because prompting allows the model to perform new tasks, such as recognizing new categories, by tuning textual instructions or modifying a small number of parameters in the model's input space while keeping most of the pre-trained parameters untouched. This paradigm significantly pushes human-AI interaction to unprecedented levels. In this talk, I will discuss how to use prompt learning to adapt large vision-language models for generic visual perception tasks including image classification and object detection.
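
Reference [1] below (CoOp) develops this idea in detail; here is a minimal sketch of its core mechanism under stated assumptions: a handful of context vectors in the text encoder's input space are learned while every pre-trained CLIP weight stays frozen. It assumes OpenAI's open-source clip package; the class names, context length, and optimizer settings are illustrative, and the sketch simplifies the published method.

```python
# Minimal sketch of CoOp-style prompt learning (reference [1]): learn a few
# context vectors in the text encoder's input space while CLIP stays frozen.
# Assumes OpenAI's open-source `clip` package (github.com/openai/CLIP).
import torch
import torch.nn as nn
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.eval()                      # all pre-trained weights stay frozen
for p in model.parameters():
    p.requires_grad_(False)

classnames = ["cat", "dog", "zebra"]   # placeholder categories
n_ctx = 4                              # number of learnable context tokens

# Tokenize "X X X X <classname>" so we know where each token sits.
prompts = [" ".join(["X"] * n_ctx) + " " + name for name in classnames]
tokens = clip.tokenize(prompts).to(device)          # (n_cls, 77)
with torch.no_grad():
    embed = model.token_embedding(tokens).float()   # (n_cls, 77, d)

# The only trainable parameters: shared context vectors replacing the "X"s.
ctx = nn.Parameter(torch.randn(n_ctx, embed.shape[-1], device=device) * 0.02)

def text_features():
    e = embed.clone()
    e[:, 1:1 + n_ctx, :] = ctx        # slot 0 is the start-of-text token
    x = e.type(model.dtype) + model.positional_embedding.type(model.dtype)
    x = model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
    x = model.ln_final(x).type(model.dtype)
    # Take the feature at the end-of-text token (highest id), then project.
    eot = tokens.argmax(dim=-1)
    x = x[torch.arange(x.shape[0]), eot] @ model.text_projection
    return x / x.norm(dim=-1, keepdim=True)

# Training step (sketch): cross-entropy on a few labeled images per class.
optimizer = torch.optim.SGD([ctx], lr=2e-3)
def step(images, labels):             # images: (B, 3, 224, 224), preprocessed
    img = model.encode_image(images)
    img = img / img.norm(dim=-1, keepdim=True)
    logits = model.logit_scale.exp() * img @ text_features().t()
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()                   # gradients flow only into `ctx`
    optimizer.step()
    return loss.item()
```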


References:

[1] K. Zhou, J. Yang, C. C. Loy, Z. Liu. Learning to Prompt for Vision-Language Models. International Journal of Computer Vision (IJCV), 130:2337–2348, 2022.

[2] K. Zhou, J. Yang, C. C. Loy, Z. Liu. Conditional Prompt Learning for Vision-Language Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16816–16825, 2022.


Panelist: Yongqin Xian (Google)


Panelist Bio:

Yongqin Xian is a research scientist at Google Zurich. Prior to that, he was a postdoctoral researcher with Prof. Luc Van Gool in the Computer Vision Lab at ETH Zurich. He completed his PhD (summa cum laude) at the Max Planck Institute for Informatics under the supervision of Prof. Bernt Schiele and Prof. Zeynep Akata. His research focuses on vision-language model pretraining and its applications to computer vision tasks. He is a recipient of the ECVA PhD Award and the Chinese Government Award for Outstanding Self-Financed Students Abroad, a Qualcomm Innovation Fellowship finalist, and a 3DV Best Paper Honorable Mention Award winner.


Homepage:

https://xianyongqin.github.io


Panelist: Zhong Ji (Tianjin University)


Panelist Bio:

Zhong Ji is a professor and doctoral supervisor at Tianjin University, a senior member of IEEE and CCF, and deputy director of the Tianjin Key Laboratory of Brain-Inspired Intelligence Technology. His main research interests are zero/few-shot learning, continual learning, and cross-modal learning. His research is supported by a subproject of the Ministry of Science and Technology's "Science and Technology Innovation 2030" AI Major Program, grants from the National Natural Science Foundation of China, and other projects. He has published over 100 papers as first or corresponding author in well-known journals and conferences at home and abroad, and holds over 60 granted invention patents as first inventor. He serves on the editorial boards of several journals and as an area chair, senior technical program committee member, or committee member for various conferences.


Homepage:

http://seea.tju.edu.cn/info/1014/1447.htm


Host: Shiming Chen (Carnegie Mellon University / Mohamed bin Zayed University of Artificial Intelligence)


Host Bio:

Shiming Chen is a postdoctoral research fellow at CMU and MBZUAI. He received his Ph.D. from Huazhong University of Science and Technology in 2022, advised by Prof. Xinge You. His current research interests include zero-shot learning, generative modeling and learning, and vision-and-language learning. As first author, he has published his research in more than 10 papers at top-tier conferences and journals such as TPAMI, NeurIPS, ICML, CVPR, ICCV, and AAAI. He serves as a reviewer for prestigious journals (e.g., TPAMI, IJCV, TIP) and top-tier conferences (e.g., ICLR, NeurIPS, ICML, ICCV, CVPR), and as an AC for VALSE and PRCV 2023.


Homepage:

https://shiming-chen.github.io/



Special thanks to the main organizer of this Webinar:

Organizing AC: Shiming Chen (Carnegie Mellon University / Mohamed bin Zayed University of Artificial Intelligence)


How to Participate

1. VALSE's weekly Webinar is streamed live on Bilibili. Search for VALSE_Webinar on Bilibili and follow us!

Live stream:

https://live.bilibili.com/22300737

Past recordings:

https://space.bilibili.com/562085182/


2. VALSE Webinars are usually held on Wednesdays at 20:00 (Beijing time), but the time occasionally shifts to accommodate speakers in other time zones. To stay up to date, follow the VALSE WeChat official account (valse_wechat) or join VALSE QQ Group S (group number: 317920537).


*Note: Applications to join a VALSE QQ group must include your name, affiliation, and role; all three are required. After joining, please set your group nickname to your real name in the form name-role-affiliation. Roles: faculty/staff at universities and research institutes (T); industry R&D (I); Ph.D. student (D); master's student (M).


3. The VALSE WeChat official account usually announces the following week's Webinar on Thursday.


4. You can also check Webinar information directly on the VALSE homepage: http://valser.org/. Slides of each talk (with the speaker's permission) are posted at the bottom of the corresponding announcement on the VALSE website.
