报告嘉宾:仉尚航 (北京大学) 报告题目:迈向开放世界自动驾驶泛化感知 报告嘉宾:Haohan WANG (University of Illinois Urbana-Champaign) 报告题目:Guardian of Trust in Language Models: Automatic Jailbreak and Systematic Defense Panel嘉宾: 仉尚航 (北京大学),Haohan WANG (University of Illinois Urbana-Champaign),马雷 (The University of Tokyo/University of Alberta),俞刚 (腾讯) Panel议题: 1. 如何挖掘大模型在开放环境中的能力以及面对Corner Case问题的创新性解决方案,以应对大模型在极端情境下的挑战? 2. 如何解决大模型在不同任务中出现的幻象 (Hallucination)问题,例如图像生成、文本生成、多模态任务等? 3. 在有特定安全需求的场景下 (如自动驾驶,智慧医疗等),如何更好的设计防御机制对抗大模型的“越狱”攻击? 4. 大模型中,鲁棒性 (泛化能力)和安全性、隐私性之间是否可以平衡? 5. 基于视觉与多模态可信大模型未来的发展趋势和方向的看法? *欢迎大家在下方留言提出主题相关问题,主持人和panel嘉宾会从中选择若干热度高的问题加入panel议题! 报告嘉宾:仉尚航 (北京大学) 报告时间:2024年1月3日 (星期三)晚上20:00 (北京时间) 报告题目:迈向开放世界自动驾驶泛化感知 报告人简介: 仉尚航,北京大学计算机学院研究员、博士生导师、博雅青年学者。致力于开放环境泛化机器学习理论与系统研究,取得了一系列重要的研究成果,在人工智能顶级期刊和会议上发表论文60余篇,Google Scholar引用数7000余次。荣获世界人工智能顶级会议AAAI’2021 最佳论文奖,曾位列世界最大学术源代码仓库Trending Research第一位。作为编辑和作者由Springer Nature出版英文书籍《Deep Reinforcement Learning》,至今电子版全球下载量超十八万次,入选中国作者年度高影响力研究精选。于2018年入选美国“EECS Rising Star”,于2023年入选“全球AI华人女性青年学者榜”、“中国科协青年百人会”。曾获得2022年度CCF-百度松果基金、CCF-滴滴盖亚青年学者科研基金、国际人脑多模态计算模型响应预测竞赛第一名,ICCV持续泛化学习竞赛第一名。曾多次在国际顶级会议NeurIPS、ICML上组织Workshop,担任AAAI 2022&2023&2024 高级程序委员。仉尚航于2018年博士毕业于美国卡内基梅隆大学,并于加州大学伯克利分校从事博士后研究。 个人主页: https://www.shanghangzhang.com/ 报告摘要: 虽然机器视觉为各个领域带来巨大成功,但已有机器视觉往往针对封闭环境,存在闭集假设和大样本假设等局限。现实世界中的自动驾驶往往面对开放环境,存在以下关键挑战:1)开放环境中存在大量数据域偏移,已有方案难以适应新数据域、对新场景进行准确理解;2)开放环境中新的类别动态出现,无法及时获得标注,已有方案难以在少量标注下准确识别新事物。本次分享将针对上述挑战,介绍一系列增强开放世界自动驾驶感知的泛化能力,使其自动适应新环境、识别新事物的研究工作。尤其针对自动驾驶的Corner Case问题提出新型持续泛化学习范式和预训练大模型解决方案。 报告嘉宾:Haohan WANG (University of Illinois Urbana-Champaign) 报告时间:2024年1月3日 (星期三)晚上20:30 (北京时间) 报告题目:Guardian of Trust in Language Models: Automatic Jailbreak and Systematic Defense 报告人简介: Haohan Wang is an assistant professor in the School of Information Sciences at the University of Illinois Urbana-Champaign. His research focuses on the development of trustworthy machine learning methods for computational biology and healthcare applications. In his work, he uses statistical analysis and deep learning methods, with an emphasis on data analysis using methods least influenced by spurious signals. Wang earned his PhD in computer science through the Language Technologies Institute of Carnegie Mellon University. He is also an organizer of Trustworthy Machine Learning Initiative. 个人主页: https://haohanwang.github.io/ 报告摘要: Large Language Models (LLMs) excel in Natural Language Processing (NLP) with human-like text generation, but the misuse of them has raised a significant concern. In this talk, we introduce an innovative system designed to address these challenges. Our system leverages LLMs to play different roles, simulating various user personas to generate "jailbreaks" – prompts that can induce LLMs to produce outputs contrary to ethical standards or specific guidelines. Utilizing a knowledge graph, our method efficiently creates new jailbreaks, testing the LLMs' adherence to governmental and ethical guidelines. Empirical validation on diverse models, including Vicuna-13B, LongChat-7B, Llama-2-7B, and ChatGPT, has demonstrated its efficacy. The system's application extends to Visual Language Models, highlighting its versatility in multimodal contexts. The second part of our talk shifts focus to defensive strategies against such jailbreaks. Recent studies have uncovered various attacks that can manipulate LLMs, including manual and gradient-based jailbreaks. Our work delves into the development of robust prompt optimization as a novel defense mechanism, inspired from principled solutions from trustworthy machine learning. This approach involves system prompts – parts of the input text inaccessible to users – and aims to counter both manual and gradient-based attacks effectively. Despite current methods, adaptive attacks like GCG remain a challenge, necessitating a formalized defensive objective. Our research proposes such an objective and demonstrates how robust prompt optimization can enhance the safety of LLMs, safeguarding against realistic threat models and adaptive attacks. Panel嘉宾:马雷 (The University of Tokyo/University of Alberta) 嘉宾简介: Prof. Lei Ma is currently an associate professor with The University of Tokyo, as well as University of Alberta (with a shared appointment in part time from April 2023), leading Momentum Lab (website to be launched). Previously, he held the assistant professor position with Kyushu University from Jan. 2019, and was promoted to associate professor on April 2020. From April 2021, he joined University of Alberta and was honored to be selected as a Canada CIFAR AI Chair, and a Fellow with Alberta Machine Intelligent Institute (Amii), under the Pan-Canadian AI Strategy. From April 2023, he joined The computer science department of The University of Tokyo as an associate professor. His research spans a wide range of research topics, and comes with a special focus centering on the interdisciplinary research fields of Software Engineering and Artificial Intelligence, in the design and development of quality assurance and engineering support for building trustworthy AI systems. Prof. Ma received a B.E. degree from Shanghai Jiaotong University (SJTU) in 2009, M.E. and Ph.D. degrees from The University of Tokyo, in 2011 and 2014, respectively. 个人主页: https://www.malei.org/ Panel嘉宾:俞刚 (腾讯有限公司) 嘉宾简介: 俞刚, 博士毕业于新加坡南洋理工大学, 现为腾讯有限公司算法研究员。主要研究方向包括AIGC生成以及图文大模型等计算机视觉相关技术。谷歌学术引用次数超过15000次。 个人主页: https://www.skicyyu.org/ 主持人:李皓亮 (香港城市大学) 主持人简介: Dr. Haoliang Li received his Ph.D. degree from Nanyang Technological University (NTU), Singapore in 2018. He is currently an assistant professor in Department of Electrical Engineering, City University of Hong Kong. His research mainly focuses on AI security, multimedia forensics and transfer learning. He received the Wallenberg-NTU presidential postdoc fellowship in 2019, doctoral innovation award in 2019, VCIP best paper award in 2020, and Stanford's top 2% most highly cited scientists in 2022 and 2023. 特别鸣谢本次Webinar主要组织者: 主办AC:李皓亮 (香港城市大学) |
小黑屋|手机版|Archiver|Vision And Learning SEminar
GMT+8, 2024-11-26 14:53 , Processed in 0.012964 second(s), 14 queries .
Powered by Discuz! X3.4
Copyright © 2001-2020, Tencent Cloud.