VALSE (Vision And Learning SEminar) Webinar 20230802-19 (No. 319 overall): Cross-Modal Learning for 3D Understanding and Generation

Posted 2023-7-27 10:03 by Cheng Yi (Institute of Computing Technology)


Speaker: Xiaojuan Qi (The University of Hong Kong)

Title: Foundation Models as Scalable Data Servers for Open-World Understanding

Speaker: Kai Han (The University of Hong Kong)

Title: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Speaker: Jun Gao (University of Toronto, NVIDIA)

Title: Towards High-Quality 3D Content Creation with Differentiable Isosurfacing

Speaker: Xiaojuan Qi (The University of Hong Kong)

Time: Wednesday, August 2, 2023, 20:00 (Beijing Time)

Title: Foundation Models as Scalable Data Servers for Open-World Understanding

Speaker bio:

Qi Xiaojuan (https://xjqi.github.io) is an assistant professor in the Department of Electrical and Electronic Engineering at the University of Hong Kong. She received her Ph.D. from the Chinese University of Hong Kong and has worked at or visited the University of Toronto, the University of Oxford and the Intel Visual Computing Group. She is committed to giving machines the ability to perceive, understand and reconstruct the visual world in open-world settings and to deploying these capabilities in embodied agents. She has published more than 60 papers at top computer vision and machine learning conferences such as CVPR, ICCV and NeurIPS, many of which were selected for oral presentation. She has served or will serve as an area chair for ICCV 2021, CVPR 2021, AAAI 2021, AAAI 2022, CVPR 2023, NeurIPS 2023 and CVPR 2024.

Homepage:

https://xjqi.github.io/

Abstract:

Deep learning has made remarkable progress in various computer vision tasks, but it often struggles to recognize objects or categories that are unseen or rarely seen (i.e., out-of-distribution) in the real open world. Advancing computer vision toward open-world environments is a critical and challenging problem because of the long-tailed nature of real-world data, the large vocabulary space, and domain shifts. The challenge is even more severe in the 3D domain. Data is a key factor in giving models open-world capabilities. In this talk, I will present our recent attempts to leverage foundation models to generate data that addresses open-world challenges. First, I will share our empirical studies and findings on using text-to-image diffusion models to generate training data for image recognition. Then, I will discuss our recent efforts on using pre-trained vision-language models as efficient 3D annotators for open-vocabulary recognition. Finally, I will explore future directions and prospects for open-world learning in the era of foundation models.

Speaker: Kai Han (The University of Hong Kong)

Time: Wednesday, August 2, 2023, 20:35 (Beijing Time)

Title: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Speaker bio:

Kai Han is an Assistant Professor in the Department of Statistics and Actuarial Science at The University of Hong Kong, where he directs the Visual AI Lab. His research interests lie in computer vision and deep learning, and his recent research focuses on open-world learning, 3D vision, foundation models and related fields. Before joining HKU, he was a Researcher at Google Research, an Assistant Professor in the Department of Computer Science at the University of Bristol, and a Postdoctoral Researcher in the Visual Geometry Group (VGG) at the University of Oxford, working with Prof. Andrew Zisserman and Prof. Andrea Vedaldi. He received his Ph.D. from the Department of Computer Science at The University of Hong Kong, advised by Prof. Kenneth K.Y. Wong. During his Ph.D., he also worked with Prof. Jean Ponce in the WILLOW team of Inria Paris and École Normale Supérieure (ENS) in Paris.

Homepage:

https://www.kaihan.org/

Abstract:

In this talk, I will present our recent works on text-and-shape guided 3D human avatar generation. I will first introduce a framework for generating high-quality 3D human avatars with controllable poses, utilizing a dual space design and normal-consistency regularization to transfer optimized texture and geometry. Then, I will introduce a versatile coarse-to-fine pipeline for crafting 3D head avatars from textual prompts, addressing the challenges of 3D awareness and fine-grained editing.

Speaker: Jun Gao (University of Toronto, NVIDIA)

Time: Wednesday, August 2, 2023, 21:10 (Beijing Time)

Title: Towards High-Quality 3D Content Creation with Differentiable Isosurfacing

Speaker bio:

Jun Gao is a PhD student at the University of Toronto advised by Prof. Sanja Fidler. He is also a Research Scientist at the NVIDIA Toronto AI Lab. His research focuses on the intersection of 3D computer vision and computer graphics, in particular on developing machine learning tools that enable large-scale, high-quality 3D content creation and drive real-world applications. Many of his contributions have been deployed in products, including NVIDIA Picasso, GANVerse3D, Neural DriveSim and Toronto Annotation Suite.

Homepage:

https://www.cs.toronto.edu/~jungao/

Abstract:

With the increasing demand for large-scale 3D digital worlds across various industries, there is an immense need for diverse, high-quality 3D content. Machine learning is a key enabler in this quest. In this talk, I will discuss why exploiting 3D modeling techniques from computer graphics, together with high-resolution 2D diffusion models, is a promising avenue for this problem. To this end, I will first introduce a hybrid 3D representation that uses marching tetrahedra to convert neural field-based representations into triangle meshes in a differentiable manner, enabling efficient and flexible 3D modeling. By incorporating differentiable rendering, our representation effectively leverages Stable Diffusion as a 2D data prior and takes a step towards generating high-quality 3D content from text prompts.
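The abstract gives no implementation details. As a rough illustration of why marching tetrahedra can be made differentiable, here is a minimal sketch of the standard per-edge step, assuming a signed distance value stored at each tetrahedron vertex. Where the sign changes along an edge, the surface vertex is placed at the zero crossing of the linearly interpolated SDF, v = (v_a·s_b − v_b·s_a) / (s_b − s_a). This is not the speaker's code, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def edge_crossing(va: Vec3, vb: Vec3, sa: float, sb: float) -> Vec3:
    """Zero crossing of the linearly interpolated SDF along edge (va, vb).

    Hypothetical helper. Assumes sa and sb lie on opposite sides of zero,
    so sb - sa != 0. The result is a smooth function of sa and sb, so in an
    autodiff framework gradients can flow from the vertex back to the SDF.
    """
    denom = sb - sa
    return tuple((a * sb - b * sa) / denom for a, b in zip(va, vb))


def tet_surface_points(verts: Sequence[Vec3], sdf: Sequence[float]) -> List[Vec3]:
    """Surface vertices on the 6 edges of one tetrahedron (hypothetical helper)."""
    points = []
    for i, j in combinations(range(4), 2):
        # A sign change between the two endpoints means the surface crosses this edge.
        if (sdf[i] > 0) != (sdf[j] > 0):
            points.append(edge_crossing(verts[i], verts[j], sdf[i], sdf[j]))
    return points


if __name__ == "__main__":
    tet = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    # One vertex inside (negative SDF), three outside: the surface cuts off one corner.
    print(tet_surface_points(tet, [-0.5, 0.5, 0.5, 0.5]))
    # prints [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.5)]
```

Only the vertex position is differentiable. The sign test that decides which edges are crossed is discrete, so a mesh extracted this way can change topology between optimization steps without breaking the gradient path through the crossing points.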

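Host: Lu Sheng (Beihang University)

Host bio:

Dr. Lu Sheng is a "Zhuoyue Bairen" (Outstanding Hundred Talents) Special Associate Researcher and doctoral supervisor at Beihang University, and was selected for the Beihang Young Top Talent Support Program. He received his Ph.D. from The Chinese University of Hong Kong in 2017 and joined CUHK MMLab as a postdoctoral researcher in the same year. In 2019 he joined the School of Software at Beihang University. His main research area is 3D vision, in particular the perception, understanding and generation of 3D point clouds. He has published 49 papers in major computer vision journals and conferences, including TPAMI, IJCV, TIP, CVPR, ICCV, ECCV, ACM MM and AAAI, with more than 3,200 citations according to Google Scholar. He has served as an area chair for several international conferences and has organized multiple international workshops and challenges. He is an associate editor of ACM Computing Surveys and has long served as a reviewer or program committee member for major journals and conferences in computer vision, machine learning and multimedia. He leads a National Natural Science Foundation of China (NSFC) Young Scientists Fund project, a subproject of an NSFC Key Program project, and a subproject of a Ministry of Science and Technology National Key R&D Program project, among others.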

Homepage:

https://lucassheng.github.io/

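Host: Zhen Li (The Chinese University of Hong Kong, Shenzhen)

Host bio:

Dr. Zhen Li is an Assistant Professor and President's Young Scholar at the School of Science and Engineering and the Future Network of Intelligence Institute (FNii), The Chinese University of Hong Kong, Shenzhen. He received his Ph.D. in Computer Science from The University of Hong Kong (2014–2018) and was a visiting scholar at the University of Chicago in 2018. His honors include selection for the 7th Young Elite Scientists Sponsorship Program of the China Association for Science and Technology (2021), the global championship in contact map prediction at CASP12, first place in the SemanticKITTI competition, second place in the Urban3D 2021 challenge, and third place in the Urban3D 2022 challenge. He has also received research funding from national, provincial and municipal programs and from industry. Dr. Li leads the Deep Bit Lab at CUHK-Shenzhen. The lab's main research directions are 3D vision analysis and its applications (including but not limited to point cloud analysis and multimodal joint analysis) and fundamental deep learning theory and algorithms. The lab also works to apply 2D/3D AI algorithms to protein/RNA structure prediction, autonomous driving, industrial vision and other scenarios. He has published more than 50 papers in well-known international journals and conferences in these areas, including top journals such as Cell Systems, Nature Communications, Bioinformatics, TMI and TNNLS, and top conferences such as CVPR, ICCV, ECCV, NeurIPS, AAAI, IJCAI, MICCAI and RECOMB. Dr. Li serves on the editorial boards of IEEE Transactions on Mobile Computing and IROS and reviews for many top journals and conferences.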

Homepage:

https://mypage.cuhk.edu.cn/academics/lizhen/

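Special thanks to the main organizers of this Webinar:

Lead AC: Lu Sheng (Beihang University)

Co-organizing AC: Zhen Li (The Chinese University of Hong Kong, Shenzhen)

How to participate

1. VALSE's weekly Webinars are livestreamed on Bilibili. Search for VALSE_Webinar on Bilibili to follow us!

Livestream: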

https://live.bilibili.com/22300737

Past recordings:

https://space.bilibili.com/562085182/

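2. VALSE Webinars are usually held on Wednesdays at 20:00, but the time is occasionally adjusted to accommodate speakers' time zones. To stay informed, please follow the VALSE WeChat official account (valse_wechat) or join VALSE QQ group S (group number: 317920537).

*Note: When applying to join the VALSE QQ group, you must provide your name, affiliation and role; all three are required. After joining, please set your group nickname to your real name in the format name-role-affiliation. Roles: university and research institute staff, T; industry R&D, I; Ph.D. student, D; master's student, M.

3. The VALSE WeChat official account usually posts the announcement for the following week's Webinar on Thursday.

4. You can also find Webinar information directly on the VALSE homepage: http://valser.org/. With the speakers' permission, the slides for each Webinar are posted at the bottom of that Webinar's announcement on the VALSE website.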
