VALSE Student Webinar 20240423-01 (Issue 347 overall): 3DGen: Versatile and Scalable

Published: 2024-4-19 13:21 | Posted by: Cheng Yi (程一), Institute of Computing Technology (计算所)

Speaker: Zhiwen Fan (樊志文), The University of Texas at Austin and NVIDIA

Talk title: Streamlined 3D/4D: From Hours to Seconds to Millisecond

Speaker: Ruoshi Liu (刘若石), Columbia University

Talk title: Fantastic 3D Generative Models and How to Use Them

Speaker: Zhiwen Fan (樊志文), The University of Texas at Austin and NVIDIA

Talk time: Tuesday, April 23, 2024, 20:00 (Beijing time)

Talk title: Streamlined 3D/4D: From Hours to Seconds to Millisecond

Speaker bio:

Zhiwen Fan is a PhD candidate at UT Austin, advised by Prof. Atlas Wang. He was a recipient of the 2022 Qualcomm Innovation Fellowship in North America. He is also a research intern at NVIDIA Research, has interned at Google and Meta, and was previously a senior research engineer at Alibaba Group. He is interested in efficient 3D/4D, from modeling to perception to generation. He applies his work across a variety of applications, from autonomous driving and digital cities to generative AI and AI for space exploration.

Homepage:

http://zhiwenfan.github.io

Abstract:

Traditional 3D and 4D modeling requires modular, non-differentiable pipelines, such as Structure-from-Motion, to obtain camera poses before applying sophisticated modeling representations for downstream view synthesis or surface reconstruction. However, dividing the reconstruction task into multiple discrete problems necessitates additional engineering effort for integration and may introduce cumulative errors. To address this, my research aims to design an end-to-end view synthesis framework that can dramatically reduce the duration of a typical large-scale reconstruction pipeline from hours to seconds. Additionally, my work explores the joint learning of reconstruction and perception tasks in a feed-forward manner that generalizes across different scenes. Finally, scene-scale 3D/4D generation from text prompts can be effectively addressed using our panoramic lifting techniques, which transform generated 2D omnidirectional panoramic images into panoramic Gaussian Splatting representations, facilitating efficient optimization and real-time exploration for AR/VR, robotics, and autonomous driving applications.

References:

[1] Fan, Zhiwen, et al. "InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds." arXiv preprint arXiv:2403.20309 (2024).

[2] Fan, Zhiwen, et al. "Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps." arXiv preprint arXiv:2311.17245 (2023).

[3] Gu, Xiaodong, et al. "Cascade cost volume for high-resolution multi-view stereo and stereo matching." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

[4] Fan, Zhiwen, et al. "Nerf-sos: Any-view self-supervised object segmentation on complex scenes." arXiv preprint arXiv:2209.08776 (2022).

[5] Zhou, Shijie, et al. "Feature 3dgs: Supercharging 3d gaussian splatting to enable distilled feature fields." arXiv preprint arXiv:2312.03203 (2023).

[6] Xu, Dejia, et al. "Neurallift-360: Lifting an in-the-wild 2d photo to a 3d object with 360° views." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

[7] Zhou, Shijie, et al. "DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting." arXiv preprint arXiv:2404.06903 (2024).

Speaker: Ruoshi Liu (刘若石), Columbia University

Talk time: Tuesday, April 23, 2024, 20:30 (Beijing time)

Talk title: Fantastic 3D Generative Models and How to Use Them

Speaker bio:

Ruoshi Liu is a third-year PhD student at Columbia University, working with Carl Vondrick. His research interests include generative models, 3D computer vision, and robotics. He has published many papers at top conferences, including CVPR, ICCV, NeurIPS, ICLR, and RSS.

Homepage:

http://ruoshiliu.github.io

Abstract:

Generative models have made tremendous progress recently in fields across computer vision. Large-scale 3D datasets and powerful generative architectures have revealed tremendous potential to generate many components of our physical world in 3D. With a passion for both generative models and 3D computer vision, I have been trying to answer two key questions:

1. How do we build foundation models for 3D?

2. How do we use 3D generative models for real-world tasks?

In this talk, I will present our recent endeavors to address both questions and discuss potential future directions in both vision and robotics.

References:

[1] Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, Carl Vondrick, "Zero-1-to-3: Zero-shot One Image to 3D Object", ICCV 2023

[2] Ruoshi Liu, Carl Vondrick, "Humans as Light Bulbs: 3D Human Reconstruction from Thermal Reflection", CVPR 2023

[3] Matt Deitke, Ruoshi Liu, et al., "Objaverse-XL: A Colossal Universe of 3D Objects", in submission, NeurIPS 2023

[4] Ruoshi Liu, Junbang Liang, Sruthi Sudhakar, Huy Ha, Cheng Chi, Shuran Song, Carl Vondrick, "PaperBot: Learning to Design Real-World Tools Using Paper", in submission, RSS 2024

[5] Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick, "pix2gestalt: Amodal Segmentation by Synthesizing Wholes", CVPR 2024

[6] Rundi Wu, Ruoshi Liu, Carl Vondrick, Changxi Zheng, "Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape", ICLR 2024

[7] Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick, "Shadows Shed Light on 3D Objects", published, CVPR 2023

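Host: Yunchao Wei (魏云超), Beijing Jiaotong University

Host bio:

Yunchao Wei is a professor and associate dean of the School of Computer Science at Beijing Jiaotong University and a recipient of a national high-level talent program. He previously conducted research at the National University of Singapore, the University of Illinois Urbana-Champaign, and the University of Technology Sydney. He was named to MIT TR35 China, Baidu's list of global high-potential young Chinese scholars, and The Australian's TOP 40 Rising Star. His awards include the World Internet Conference Leading Technology Award (2023), the First Prize of the Ministry of Education Natural Science Award for Higher Education Institutions (2022), the First Prize of the China Society of Image and Graphics Science and Technology Award (2019), an Australian Research Council early-career researcher award (2019), the IBM C3SR Best Research Award (2019), first place in object detection at ImageNet, the "World Cup of computer vision" (2014), and multiple CVPR competition championships. He has published more than 100 papers in top journals and conferences such as TPAMI and CVPR, with over 20,000 Google Scholar citations. His current research focuses on visual perception with imperfect data, multimodal data analysis, and generative AI.

Homepage: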

https://weiyc.github.io/

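Special thanks to the main organizer of this Webinar:

Organizing AC: Yunchao Wei (魏云超), Beijing Jiaotong University

How to participate

1. The weekly VALSE Webinar is streamed live on Bilibili. Search for VALSE_Webinar on Bilibili and follow us.

Live stream:

https://live.bilibili.com/22300737

Past videos:

https://space.bilibili.com/562085182/

2. VALSE Webinars are usually held on Wednesdays at 20:00, but the time is occasionally adjusted to accommodate speakers' time zones. To stay informed, follow the VALSE WeChat official account (valse_wechat) or join the VALSE QQ S group (group number: 317920537).

*Note: When applying to join the VALSE QQ group, you must provide your name, affiliation, and role for verification; all three are required. After joining, please set your group nickname to your real name, role, and affiliation. Role codes: university and research institute staff, T; industry R&D, I; PhD, D; Master's, M.

3. The VALSE WeChat official account usually posts the announcement for the following week's Webinar talks every Thursday.

4. You can also visit the VALSE homepage, http://valser.org/, to view Webinar information directly. Slides for Webinar talks (with the speaker's permission) are posted at the bottom of each talk announcement on the VALSE website.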
