
VALSE Webinar 2024-03-20, Session 07 (No. 341 overall): Robust Open-World Perception



Speaker: Lihe Yang (The University of Hong Kong)

Title: Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data


Speaker: Xiangtai Li (Skywork AI)

Title: Beyond SAM: Towards More Efficient, Unified, General View of SAM


Speaker: Lihe Yang (The University of Hong Kong)

Time: March 20, 2024 (Wednesday), 20:00 (Beijing Time)

Title: Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data


Speaker Bio:

Lihe Yang is a first-year PhD student at The University of Hong Kong. His primary research focuses on developing robust and efficient visual perception algorithms for the open world. As the first author, he has published six papers at top-tier international conferences in computer vision (CVPR, ICCV, NeurIPS), including one oral presentation. His open-source research projects have garnered 6k+ stars on GitHub, and he has won first or second place in four visual algorithm competitions. He has also been awarded the HKU Presidential Scholarship, the National Scholarship, and a NeurIPS 2023 Outstanding Reviewer award.


Homepage:

https://liheyoung.github.io


Abstract:

Depth information is a crucial aspect of visual data with diverse applications, ranging from 3D reconstruction and image/video manipulation to SLAM, deblurring, and monocular 3D perception. Recovering precise depth from a single image (i.e., monocular depth estimation) is a challenging yet appealing task. In this talk, I will present our latest work, Depth Anything, which focuses on establishing a foundation model for monocular depth estimation from a data-centric perspective. Jointly trained on 1.5M labeled images and 62M unlabeled images, Depth Anything produces accurate depth maps for a wide variety of unseen images, showcasing robust generalization. Moreover, it can serve as a powerful pre-trained encoder for downstream mid-level and high-level perception tasks. Additionally, we re-train an improved depth-to-image ControlNet based on Depth Anything, yielding more compelling results in image generation and video editing.
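
As an illustration of the zero-shot usage described above, here is a minimal sketch based on the Hugging Face transformers depth-estimation pipeline; the checkpoint name and file paths are assumptions for illustration, not details taken from the talk.

```python
# Minimal sketch: zero-shot relative depth estimation with a Depth Anything
# checkpoint via the Hugging Face "depth-estimation" pipeline.
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline(
    task="depth-estimation",
    model="LiheYoung/depth-anything-small-hf",  # assumed checkpoint name; substitute the weights you use
)

image = Image.open("example.jpg")  # any RGB photo; no depth label is needed
result = depth_estimator(image)

# The pipeline returns a dict: "predicted_depth" is the raw tensor,
# "depth" is a visualizable per-pixel relative depth map (PIL image).
result["depth"].save("example_depth.png")
```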


References:

[1] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, CVPR 2024


Speaker: Xiangtai Li (Skywork AI)

Time: March 20, 2024 (Wednesday), 20:30 (Beijing Time)

Title: Beyond SAM: Towards More Efficient, Unified, General View of SAM


Speaker Bio:

Dr. Xiangtai Li is a research scientist at Skywork AI. He was previously a postdoctoral researcher at MMLab, Nanyang Technological University (NTU). He received his Ph.D. in 2022 from the School of Intelligent Systems at Peking University and was honored as an outstanding graduate at both the university and Beijing municipal levels. His main research areas include image segmentation and detection, multimodal learning, and video understanding, with a focus on enabling intelligent machines to truly comprehend complex scene inputs. He has published over 40 papers at top international conferences in computer vision (CVPR, ICCV, ECCV, ICLR, NeurIPS, etc.) and in journals (TPAMI, IJCV, TIP, etc.), including 18 as first author. During his doctoral studies, he was a research intern at several companies (SenseTime, JD), and some of his research achievements have been applied in the products of those companies. He has won the Peking University President Scholarship, the National Scholarship, and the "Haiying Star" (海英之星) Scholarship.


Homepage:

https://lxtgh.github.io


Abstract:

SAM is a representative vision foundation model from Meta AI research. It supports general interactive segmentation, which enables a wide range of applications, including image matting and interactive inpainting. However, SAM itself has several shortcomings: limited efficiency, a lack of semantics, weak handling of high-resolution inputs, no temporal association, and insufficient task unification. In this talk, I will present four works that I was deeply involved in and led over the last year. The first is EdgeSAM, the first SAM model that runs on an iPhone for real-time interactive segmentation. The second is Open-Vocabulary SAM, which augments SAM with CLIP knowledge within a single model. The third is BA-SAM, a SAM variant designed specifically to handle high-resolution images and salient objects. The fourth is OMG-Seg, the first work to use one model to achieve image, video, open-vocabulary, multi-dataset, and SAM-like segmentation in one shot. I will also summarize and discuss closely related works and future directions.
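
For context, the sketch below shows the point-prompted interactive segmentation workflow exposed by Meta's original segment-anything package, which is the baseline the four works above extend; the checkpoint path and click coordinates are placeholder assumptions, and none of the extended models (EdgeSAM, Open-Vocabulary SAM, BA-SAM, OMG-Seg) is implemented here.

```python
# Minimal sketch of SAM's prompt-based interactive segmentation.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Assumed locally downloaded ViT-H weights.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)

# A single foreground click (x, y) serves as the interactive prompt.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # placeholder click location
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=True,                # SAM proposes several candidate masks
)
best_mask = masks[scores.argmax()]        # boolean HxW mask; note SAM predicts no class label
```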


References:

[1] EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM, arXiv 2023

[2] Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively, arXiv 2024

[3] BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model, CVPR 2024

[4] OMG-Seg: Is One Model Good Enough For All Segmentation? CVPR 2024

[5] RAP-SAM: Towards Real-Time All-Purpose Segment Anything, arXiv 2024


Host: Hengshuang Zhao (The University of Hong Kong)


Host Bio:

Dr. Hengshuang Zhao is an Assistant Professor in the Department of Computer Science at The University of Hong Kong. Before that, he was a postdoctoral researcher at the Massachusetts Institute of Technology and the University of Oxford. He obtained his Ph.D. degree from The Chinese University of Hong Kong. His research interests cover the broad areas of computer vision, machine learning, and artificial intelligence, with a special emphasis on building intelligent visual systems. He and his team have won several championships in competitive international challenges, such as the ImageNet Scene Parsing Challenge. He received the Rising Star Award at the World Artificial Intelligence Conference and was recognized by AI 2000 as one of the most influential scholars in computer vision. He has served as an Area Chair for CVPR, ECCV, NeurIPS, and AAAI, and his research works have been cited over 26,000 times.


Homepage:

https://www.cs.hku.hk/~hszhao



Special thanks to the main organizer of this Webinar:

Organizing AC: Hengshuang Zhao (The University of Hong Kong)
