VALSE Paper Quick Review No. 215: 360° Panoptic Multi-modal Scene Understanding

2025-06-30 00:16 | Posted by: 程一-计算所


To help practitioners in the vision and learning community stay informed about the latest developments and cutting-edge advances in the field in a quick and timely manner, VALSE has launched the Paper Quick Review column, which releases one or two recorded videos each week, each giving a detailed walkthrough of a single recent top-conference or top-journal paper. This issue features work on 360-degree panoptic multi-modal scene understanding from the MIx group at the University of Birmingham, UK. The work was supervised by Jianbo Jiao, and the video was recorded by the paper's first author, Hao Chen. Building on this dataset, the BinEgo360 Workshop & Challenge will be held at ICCV 2025; everyone is welcome to participate: https://x360dataset.github.io/BinEgo-360/.


Paper title:

360+x : A Panoptic Multi-modal Scene Understanding Dataset

Author list:

Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao (all with the Machine Intelligence + x Group, University of Birmingham)


Bilibili video link:

https://www.bilibili.com/video/BV1EJoEYwEb3/    



Paper abstract:

Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentric monocular/binocular views, with rich modalities including video, multi-channel audio, directional binaural delay, location data and textual scene descriptions within each scene captured, presenting a comprehensive observation of the world. To the best of our knowledge, this is the first database that covers multiple viewpoints with multiple data modalities to mimic how daily information is accessed in the real world. Through our benchmark analysis, we present five different scene understanding tasks on the proposed 360+x dataset to evaluate the impact and benefit of each data modality and perspective in panoptic scene understanding. We hope this unique dataset can broaden the scope of comprehensive scene understanding and encourage the community to approach these problems from more diverse perspectives.
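
To make the dataset composition described in the abstract more concrete, below is a minimal Python sketch of how a single clip in such a multi-view, multi-modal dataset might be organized in memory. The field names and array shapes are illustrative assumptions for exposition only; they are not the actual 360+x release format or the x360dataset-kit API.

    # Illustrative sketch only: field names and shapes are assumptions,
    # not the actual 360+x annotation format or x360dataset-kit API.
    from dataclasses import dataclass
    from typing import Dict, Tuple
    import numpy as np

    @dataclass
    class PanopticClip:
        # Third-person viewpoints
        panoramic_video: np.ndarray      # (T, H, W, 3) 360-degree frames
        front_view_video: np.ndarray     # (T, H, W, 3) standard front view
        # Egocentric viewpoints
        ego_binocular_video: np.ndarray  # (T, 2, H, W, 3) left/right views
        ego_monocular_video: np.ndarray  # (T, H, W, 3)
        # Other modalities
        multichannel_audio: np.ndarray   # (C, S) C channels, S audio samples
        binaural_delay: np.ndarray       # (T,) directional binaural delay signal
        location: np.ndarray             # (T, 2) latitude/longitude per frame
        scene_description: str           # textual description of the scene
        scene_label: str                 # scene category, e.g. for classification

    def modality_shapes(clip: PanopticClip) -> Dict[str, Tuple[int, ...]]:
        """Report the shape of each array-valued modality, e.g. as a sanity check."""
        names = ("panoramic_video", "front_view_video", "ego_binocular_video",
                 "ego_monocular_video", "multichannel_audio", "binaural_delay",
                 "location")
        return {name: getattr(clip, name).shape for name in names}

A structure like this makes it easy to run the benchmark tasks on any subset of viewpoints or modalities by simply dropping fields, which is how one would study the contribution of each modality as the abstract describes.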


Paper link:

https://x360dataset.github.io/static/pdfs/CVPR2024_360x__A_Dataset_for_Panoptic_Multi_modal_Scene_Understanding.pdf


Code link:

https://github.com/x360dataset/x360dataset-kit


Speaker bio:

Hao Chen is a PhD student at the University of Cambridge and a research assistant at the MIx Lab, University of Birmingham. His research interests include video understanding, feature learning, and model generalization.



Special thanks to the main organizer of this Paper Quick Review:

Monthly rotating AC: 汪婧雅 (ShanghaiTech University)

