To help practitioners in the vision and learning community keep up with the latest developments and frontier techniques in the field, VALSE has launched the "Paper Digest" column, which releases one or two recorded videos per week, each giving a detailed walkthrough of a single recent top-conference or top-journal paper. This issue features a video-understanding work from Shandong University, supervised by Prof. Lei Meng and presented by its first author, Yuqing Wang.

Title: Modeling Event-level Causal Representation for Video Classification

Authors: Yuqing Wang (Shandong University), Lei Meng (Shandong University), Haokai Ma (Shandong University), Haibei Huang (Inspur), Xiangxu Meng (Shandong University)

Watch on Bilibili:

Abstract: Classifying videos differs from classifying images in the need to capture what has happened, rather than what is in the frames. Conventional methods typically follow a data-driven approach, using transformer-based attention models to extract and aggregate frame features as the representation of the entire video. However, this approach tends to extract object-level information from frames and may struggle with classes that describe events, such as "fixing bicycle". To address this issue, this paper presents an Event-level Causal Representation Learning (ECRL) model for spatio-temporal modeling of both in-frame object interactions and their cross-frame temporal correlations. Specifically, ECRL first employs a Frame-to-Video Causal Modeling (F2VCM) module, which simultaneously builds an in-frame causal graph from background and foreground information and models cross-frame correlations to construct a video-level causal graph. Subsequently, a Causality-aware Event-level Representation Inference (CERI) module is introduced to eliminate spurious correlations in contexts and objects via back-door and front-door interventions, respectively. The former performs visual-context de-biasing to filter out background confounders, while the latter employs global-local causal attention to capture event-level visual information. Experimental results on two benchmark datasets verify that ECRL better captures cross-frame correlations and describes videos with event-level features.

Paper link: https://dl.acm.org/doi/abs/10.1145/3664647.3681547

Code link: https://github.com/wyqcrystal/ECRL

Speaker bio: Yuqing Wang is a master's student at the School of Software, Shandong University. During his graduate studies, he has led large-scale multimedia data processing and algorithm research related to social-governance events, and helped bring a digital-twin platform for digitalized social governance into practical use. He has published two conference papers as first author, rated CCF-A and CCF-C respectively, and has filed a patent under the Tencent Rhino-Bird Innovation Fund. During his master's program, he was awarded the Third Prize Freshmen Scholarship and the First-Class Academic Scholarship. In competitions, he won the First Prize in the International Mathematical Contest in Modeling (MCM), the Second Prize in the CCF Outstanding Undergraduate Academic Showcase, and the National Third Prize in the 14th China College Students' Innovation and Entrepreneurship Outsourcing Competition.
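The back-door intervention mentioned in the abstract (visual-context de-biasing) is commonly approximated by marginalizing the representation over a fixed dictionary of confounder prototypes, e.g. clustered background features. Below is a minimal NumPy sketch of that general technique; the function names, the uniform prior P(z), and the additive fusion are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def backdoor_adjust(x, confounders, prior=None):
    """Approximate back-door adjustment: marginalize the context over a
    dictionary of K background prototypes z (confounders, shape (K, D)),
    weighting by the prior P(z), then fuse the expected context back into
    the features x (shape (B, D))."""
    K, D = confounders.shape
    if prior is None:
        prior = np.full(K, 1.0 / K)          # assume a uniform P(z)
    # Scaled dot-product scores between features and each prototype.
    attn = softmax(x @ confounders.T / np.sqrt(D))   # (B, K)
    # Expectation over z: sum_z P(z) * weight(x, z) * z.
    ctx = (attn * prior) @ confounders               # (B, D)
    return x + ctx                                   # additive fusion (assumed)
```

In practice the prototype dictionary would be built offline (e.g. k-means over background features of the training set) and the fusion learned end-to-end; this sketch only shows the marginalization structure of the adjustment.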
Personal homepage: https://scholar.google.com.hk/citations?view_op=list_works&hl=zh-CN&user=2MA5TZcAAAAJ

Special thanks to the main organizer of this Paper Digest — monthly rotating AC: Yifan Wang (Dalian University of Technology)