To help practitioners in the vision and learning community stay up to date with the latest developments and frontier advances in the field, VALSE has launched the "Paper Quick Review" (论文速览) column, which releases recorded videos on one or two top-conference or top-journal papers every week, each providing a detailed walkthrough of a single frontier work. This issue of the VALSE Paper Quick Review features FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment, from the University of Science and Technology Beijing and Peking University. The video is presented by the paper's first author, Jinglin Xu.

Paper title: FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment

Authors: Jinglin Xu (University of Science and Technology Beijing), Sibo Yin (Peking University), Guohao Zhao (Peking University), Zishuo Wang (Peking University), Yuxin Peng (Peking University)

Bilibili video link:

Abstract: Existing action quality assessment (AQA) methods mainly learn deep representations at the video level for scoring diverse actions. Due to the lack of a fine-grained understanding of actions in videos, they suffer severely from low credibility and interpretability and are thus insufficient for stringent applications such as Olympic diving events. We argue that a fine-grained understanding of actions requires the model to perceive and parse actions in both time and space, which is also the key to the credibility and interpretability of the AQA technique. Based on this insight, we propose a new fine-grained spatial-temporal action parser named FineParser. It learns human-centric foreground action representations by focusing on target action regions within each frame and exploiting their fine-grained alignments in time and space to minimize the impact of invalid backgrounds during the assessment. In addition, we construct fine-grained annotations of human-centric foreground action masks for the FineDiving dataset, called FineDiving-HM. With refined annotations on diverse target action procedures, FineDiving-HM can promote the development of real-world AQA systems. Through extensive experiments, we demonstrate the effectiveness of FineParser, which outperforms state-of-the-art methods while supporting more tasks of fine-grained action understanding.

References:
[1] Jinglin Xu, Sibo Yin, Guohao Zhao, Zishuo Wang, Yuxin Peng*, FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment, CVPR 2024, Oral (3.3%).

Paper link: https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_FineParser_A_Fine-grained_Spatio-temporal_Action_Parser_for_Human-centric_Action_Quality_CVPR_2024_paper.pdf

Code link: https://github.com/PKU-ICST-MIPL/FinePOSE_CVPR2024

Speaker bio: Jinglin Xu is an Associate Professor in the School of Intelligence Science and Technology at the University of Science and Technology Beijing (USTB), as well as a council member and deputy secretary-general of the Beijing Society of Image and Graphics (BSIG). Her research interests include computer vision, video understanding, and fine-grained action analysis, and she has authored more than 20 papers in top-tier journals and conference proceedings.
Personal homepage: https://xujinglin.github.io/

Special thanks to the main organizer of this Paper Quick Review issue: monthly rotating AC 于茜 (Beihang University).
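The abstract above describes the core idea at a high level: restrict the assessment to human-centric foreground action regions in each frame and align them across time, rather than scoring whole video frames that include irrelevant background. The snippet below is a minimal, hypothetical PyTorch sketch of that general idea (a per-frame foreground mask head, mask-weighted spatial pooling, temporal pooling, and score regression). All module names, the architecture, and the pooling choices are illustrative assumptions for exposition only and do not reproduce the authors' FineParser implementation.

```python
# Toy illustration: weight per-frame features by a predicted foreground mask so that
# background regions contribute less to the final action-quality score.
# Everything here (MaskGuidedScorer, feat_dim, the backbone) is a made-up stand-in.

import torch
import torch.nn as nn


class MaskGuidedScorer(nn.Module):
    """Toy AQA scorer: per-frame features, a foreground mask head,
    mask-weighted spatial pooling, temporal pooling, and score regression."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Lightweight per-frame encoder (stand-in for a real video backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Predicts a per-pixel foreground probability for each frame.
        self.mask_head = nn.Conv2d(feat_dim, 1, kernel_size=1)
        # Maps the pooled clip representation to a quality score.
        self.score_head = nn.Linear(feat_dim, 1)

    def forward(self, video: torch.Tensor):
        # video: (B, T, 3, H, W)
        b, t, c, h, w = video.shape
        frames = video.reshape(b * t, c, h, w)
        feats = self.encoder(frames)                      # (B*T, D, h', w')
        masks = torch.sigmoid(self.mask_head(feats))      # (B*T, 1, h', w')
        # Mask-weighted spatial pooling suppresses background regions.
        weighted = (feats * masks).sum(dim=(2, 3)) / (masks.sum(dim=(2, 3)) + 1e-6)
        clip_feat = weighted.reshape(b, t, -1).mean(dim=1)  # temporal average pooling
        score = self.score_head(clip_feat).squeeze(-1)      # (B,)
        return score, masks.reshape(b, t, *masks.shape[-2:])


if __name__ == "__main__":
    model = MaskGuidedScorer()
    dummy_video = torch.randn(2, 8, 3, 64, 64)  # 2 clips, 8 frames each
    score, masks = model(dummy_video)
    print(score.shape, masks.shape)  # torch.Size([2]) torch.Size([2, 8, 16, 16])
```

In a real system the predicted foreground masks would be supervised by mask annotations such as those provided in FineDiving-HM, and the temporal pooling would be replaced by a mechanism that aligns action stages across time; this sketch only shows why masking the features can reduce the influence of invalid background on the score.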