VALSE Paper Quick View, Issue 225: Fine-Grained Prompt-Driven 3D Human Pose Estimation

2025-9-12 17:13 | Posted by: 程一-计算所

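To help practitioners in vision and learning keep up with the latest developments and frontier techniques in the field, VALSE has launched the Paper Quick View (论文速览) column. Each week it releases recorded videos for one or two papers from top conferences and journals, each giving a detailed walkthrough of a single frontier work. This issue features FinePOSE, a fine-grained prompt-driven 3D human pose estimation algorithm from the University of Science and Technology Beijing and Peking University. The video was recorded by the paper's first author, Jinglin Xu.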

Paper title:

FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models

Authors:

Jinglin Xu (University of Science and Technology Beijing), Yijie Guo (Peking University), Yuxin Peng (Peking University)

Video on Bilibili:

https://www.bilibili.com/video/BV1qzo7YbEkq/

Abstract:

The 3D Human Pose Estimation (3D HPE) task uses 2D images or videos to predict human joint coordinates in 3D space. Despite recent advancements in deep learning-based methods, they mostly ignore the capability of coupling accessible texts and naturally feasible knowledge of humans, missing out on valuable implicit supervision to guide the 3D HPE task. Moreover, previous efforts often study this task from the perspective of the whole human body, neglecting fine-grained guidance hidden in different body parts. To this end, we present a new Fine-Grained Prompt-Driven Denoiser based on a diffusion model for 3D HPE, named FinePOSE. It consists of three core blocks enhancing the reverse process of the diffusion model: (1) Fine-grained Part-aware Prompt learning (FPP) block constructs fine-grained part-aware prompts via coupling accessible texts and naturally feasible knowledge of body parts with learnable prompts to model implicit guidance. (2) Fine-grained Prompt-pose Communication (FPC) block establishes fine-grained communications between learned part-aware prompts and poses to improve the denoising quality. (3) Prompt-driven Timestamp Stylization (PTS) block integrates learned prompt embedding and temporal information related to the noise level to enable adaptive adjustment at each denoising step. Extensive experiments on public single-human pose estimation datasets show that FinePOSE outperforms state-of-the-art methods. We further extend FinePOSE to multi-human pose estimation. Achieving 34.3mm average MPJPE on the EgoHumans dataset demonstrates the potential of FinePOSE to deal with complex multi-human scenarios.
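The abstract reports its multi-human result as an average MPJPE (mean per-joint position error) in millimetres. As a rough illustration of what that metric measures, here is a minimal sketch that averages the Euclidean distance between predicted and ground-truth 3D joints for a single pose. The function name `mpjpe` and the toy coordinates are assumptions made for illustration; the paper's actual evaluation protocol (for example, any root-joint alignment or averaging over frames and subjects) is not described in this post and is not reproduced here.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error for one pose.

    pred, gt: sequences of (x, y, z) joint coordinates in the same unit
    (e.g. millimetres), listed in the same joint order.
    Hypothetical helper for illustration only, not the authors' code.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    # Euclidean distance per joint, then the mean over all joints.
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / len(pred)


# Toy example: two joints, one predicted exactly, one off by 5 units.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = mpjpe(predicted, ground_truth)
```

A lower value means the predicted skeleton lies closer to the ground truth; dataset-level figures such as the one quoted in the abstract would typically be averages of this per-pose quantity over many frames.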

References:

[1] Jinglin Xu, Yijie Guo, Yuxin Peng, FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models, CVPR 2024. Highlight (11.9%)

Paper link:

https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_FinePOSE_Fine-Grained_Prompt-Driven_3D_Human_Pose_Estimation_via_Diffusion_Models_CVPR_2024_paper.pdf

Code link:

https://github.com/PKU-ICST-MIPL/FinePOSE_CVPR2024

About the speaker:

Jinglin Xu is an Associate Professor in the School of Intelligence Science and Technology at the University of Science and Technology Beijing (USTB). Before joining USTB, she was a Postdoctoral Fellow at Tsinghua University. She received her Ph.D. degree from Northwestern Polytechnical University. Her research interests are video understanding and fine-grained action analysis. She has authored over 23 papers in top-tier journals and conference proceedings.

Homepage:

https://xujinglin.github.io/

Special thanks to the main organizer of this Paper Quick View issue:

Monthly rotating AC: Shuai Yang (Peking University)
