VALSE Paper Digest, Issue 100: Human-Object Interaction Detection Based on the DETR Framework and Hard-Positive Query Mining ...

2022-11-18 17:17 | Posted by: 程一-计算所

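Summary: To help practitioners in vision and learning keep up quickly with the latest developments and frontier techniques in the field, VALSE has launched the "Paper Digest" column. Each week it publishes recorded videos for one or two papers from top conferences and journals, each giving a detailed walkthrough of a single frontier work. This issue of the VALSE Paper Digest ...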


Paper title: Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection

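Authors: Xubin Zhong (钟旭彬, South China University of Technology), *Changxing Ding (丁长兴, South China University of Technology), Zijian Li (黎子建, South China University of Technology), Shaoli Huang (黄少立, Tencent AI Lab)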



Human-Object Interaction (HOI) detection is a core task for high-level image understanding. Recently, Detection Transformer (DETR)-based HOI detectors have become popular due to their superior performance and efficient structure. However, these approaches typically adopt fixed HOI queries for all testing images, which is vulnerable to the location change of objects in one specific image. Accordingly, in this paper, we propose to enhance DETR's robustness by mining hard-positive queries, which are forced to make correct predictions using partial visual cues. First, we explicitly compose hard-positive queries according to the ground-truth (GT) position of labeled human-object pairs for each training image. Specifically, we shift the GT bounding boxes of each labeled human-object pair so that the shifted boxes cover only a certain portion of the GT ones. We encode the coordinates of the shifted boxes for each labeled human-object pair into an HOI query. Second, we implicitly construct another set of hard-positive queries by masking the top scores in cross-attention maps of the decoder layers. The masked attention maps then only cover partial important cues for HOI predictions. Finally, an alternate strategy is proposed that efficiently combines both types of hard queries. In each iteration, both DETR's learnable queries and one selected type of hard-positive queries are adopted for loss computation. Experimental results show that our proposed approach can be widely applied to existing DETR-based HOI detectors. Moreover, we consistently achieve state-of-the-art performance on three benchmarks: HICO-DET, V-COCO, and HOI-A.
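To make the three ideas in the abstract concrete, here is a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`shift_box`, `coverage`, `mask_top_scores`, `pick_hard_query_type`), the single-axis shift, the flat-list attention row, and the even/odd alternation rule are all assumptions made for illustration. The abstract only states that shifted boxes cover a portion of the GT boxes, that top attention scores are masked, and that one type of hard-positive query is selected per iteration.

```python
# Illustrative sketch only; names and rules are hypothetical, not from the paper's code.

def shift_box(box, cover, axis, sign):
    """Shift an (x1, y1, x2, y2) box along one axis so that it overlaps
    a fraction `cover` (0 < cover <= 1) of the original box area."""
    x1, y1, x2, y2 = box
    if axis == "x":
        d = sign * (1.0 - cover) * (x2 - x1)
        return (x1 + d, y1, x2 + d, y2)
    d = sign * (1.0 - cover) * (y2 - y1)
    return (x1, y1 + d, x2, y2 + d)


def coverage(shifted, gt):
    """Fraction of the GT box area that the shifted box still covers."""
    inter_w = max(0.0, min(shifted[2], gt[2]) - max(shifted[0], gt[0]))
    inter_h = max(0.0, min(shifted[3], gt[3]) - max(shifted[1], gt[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return (inter_w * inter_h) / gt_area


def mask_top_scores(attn_row, k):
    """Zero out the k largest scores in one attention row (a flat list),
    leaving only the less dominant cues. Assumption: plain zeroing, with
    no renormalisation afterwards."""
    order = sorted(range(len(attn_row)), key=lambda i: attn_row[i], reverse=True)
    top = set(order[:k])
    return [0.0 if i in top else a for i, a in enumerate(attn_row)]


def pick_hard_query_type(iteration):
    """Assumed alternation rule: explicit (shifted-box) queries on even
    iterations, implicit (masked-attention) queries on odd ones."""
    return "explicit" if iteration % 2 == 0 else "implicit"


if __name__ == "__main__":
    gt_box = (10.0, 20.0, 50.0, 60.0)
    shifted = shift_box(gt_box, cover=0.5, axis="x", sign=1)
    print(shifted, coverage(shifted, gt_box))
    print(mask_top_scores([0.1, 0.5, 0.3, 0.1], k=1))
    print([pick_hard_query_type(i) for i in range(4)])
```

In a real detector the shifted coordinates would be encoded into an HOI query and the masking would act on decoder cross-attention tensors; the sketch only shows the geometric and selection logic under the assumptions stated above.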


[1] X. Zhong, C. Ding, Z. Li, and S. Huang. Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection. In ECCV, 2022.






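Xubin Zhong (钟旭彬) is a PhD student at South China University of Technology. Main research interests include visual relationship detection and human-object interaction detection. Zhong has published multiple papers in top journals and conferences such as IJCV, CVPR, and ECCV.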


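Monthly rotating AC: Lijun Wang (王立君, Dalian University of Technology), Yanan Sui (眭亚楠, Tsinghua University)

Quarterly responsible AC: Shanshan Zhang (张姗姗, Nanjing University of Science and Technology)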





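2. VALSE Webinar sessions are usually held on Wednesdays at 20:00, though the time is occasionally adjusted to accommodate a speaker's time zone. To make it easier to attend, follow the VALSE WeChat official account (valse_wechat) or join the VALSE QQ R group (group number: 137634472).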

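*Note: When applying to join the VALSE QQ group, you must provide your name, affiliation, and role for verification; all three are required. After joining, use your real name in the form name, role, affiliation. Role codes: T for university and research institute staff; I for industry R&D; D for PhD students; M for master's students.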


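4. You can also view Webinar event information directly by visiting the VALSE homepage. Slides (PPT) from Webinar talks, when the speaker permits, are posted at the bottom of the corresponding talk announcement on the VALSE website.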
