为了使得视觉与学习领域相关从业者快速及时地了解领域的最新发展动态和前沿技术进展,VALSE最新推出了《论文速览》栏目,将在每周发布一至两篇顶会顶刊论文的录制视频,对单个前沿工作进行细致讲解。 论文题目: Bridge the Points: Graph-based Few-shot Segment Anything Semantically 作者列表: Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei B站观看网址: 论文摘要: The recent advancements in large-scale pre-training techniques have significantly enhanced the capabilities of vision foundation models, notably the Segment Anything Model (SAM), which can generate precise masks based on point and box prompts. Recent studies extend SAM to Few-shot Semantic Segmentation (FSS), focusing on prompt generation for SAM-based automatic semantic segmentation. However, these methods struggle with selecting suitable prompts, require specific hyperparameter settings for different scenarios, and experience prolonged one-shot inference times due to the overuse of SAM, resulting in low efficiency and limited automation ability. To address these issues, we propose a simple yet effective approach based on graph analysis. In particular, a Positive-Negative Alignment module dynamically selects the point prompts for generating masks, especially uncovering the potential of the background context as the negative reference. Another subsequent Point-Mask Clustering module aligns the granularity of masks and selected points as a directed graph, based on mask coverage over points. These points are then aggregated by decomposing the weakly connected components of the directed graph in an efficient manner, constructing distinct natural clusters. Finally, the positive and overshooting gating, benefiting from graph-based granularity alignment, aggregate high-confident masks and filter out the false-positive masks for final prediction, reducing the usage of additional hyperparameters and redundant mask generation. Extensive experimental analysis across standard FSS, One-shot Part Segmentation, and Cross Domain FSS datasets validate the effectiveness and efficiency of the proposed approach, surpassing state-of-the-art generalist models with a mIoU of 58.7% on COCO-20i and 35.2% on LVIS-92i.
参考文献: [1] Zhang, Anqi, et al. "Bridge the Points: Graph-based Few-shot Segment Anything Semantically." NeurIPS (2024). 论文链接: https://arxiv.org/abs/2410.06964 视频讲者简介: 张安琪是一名北京理工大学的硕士研究生,现阶段在高广宇副教授的团队进行研究。当前研究方向为计算机视觉中的语义分割、小样本学习及增量学习。曾在计算机视觉顶会ECCV和机器学习顶会NeurIPS发表工作。
个人主页: https://scholar.google.com/citations?user=DmRZv5sAAAAJ 特别鸣谢本次论文速览主要组织者: 月度轮值AC:于茜 (北京航空航天大学) |
小黑屋|手机版|Archiver|Vision And Learning SEminar
GMT+8, 2025-10-14 13:30 , Processed in 0.014198 second(s), 14 queries .
Powered by Discuz! X3.4
Copyright © 2001-2020, Tencent Cloud.