To help practitioners in vision and learning keep up with the field's latest developments and frontier techniques, VALSE has launched the "Paper Quick Look" (论文速览) column. Each week it releases one or two recorded videos on papers from top conferences and journals, each giving a detailed walkthrough of a single frontier work. This issue of VALSE Paper Quick Look features a work on semantic segmentation from ETH Zurich. The work was supervised by Wenguan Wang, and the video was recorded by the paper's first author, Dr. Tianfei Zhou.

Paper title: Rethinking Semantic Segmentation: A Prototype View

Authors: Tianfei Zhou (ETH Zurich), Wenguan Wang (University of Technology Sydney), Ender Konukoglu (ETH Zurich), Luc Van Gool (ETH Zurich)

Watch on Bilibili: https://www.bilibili.com/video/BV1qW4y167CK/

Copy the link into your browser, or click "Read the original", to go to the viewing page.

Abstract:

Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of prior methods learning a single weight/query vector for each class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean features of several training pixels within that class. The dense prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to handle arbitrary number of classes with a constant amount of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HRNet, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compelling results over several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary situation. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
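The nearest-prototype retrieval described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the ProtoSeg repository for that). The function names `build_prototypes` and `predict`, the use of cosine similarity, and the way pixels are grouped into prototypes (a plain positional split) are all assumptions made for illustration. The synthetic features stand in for the pixel embeddings a segmentation network would produce.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products act as cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_prototypes(features, labels, num_classes, k):
    """Return (num_classes, k, dim) prototypes as means of pixel-feature groups.

    Assumption: each class's pixels are split into k groups by array position.
    This only illustrates "mean features of several training pixels"; it is
    not the grouping procedure used in the paper. Each class is assumed to
    have at least k pixels.
    """
    dim = features.shape[1]
    prototypes = np.zeros((num_classes, k, dim))
    for c in range(num_classes):
        class_feats = features[labels == c]
        for j, group in enumerate(np.array_split(class_feats, k)):
            prototypes[c, j] = group.mean(axis=0)
    return l2_normalize(prototypes)


def predict(features, prototypes):
    """Assign each pixel the class of its most similar prototype."""
    num_classes, k, dim = prototypes.shape
    flat = prototypes.reshape(num_classes * k, dim)
    sims = l2_normalize(features) @ flat.T  # (num_pixels, num_classes * k)
    return sims.argmax(axis=1) // k  # flat prototype index -> class index


# Synthetic stand-in for pixel embeddings: 3 classes, 50 "pixels" each.
rng = np.random.default_rng(0)
centers = np.eye(3)
labels = np.repeat(np.arange(3), 50)
features = centers[labels] + 0.05 * rng.normal(size=(150, 3))

prototypes = build_prototypes(features, labels, num_classes=3, k=2)
pred = predict(features, prototypes)
accuracy = (pred == labels).mean()
```

Note that `predict` involves no per-class weight vector: adding a class only adds rows to `prototypes`, which are computed from features rather than learned. That matches the abstract's point that the number of learnable parameters does not grow with the number of classes.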
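Paper information:

[1] Tianfei Zhou, Wenguan Wang, Ender Konukoglu and Luc Van Gool. Rethinking Semantic Segmentation: A Prototype View. CVPR 2022 (Oral)

Paper link: https://arxiv.org/pdf/2203.15102.pdf

Code link: https://github.com/tfzhou/ProtoSeg

About the speaker:

Tianfei Zhou is a postdoctoral researcher at the Computer Vision Lab (CVL), ETH Zurich. His main research interests are computer vision and deep learning.

Special thanks to the main organizers of this Paper Quick Look:

Monthly rotating AC: Wenda Zhao (Dalian University of Technology), Wenqi Ren (Sun Yat-sen University)
Quarterly responsible AC: Xiu-Shen Wei (Nanjing University of Science and Technology)

How to participate

1. The weekly VALSE Webinar is streamed on Bilibili. Search for VALSE_Webinar on Bilibili to follow us.
Live stream: https://live.bilibili.com/22300737
Past videos: https://space.bilibili.com/562085182/

2. The VALSE Webinar usually takes place on Wednesdays at 20:00 (GMT+8), though the time is occasionally adjusted to accommodate a speaker's time zone. To stay informed, follow the VALSE WeChat official account (valse_wechat) or join the VALSE QQ R group (group number: 137634472).
*Note: applications to join the VALSE QQ group must state your name, affiliation, and identity; all three are required. After joining, please set your group nickname to your real name in the form "name, identity, affiliation". Identity codes: T for university and research institute staff; I for industry R&D; D for PhD students; M for master's students.

3. The VALSE WeChat official account usually posts the announcement for the following week's Webinar talk each Thursday.

4. You can also check Webinar information directly on the VALSE homepage: http://valser.org/. Slides from Webinar talks (when the speaker permits) are posted at the bottom of each talk announcement on the VALSE website.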