VALSE Paper Quick Read, Issue 163: Attention-Driven Masked Image Modeling

Published 2024-1-26 19:38 by 程一-计算所


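To help practitioners in vision and learning keep up with the latest developments and cutting-edge advances in the field, VALSE has launched the "Paper Quick Read" column. Each week it publishes recorded videos of one or two papers from top conferences and journals, each giving a detailed walkthrough of a single cutting-edge work. This issue features a self-supervised learning work from Southeast University. The work was supervised by Prof. Jie Gui (桂杰), and the video was recorded by the paper's first author, Zhengqi Liu (刘政岐).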

Paper title:

Good Helper Is around You: Attention-Driven Masked Image Modeling

Authors:

Zhengqi Liu (刘政岐, Southeast University), Jie Gui (桂杰, Southeast University), Hao Luo (罗浩, Alibaba)

Watch on Bilibili:

https://www.bilibili.com/video/BV1ee411Y7xX/

Abstract:

Masked image modeling (MIM) has shown great potential in self-supervised learning over the past year. Building on the general-purpose Vision Transformer backbone, MIM learns self-supervised visual representations by masking a portion of the image patches and trying to recover the missing pixels. Most previous works mask patches randomly, which underuses the semantic information that benefits visual representation learning. In addition, because the backbone is large, most previous works spend a long time on pretraining. In this paper, we propose the Attention-driven Masking and Throwing Strategy (AMT), which addresses both problems. We first use the self-attention mechanism to obtain the semantic information of the image automatically during training, without any supervision. This information then guides the masking strategy to mask areas selectively, which helps representation learning. Moreover, a redundant-patch throwing strategy makes learning more efficient. As a plug-and-play module for masked image modeling, AMT improves the linear probing accuracy of MAE by 2.9% to 5.9% on CIFAR-10/100, STL-10, Tiny ImageNet, and ImageNet-1K, and also improves the fine-tuning accuracy of MAE and SimMIM. The design also achieves superior performance on downstream detection and segmentation tasks.
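The two ideas in the abstract (attention-guided masking and throwing redundant patches) can be illustrated with a small, self-contained sketch. This is not the authors' implementation (the official code is at the code link below); it assumes that (1) per-patch semantic scores come from the softmax attention of a single [CLS] query over the patch keys, (2) the lowest-scoring patches are the "redundant" ones and are thrown away, and (3) the remaining patches are masked by weighted sampling without replacement, with weight equal to the attention score. The names `cls_attention` and `attention_driven_mask`, the default ratios, and the sampling rule are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def cls_attention(cls_query: Sequence[float],
                  patch_keys: Sequence[Sequence[float]]) -> List[float]:
    """Softmax attention of one [CLS] query over all patch keys (one head)."""
    d = len(cls_query)
    logits = [sum(q * k for q, k in zip(cls_query, key)) / math.sqrt(d)
              for key in patch_keys]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_driven_mask(attn: Sequence[float],
                          mask_ratio: float = 0.45,
                          throw_ratio: float = 0.2,
                          seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Split patch indices into (visible, masked, thrown).

    Hypothetical rule, not taken from the AMT code:
    - the lowest-attention patches are thrown away (treated as redundant);
    - among the rest, patches are masked by weighted sampling without
      replacement, with weight equal to their attention score.
    Both ratios are fractions of ALL patches.
    """
    n = len(attn)
    n_throw = int(n * throw_ratio)
    n_mask = int(n * mask_ratio)
    if n_throw + n_mask > n:
        raise ValueError("mask_ratio + throw_ratio must not exceed 1")
    rng = random.Random(seed)

    # Throw: drop the n_throw patches with the lowest attention.
    order = sorted(range(n), key=lambda i: attn[i])
    thrown, kept = order[:n_throw], order[n_throw:]

    # Mask: Efraimidis-Spirakis keys u**(1/w); the top-n_mask keys form a
    # weighted sample without replacement that favours high-attention patches.
    keys = {i: rng.random() ** (1.0 / max(attn[i], 1e-8)) for i in kept}
    masked = sorted(kept, key=lambda i: keys[i], reverse=True)[:n_mask]

    masked_set = set(masked)
    visible = [i for i in kept if i not in masked_set]
    return sorted(visible), sorted(masked), sorted(thrown)


if __name__ == "__main__":
    rng = random.Random(1)
    dim, num_patches = 8, 16
    keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
    query = [rng.gauss(0, 1) for _ in range(dim)]
    attn = cls_attention(query, keys)
    visible, masked, thrown = attention_driven_mask(attn, 0.5, 0.25)
    print("visible:", visible)
    print("masked: ", masked)
    print("thrown: ", thrown)
```

In this sketch only the visible indices would be fed to an MAE-style encoder and only the masked ones would serve as reconstruction targets, so throwing patches is one way such a scheme could reduce pretraining cost. Whether AMT masks high-attention or low-attention regions preferentially, and how it sets the ratios, should be checked against the paper and its code.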

Reference:

[1] Z. Liu, J. Gui, and H. Luo, “Good Helper Is around You: Attention-Driven Masked Image Modeling”, AAAI, vol. 37, no. 2, pp. 1799-1807, Jun. 2023.

Paper link:

[https://ojs.aaai.org/index.php/AAAI/article/view/25269]

Code link:

[https://github.com/guijiejie/AMT]

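About the speaker:

Zhengqi Liu (刘政岐) is a master's student at Southeast University working on self-supervised learning, advised by Prof. Jie Gui (桂杰).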

Special thanks to the main organizer of this issue:

Monthly rotating AC: Li Zhang (张力, Fudan University)
