VALSE Paper Digest, Issue 163: Attention-Driven Masked Image Modeling

2024-1-26 19:38 | Posted by: 程一-计算所

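Summary: To help practitioners in vision and learning keep up with the latest developments and frontier advances in the field, VALSE has launched the Paper Digest column. Each week it releases recorded videos covering one or two papers from top conferences and journals, with a detailed walkthrough of a single frontier work. This issue of the VALSE Paper ...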



Good Helper Is around You: Attention-Driven Masked Image Modeling


Zhengqi Liu (Southeast University), Jie Gui (Southeast University), Hao Luo (Alibaba)



Masked image modeling (MIM) has shown great potential in self-supervised learning over the past year. Building on the vision transformer as a universal backbone, MIM learns self-supervised visual representations by masking a subset of image patches and attempting to recover the missing pixels. Most previous works mask patches randomly, which underutilizes semantic information that is beneficial to visual representation learning. In addition, because the backbone is large, most previous works spend a long time on pretraining. In this paper, we propose the Attention-driven Masking and Throwing Strategy (AMT), which addresses both problems. We first leverage the self-attention mechanism to obtain the semantic information of the image automatically during training, without using any supervised methods. The masking strategy can then be guided by that information to mask areas selectively, which helps representation learning. Moreover, we propose a redundant patch throwing strategy, which makes learning more efficient. As a plug-and-play module for masked image modeling, AMT improves the linear probing accuracy of MAE by 2.9% to 5.9% on CIFAR-10/100, STL-10, Tiny ImageNet, and ImageNet-1K, and improves the fine-tuning accuracy of MAE and SimMIM. This design also achieves superior performance on downstream detection and segmentation tasks.
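
To make the idea of attention-guided masking and throwing concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how attention scores map to masked or thrown patches, so the sketch assumes that patches are drawn in an attention-weighted random order, that the first group is masked, the next group is thrown (dropped from the encoder input), and the rest stay visible. The function name `amt_partition`, the input `attn`, and the default ratios are all hypothetical.

```python
import math
import random


def amt_partition(attn, mask_ratio=0.5, throw_ratio=0.25, seed=0):
    """Split patch indices into (masked, thrown, visible) lists.

    attn: one non-negative attention score per patch. This is a
          hypothetical input, e.g. class-token attention averaged
          over heads; the abstract does not specify the source.
    mask_ratio, throw_ratio: illustrative values, not from the paper.
    """
    if mask_ratio < 0 or throw_ratio < 0 or mask_ratio + throw_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")

    rng = random.Random(seed)
    n = len(attn)
    eps = 1e-12  # keeps zero-attention patches sampleable

    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # each patch gets key = log(u) / weight with u in (0, 1]; sorting
    # keys in descending order yields an attention-weighted ordering.
    keys = [math.log(1.0 - rng.random()) / (w + eps) for w in attn]
    order = sorted(range(n), key=lambda i: keys[i], reverse=True)

    n_mask = int(n * mask_ratio)
    n_throw = int(n * throw_ratio)

    # ASSUMPTION: earliest-drawn (likely high-attention) patches are
    # masked, the next group is thrown, and the remainder is visible.
    masked = order[:n_mask]
    thrown = order[n_mask:n_mask + n_throw]
    visible = order[n_mask + n_throw:]
    return masked, thrown, visible


if __name__ == "__main__":
    # Toy 4x4 patch grid with one strongly attended patch (index 5).
    scores = [0.01] * 16
    scores[5] = 1.0
    masked, thrown, visible = amt_partition(scores)
    print("masked:", sorted(masked))
    print("thrown:", sorted(thrown))
    print("visible:", sorted(visible))
```

In a real pipeline under these assumptions, only the `visible` patches would be fed to the encoder, the `masked` patches would be reconstruction targets, and the `thrown` patches would be skipped entirely, which is where the efficiency gain described above would come from.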


[1] Z. Liu, J. Gui, and H. Luo, “Good Helper Is around You: Attention-Driven Masked Image Modeling”, AAAI, vol. 37, no. 2, pp. 1799-1807, Jun. 2023.








Monthly rotating AC: Li Zhang (Fudan University)
