VALSE Paper Quick Read No. 164: Set-level Guidance Attack for Vision-Language Pre-training Models ...

2024-1-31 19:39 | Posted by: Cheng Yi (Institute of Computing Technology)

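To help practitioners in vision and learning keep up with the latest developments and cutting-edge techniques in the field, VALSE has launched its "Paper Quick Read" column, which publishes recorded videos of one or two papers from top conferences and journals each week, each giving a detailed walkthrough of a single recent work. This issue features work from Southern University of Science and Technology on transfer-based adversarial attacks against vision-language pre-training models. The video was recorded by Dong Lu, a PhD student at Southern University of Science and Technology.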

Paper title:

Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models

Authors:

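Dong Lu* (Southern University of Science and Technology), Zhiqiang Wang* (Southern University of Science and Technology), Teng Wang (The University of Hong Kong; Southern University of Science and Technology), Weili Guan (Monash University), Hongchang Gao (Temple University), Feng Zheng (Southern University of Science and Technology; Peng Cheng Laboratory)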

Watch on Bilibili:

https://www.bilibili.com/video/BV1iA4m1V7TB/

Abstract:

Vision-language pre-training (VLP) models have shown vulnerability to adversarial examples in multimodal tasks. Furthermore, malicious adversaries can be deliberately transferred to attack other black-box models. However, existing work has mainly focused on investigating white-box attacks. In this paper, we present the first study to investigate the adversarial transferability of recent VLP models. We observe that existing methods exhibit much lower transferability, compared to the strong attack performance in white-box settings. The transferability degradation is partly caused by the under-utilization of cross-modal interactions. Particularly, unlike unimodal learning, VLP models rely heavily on cross-modal interactions and the multimodal alignments are many-to-many, e.g., an image can be described in various natural languages. To this end, we propose a highly transferable Set-level Guidance Attack (SGA) that thoroughly leverages modality interactions and incorporates alignment-preserving augmentation with cross-modal guidance. Experimental results demonstrate that SGA could generate adversarial examples that can strongly transfer across different VLP models on multiple downstream vision-language tasks. On image-text retrieval, SGA significantly enhances the attack success rate for transfer attacks from ALBEF to TCL by a large margin (at least 9.78% and up to 30.21%), compared to the state-of-the-art.
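The abstract describes the attack only at a high level. The toy sketch below is meant solely to make the idea of "set-level" guidance concrete; it is not the SGA algorithm from the paper, nor the code in the linked repository. It assumes a hypothetical linear image encoder, folds each augmented view of the image (e.g. a rescaled copy) together with that encoder into a single matrix, uses plain dot-product similarity, and takes projected sign-gradient (PGD-style) steps that lower the image's total similarity to a whole set of matching captions, summed over a set of augmented views, rather than to one caption. All function names and numbers are illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # list of rows


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def set_level_score(x: Vector, views: List[Matrix], captions: List[Vector]) -> float:
    """Total similarity: sum over views A_k and captions t_j of <t_j, A_k x>."""
    total = 0.0
    for a in views:
        emb = matvec(a, x)  # embedding of one augmented view of the image
        for t in captions:
            total += sum(e * c for e, c in zip(emb, t))
    return total


def set_level_grad(views: List[Matrix], captions: List[Vector]) -> Vector:
    """Gradient of set_level_score w.r.t. x: sum_k A_k^T (sum_j t_j).

    Because the toy encoder is linear, this does not depend on x.
    """
    dim = len(captions[0])
    t_sum = [sum(t[i] for t in captions) for i in range(dim)]
    grad = [0.0] * len(views[0][0])
    for a in views:
        g = matvec(transpose(a), t_sum)
        grad = [p + q for p, q in zip(grad, g)]
    return grad


def sign(v: float) -> float:
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def pgd_step(x_adv: Vector, x_clean: Vector, views: List[Matrix],
             captions: List[Vector], alpha: float, eps: float) -> Vector:
    """One sign-gradient step that lowers the set-level similarity, then
    projects onto the L_inf ball of radius eps around x_clean and onto [0, 1]."""
    grad = set_level_grad(views, captions)
    out = []
    for xa, xc, g in zip(x_adv, x_clean, grad):
        v = xa - alpha * sign(g)
        v = min(max(v, xc - eps), xc + eps)  # stay within eps of the clean input
        v = min(max(v, 0.0), 1.0)  # stay a valid pixel value
        out.append(v)
    return out


# Usage: a 3-pixel "image", two augmented views, two matching captions.
x = [0.5, 0.5, 0.5]
views = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]],
]
captions = [[1.0, 0.0], [0.8, 0.2]]

x_adv = list(x)
for _ in range(5):
    x_adv = pgd_step(x_adv, x, views, captions, alpha=0.01, eps=0.03)

print(set_level_score(x, views, captions), set_level_score(x_adv, views, captions))
```

With a real VLP model the encoder is non-linear, so the gradient would come from automatic differentiation and would change at every step; the linear toy only keeps the example dependency-free while showing that the objective aggregates over sets of captions and image views.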

Paper link:

[https://arxiv.org/abs/2307.14061]

Code link:

[https://github.com/Zoky-2020/SGA]

About the speaker:

Dong Lu is a PhD student at Southern University of Science and Technology whose research focuses mainly on trustworthy ML.

Special thanks to the main organizer of this Paper Quick Read:

Monthly rotating AC: Ruimao Zhang (The Chinese University of Hong Kong, Shenzhen)
