<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="x-ua-compatible" content="ie=edge" />
    <title>A Frustratingly Simple Yet Highly Effective Attack
Baseline: Over 90% Success Rate Against the Strong
Black-box Models of GPT-4.5/4o/o1</title>
    <meta name="description" content="" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
    <link rel="stylesheet" href="style/main.css">
    <link rel="stylesheet" href="style/glide.core.min.css">
    <link rel="stylesheet" href="style/glide.theme.min.css">
    <link rel="stylesheet" href="style/glide-custom.css">
  </head>
  <body>
    <div class="container" id="container">
      <div class="page page1">
        <h1 style="text-align: center; font-size: 30px;"><br/>
          A Frustratingly Simple Yet Highly Effective Attack Baseline:
          Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1</h1>
        <div class="authors ">
          <div><a href="https://scholar.google.com/citations?user=J-VsziYAAAAJ&hl=en">Zhaoyi Li*</a></div>
          <div><a href="https://scholar.google.com/citations?user=PliLuD4AAAAJ&hl=en">Xiaohan Zhao*</a></div>
          <div><a href="https://scholar.google.com/citations?user=_Vx3dZgAAAAJ&hl=en">Dong-Dong Wu</a></div>
          <div><a href="https://scholar.google.com/citations?user=SI_9kD0AAAAJ&hl=en">Jiacheng Cui</a></div>
          <div><a href="https://scholar.google.com/citations?user=DGr0fVoAAAAJ&hl=en">Zhiqiang Shen<sup>†</sup></a></div>
        </div>
        <p style="text-align: center">Mohamed bin Zayed University of Artificial Intelligence</p>
        <br/>
        <div class="links">
          <a class="btn-solid-lg" href="https://arxiv.org/pdf/2503.10635"><i class="fa fa-file-pdf-o"></i> Paper</a>
          <a class="btn-solid-lg" href="https://github.com/VILA-Lab/M-Attack"><i class="fa fa-github"></i> Code </a>
          <a class="btn-solid-lg" href="https://huggingface.co/datasets/MBZUAI-LLM/M-Attack_AdvSamples"><i class="fa fa-github"></i> Hugging Face </a>
        </div>

        <div class="teaser" style="text-align: center; margin-top: 20px;">
          <img src="figures/real_scenario_page.jpg" alt="teaser image"
               style="max-width: 90%; height: auto; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);">
          <p style="margin-top:1px; font-size: 14px; color: #555; font-style: italic;">
            Figure 1: Example responses from commercial LVLMs to targeted attacks generated by our method.
          </p>
        </div>


    <div class="container justify">
      <h2 style="margin-top: 40px;">Overview of our method</h2>
      <div class="teaser" style="text-align: center; margin-top: 20px;">
        <img src="figures/main_alg.jpg" alt="Overview of the proposed attack framework"
             style="max-width: 90%; height: auto; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);">
        <p style="margin-top: 4px; font-size: 14px; color: #555; font-style: italic;">
          Figure 2: Illustration of our proposed framework. Our method has two components: Local-to-Global or Local-to-Local Matching (LM) and Model Ensemble (ENS). LM is the core of our approach and refines the local semantics of the perturbation. ENS avoids over-reliance on a single model's embedding similarity, thereby improving attack transferability.
        </p>
      </div>


      <h2 style="margin-top: 25px;">Experimental Results</h2>
      <div class="teaser" style="text-align: center; margin-top: 0px;">
        <img src="figures/table1.png" alt="Table 1: comparison with state-of-the-art approaches"
             style="max-width: 90%; height: auto; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);">
        <p style="margin-top: 4px; font-size: 14px; color: #555; font-style: italic;">
          Table 1: Comparison with state-of-the-art approaches.
        </p>
        <img src="figures/table2.png" alt="Table 2: ablation study on the impact of ϵ"
             style="max-width: 90%; height: auto; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);">
        <p style="margin-top: 4px; font-size: 14px; color: #555; font-style: italic;">
          Table 2: Ablation study on the impact of the perturbation budget ϵ.
        </p>
      </div>


      <h2 style="margin-top: 25px;">Visualization of perturbations and adversarial samples</h2>

      <div class="teaser" style="text-align: center; margin-top: 10px;">
        <img src="figures/Visualize_Perturbation.jpg" alt="Perturbations generated with local-to-global matching compared with other methods"
             style="max-width: 90%; height: auto; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);">
        <p style="margin-top: 4px; font-size: 14px; color: #555; font-style: italic;">
          Figure 3: Left: perturbations generated by models with local-to-global matching.
          The ensemble integrates their complementary strengths into a high-quality perturbation.
          Right: perturbations generated by other methods.
        </p>
        <img src="figures/contrast_adv_images.jpg" alt="Adversarial samples generated by different methods"
             style="max-width: 90%; height: auto; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.15);">
        <p style="margin-top: 4px; font-size: 14px; color: #555; font-style: italic;">
          Figure 4: Visualization of adversarial samples generated by different methods.
        </p>
      </div>


    </div>

  <div class="container">
  <h2>BibTeX</h2>
  <pre><code>
    @article{li2025frustratingly,
      title={A frustratingly simple yet highly effective attack baseline: Over 90\% success rate against the strong black-box models of gpt-4.5/4o/o1},
      author={Li, Zhaoyi and Zhao, Xiaohan and Wu, Dong-Dong and Cui, Jiacheng and Shen, Zhiqiang},
      journal={arXiv preprint arXiv:2503.10635},
      year={2025}
    }
  </code></pre>


  </div>
      </div>
    </div>
  </body>
</html>