[quantization] Introduce wrappers for Qwen3VLTextDecoderLayer and Qwen3VLTextModel#535

Draft
dayo09 wants to merge 4 commits into Samsung:main from dayo09:0303-text-models

Conversation

dayo09 (Contributor) commented Mar 5, 2026

Let's add wrappers for upper level qwen3vl layers.

TICO-DCO-1.0-Signed-off-by: Dayoung Lee dayoung.lee@samsung.com

…n3VLTextModel

- Add `QuantQwen3VLTextDecoderLayer`: wraps attention, MLP, and layernorm
  blocks; pre-builds static causal mask and RoPE templates to avoid
  dynamic ops in forward pass
- Add `QuantQwen3VLTextModel`: pre-computes shared causal mask and RoPE
  once and passes them to every decoder layer, so they are quantized
  exactly once rather than independently in each layer
- Register both wrappers in `_CORE_MODULES`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
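For context, a minimal sketch of the pre-built-template pattern the commit describes (hypothetical names and sizes, not the PR's actual code): the causal mask is built once at construction time, and the forward pass only slices it, so no mask-building ops appear in the exported graph.

```python
import torch

# Hypothetical maximum length; the wrapper would use the model's config.
max_seq_len = 8

# Additive causal mask template, built once: -inf strictly above the
# diagonal, 0 on and below it.
causal_mask_template = torch.full((max_seq_len, max_seq_len), float("-inf")).triu(1)

def causal_mask_for(seq_len: int) -> torch.Tensor:
    # The forward pass only slices the template; no dynamic construction.
    return causal_mask_template[:seq_len, :seq_len]

print(causal_mask_for(3))
```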
Comment on lines +190 to +191
self._fq(cos, self.obs_cos),
self._fq(sin, self.obs_sin),
@dayo09 (Contributor)
Sorry for the disturbance, but

self._fq(cos[:, : hidden_states.size(1), :], self.obs_cos),

will remove the dependence on the input size (this proved useful for Llama). It's similar to the `self.causal_mask_template[..., :seq_len, :seq_len].to(device)` above (Ln127). IMHO.
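A quick sketch of what the suggested slice does (toy shapes, not the PR's code): the template is precomputed for the maximum length, and slicing by `hidden_states.size(1)` makes the tensor fed into fake-quantization track the current sequence length rather than the template's full length.

```python
import torch

# Toy RoPE-style template precomputed for a maximum length (hypothetical shapes).
max_seq_len, head_dim = 16, 4
cos_template = torch.randn(1, max_seq_len, head_dim)

hidden_states = torch.randn(1, 5, 8)  # (batch, seq, hidden)

# Slicing ties the tensor to the current sequence length, analogous to the
# causal_mask_template slice mentioned above.
cos = cos_template[:, : hidden_states.size(1), :]
print(cos.shape)  # torch.Size([1, 5, 4])
```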

dayo09 force-pushed the 0303-text-models branch from 4829fa4 to e71a9b1 on March 11, 2026, 07:15
print(f"│ Mean |diff|: {(q_out - fp_out).abs().mean().item():.6f}")
print(f"│ PEIR : {compute_peir(fp_out, q_out) * 100:.6f} %")
print("└──────────────────────────────────────────────────────")
print(plot_two_outputs(fp_out, q_out))
dayo09 (Contributor, Author)
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.071578
│ PEIR       : 9.253764 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 5.1┤                                         •  │
 3.4┤                              • ••••    •   │
 1.7┤                        ••••••••••          │
 0.0┤                 ••••••••••                 │
-1.7┤            • ••••••                        │
-3.4┤   ••••••••                                 │
-5.1┤  •                                         │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -5.1       -2.5        0.0       2.5       5.1 

print(f"│ Mean |diff|: {(q_out - fp_out).abs().mean().item():.6f}")
print(f"│ PEIR : {compute_peir(fp_out, q_out) * 100:.6f} %")
print("└──────────────────────────────────────────────────────")
print(plot_two_outputs(fp_out, q_out))
dayo09 (Contributor, Author)

python3 tico/quantization/wrapq/examples/qwen/quantize_text_model.py 
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.904804
│ PEIR       : 351.709125 %
└──────────────────────────────────────────────────────
      ┌──────────────────────────────────────────┐
  28.2┤                                          │
      │                                          │
      │                              •• •• •     │
   4.7┤                                •••••     │
      │                               •••••      │
      │                              ••••        │
 -18.7┤                             •••          │
      │                                          │
      │                                          │
 -42.2┤                                          │
      │                                          │
      │                                          │
      │                                          │
 -65.6┤                                          │
      │                                          │
      │                                          │
 -89.1┤                                          │
      │                                          │
      │                                       •  │
-112.5┤                                          │
      └┬─────────┬──────────┬─────────┬─────────┬┘
    -112.5     -77.4      -42.2     -7.0     28.2 

dayo09 (Contributor, Author)

There is one big outlier. 😢

Comment on lines +253 to +256
for m in (self.embed_tokens, self.norm):
    yield from m._all_observers()
for m in self.layers:
    yield from m._all_observers()
Contributor
🤔 I'm wondering about the purpose of yielding the observers of submodules' PTQWrappers since they return nothing anyway... (see also #494).
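For reference, this is just how generator delegation behaves at the language level (a toy sketch, unrelated to PTQWrapper internals): `yield from` over a generator that yields nothing contributes nothing to the caller, which is what makes the delegation look like a no-op.

```python
def child_observers():
    # A generator that yields no items (like an _all_observers that has
    # nothing of its own to report).
    return
    yield  # unreachable; only makes this function a generator

def parent_observers():
    yield from child_observers()  # contributes zero items
    yield "own-observer"

print(list(parent_observers()))  # ['own-observer']
```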

Comment on lines +214 to +217
yield from self.self_attn._all_observers()
yield from self.mlp._all_observers()
yield from self.input_layernorm._all_observers()
yield from self.post_attention_layernorm._all_observers()
Contributor
🤔 I'm wondering about the purpose of yielding the observers of submodules' PTQWrappers since they return nothing anyway... (see also #494).

Comment on lines +133 to +140
def get_position_embeddings_for(
    self, hidden_states: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor]:
    # Delegate to the model's actual Qwen3VLTextRotaryEmbedding so that
    # MRoPE frequencies are split correctly by mrope_section.
    S = hidden_states.size(1)
    position_ids = torch.arange(S, device=hidden_states.device).unsqueeze(0)
    return self.rotary_emb(hidden_states, position_ids)
Contributor
Note for me

When tico.convert is called, it uses torch.export.export to capture the model as a static computation graph. At this point, the dynamic self.rotary_emb(hidden_states, position_ids) call will be executed with concrete inputs, resulting in static tensor values for cos and sin.

mhs4670go (Contributor)
@dvsav Thanks for the review! Actually, @dayo09 will be working in another department for a year. Would you be able to continue working on this PR?

dvsav (Contributor) commented Mar 18, 2026

> @dvsav Thanks for the review! Actually, @dayo09 will be working in another department for a year. Would you be able to continue working on this PR?

Hi @mhs4670go
Sure, I'll take over this PR. Thanks for letting me know.

dvsav (Contributor) commented Mar 24, 2026

> @dvsav Thanks for the review! Actually, @dayo09 will be working in another department for a year. Would you be able to continue working on this PR?

4 participants