<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LLaDA2

[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) is a family of discrete diffusion language models
that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation,
LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement
steps.

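The unmask-by-confidence idea can be illustrated with a toy sketch. This is hypothetical code, not the pipeline's actual sampler: `toy_confidence_unmask` and `fake_logits` are made-up names, and `fake_logits` stands in for a real model, which would condition on the partially unmasked sequence.

```py
import torch

def toy_confidence_unmask(logits_fn, length, steps, mask_id=0):
    # Toy illustration of confidence-based unmasking: start fully
    # masked, then commit the single highest-confidence masked
    # position at each refinement step.
    seq = torch.full((length,), mask_id)
    masked = torch.ones(length, dtype=torch.bool)
    for _ in range(steps):
        if not masked.any():
            break
        probs = logits_fn(seq).softmax(dim=-1)  # (length, vocab)
        conf, tok = probs.max(dim=-1)           # per-position best token
        conf[~masked] = -1.0                    # never revisit committed slots
        i = int(conf.argmax())
        seq[i] = tok[i]
        masked[i] = False
    return seq

# A stand-in "model": position i always prefers token i + 1, with
# confidence growing along the sequence.
def fake_logits(_seq):
    logits = torch.zeros(4, 5)
    for i in range(4):
        logits[i, i + 1] = float(i + 1)
    return logits

print(toy_confidence_unmask(fake_logits, length=4, steps=4))
```

With fewer steps than positions, only the most confident positions are committed and the rest remain as the mask token; the real pipeline additionally works block by block and can re-mask low-confidence commitments.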
## Usage

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from diffusers import BlockRefinementScheduler, LLaDA2Pipeline

model_id = "inclusionAI/LLaDA2.1-mini"
# LLaDA2 checkpoints ship custom modeling code, so `trust_remote_code` is required.
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
scheduler = BlockRefinementScheduler()

pipe = LLaDA2Pipeline(model=model, scheduler=scheduler, tokenizer=tokenizer)
output = pipe(
    prompt="Write a short poem about the ocean.",
    gen_length=256,          # total number of tokens to generate
    block_length=32,         # tokens refined per block
    num_inference_steps=32,  # refinement steps per block
    threshold=0.7,           # confidence required to commit a token
    editing_threshold=0.5,   # confidence gate for post-mask editing
    max_post_steps=16,       # cap on post-editing steps
    temperature=0.0,         # greedy (deterministic) unmasking
)
print(output.texts[0])
```

## Callbacks

Callbacks run after each refinement step. Pass `callback_on_step_end_tensor_inputs` to select which tensors are
included in `callback_kwargs`. In the current implementation, `block_x` (the sequence window being refined) and
`transfer_index` (the mask-filling commit mask) are provided; return `{"block_x": ...}` from the callback to replace
the window.

```py
def on_step_end(pipe, step, timestep, callback_kwargs):
    block_x = callback_kwargs["block_x"]
    # Inspect or modify `block_x` here.
    return {"block_x": block_x}

out = pipe(
    prompt="Write a short poem.",
    callback_on_step_end=on_step_end,
    callback_on_step_end_tensor_inputs=["block_x"],
)
```

## Recommended parameters

LLaDA2.1 models support two modes:

| Mode | `threshold` | `editing_threshold` | `max_post_steps` |
|------|-------------|---------------------|------------------|
| Quality | 0.7 | 0.5 | 16 |
| Speed | 0.5 | `None` | 16 |

Pass `editing_threshold=None`, `0.0`, or a negative value to turn off post-mask editing.

For LLaDA2.0 models, disable editing by passing `editing_threshold=None` or `0.0`.

For all models: `block_length=32`, `temperature=0.0`, `num_inference_steps=32`.

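As a sketch, the presets from the table can be kept as plain dictionaries and spread into the call from the Usage section. The `QUALITY`, `SPEED`, and `COMMON` names are illustrative, not part of the API.

```py
# Presets from the table above; the dict names are illustrative only.
QUALITY = {"threshold": 0.7, "editing_threshold": 0.5, "max_post_steps": 16}
SPEED = {"threshold": 0.5, "editing_threshold": None, "max_post_steps": 16}
COMMON = {"block_length": 32, "temperature": 0.0, "num_inference_steps": 32}

params = {**COMMON, **QUALITY}  # or {**COMMON, **SPEED}
# With `pipe` constructed as in the Usage section:
# output = pipe(prompt="Write a short poem about the ocean.", gen_length=256, **params)
print(params)
```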
## LLaDA2Pipeline
[[autodoc]] LLaDA2Pipeline
  - all
  - __call__

## LLaDA2PipelineOutput
[[autodoc]] pipelines.LLaDA2PipelineOutput