Commit 8f88332

Merge branch 'main' into automodel_docs
2 parents ec60cdf + 7da22b9 commit 8f88332

66 files changed: 6571 additions, 885 deletions

.ai/review-rules.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# PR Review Rules
+
+Review-specific rules for Claude. Focus on correctness — style is handled by ruff.
+
+Before reviewing, read and apply the guidelines in:
+- [AGENTS.md](AGENTS.md) — coding style, dependencies, copied code, model conventions
+- [skills/model-integration/SKILL.md](skills/model-integration/SKILL.md) — attention pattern, pipeline rules, implementation checklist, gotchas
+- [skills/parity-testing/SKILL.md](skills/parity-testing/SKILL.md) — testing rules, comparison utilities
+- [skills/parity-testing/pitfalls.md](skills/parity-testing/pitfalls.md) — known pitfalls (dtype mismatches, config assumptions, etc.)
+
+## Common mistakes (add new rules below this line)

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+name: Claude PR Review
+
+on:
+  issue_comment:
+    types: [created]
+  pull_request_review_comment:
+    types: [created]
+
+permissions:
+  contents: write
+  pull-requests: write
+  issues: read
+  id-token: write
+
+jobs:
+  claude-review:
+    if: |
+      (
+        github.event_name == 'issue_comment' &&
+        github.event.issue.pull_request &&
+        github.event.issue.state == 'open' &&
+        contains(github.event.comment.body, '@claude') &&
+        (github.event.comment.author_association == 'MEMBER' ||
+        github.event.comment.author_association == 'OWNER' ||
+        github.event.comment.author_association == 'COLLABORATOR')
+      ) || (
+        github.event_name == 'pull_request_review_comment' &&
+        contains(github.event.comment.body, '@claude') &&
+        (github.event.comment.author_association == 'MEMBER' ||
+        github.event.comment.author_association == 'OWNER' ||
+        github.event.comment.author_association == 'COLLABORATOR')
+      )
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 1
+      - uses: anthropics/claude-code-action@v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          claude_args: |
+            --append-system-prompt "Review this PR against the rules in .ai/review-rules.md. Focus on correctness, not style (ruff handles style). Only review changes under src/diffusers/. Do NOT commit changes unless the comment explicitly asks you to using the phrase 'commit this'."
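
The `if:` condition in the workflow above has two branches that are easy to misread in expression form. The following is a minimal Python sketch of the same gate, written only to make the logic checkable. The function name `should_run_review` and the dict-shaped `event` are assumptions for illustration and are not part of the workflow or of any GitHub API. It lowercases the comment body because the Actions `contains()` function is documented as case-insensitive; the equality checks are kept as plain string comparisons, which is an approximation.

```python
# Hypothetical rendering of the workflow's `if:` gate; not part of the commit.
TRUSTED_ASSOCIATIONS = {"MEMBER", "OWNER", "COLLABORATOR"}


def should_run_review(event_name: str, event: dict) -> bool:
    """Return True when the comment event would pass the workflow's gate."""
    comment = event.get("comment", {})
    # contains() in GitHub Actions expressions ignores case, so compare lowercased.
    mentions_claude = "@claude" in comment.get("body", "").lower()
    is_trusted = comment.get("author_association") in TRUSTED_ASSOCIATIONS

    if event_name == "issue_comment":
        issue = event.get("issue", {})
        # Only this branch checks that the issue is a pull request and still open.
        is_open_pr = bool(issue.get("pull_request")) and issue.get("state") == "open"
        return is_open_pr and mentions_claude and is_trusted

    if event_name == "pull_request_review_comment":
        return mentions_claude and is_trusted

    return False
```

Under this reading, a review comment from a first-time contributor never triggers the job, and a top-level comment on a closed pull request does not either.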

docs/source/en/_toctree.yml

Lines changed: 10 additions & 0 deletions
@@ -448,6 +448,10 @@
   title: AutoencoderKLHunyuanVideo
 - local: api/models/autoencoder_kl_hunyuan_video15
   title: AutoencoderKLHunyuanVideo15
+- local: api/models/autoencoder_kl_kvae
+  title: AutoencoderKLKVAE
+- local: api/models/autoencoder_kl_kvae_video
+  title: AutoencoderKLKVAEVideo
 - local: api/models/autoencoderkl_audio_ltx_2
   title: AutoencoderKLLTX2Audio
 - local: api/models/autoencoderkl_ltx_2
@@ -668,6 +672,10 @@
   - local: api/pipelines/z_image
     title: Z-Image
   title: Image
+- sections:
+  - local: api/pipelines/llada2
+    title: LLaDA2
+  title: Text
 - sections:
   - local: api/pipelines/allegro
     title: Allegro
@@ -716,6 +724,8 @@
 - sections:
   - local: api/schedulers/overview
     title: Overview
+  - local: api/schedulers/block_refinement
+    title: BlockRefinementScheduler
   - local: api/schedulers/cm_stochastic_iterative
     title: CMStochasticIterativeScheduler
   - local: api/schedulers/ddim_cogvideox

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+<!-- Copyright 2025 The Kandinsky Team and The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. -->
+
+# AutoencoderKLKVAE
+
+The 2D variational autoencoder (VAE) model with KL loss.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AutoencoderKLKVAE
+
+vae = AutoencoderKLKVAE.from_pretrained("kandinskylab/KVAE-2D-1.0", subfolder="diffusers", torch_dtype=torch.bfloat16)
+```
+
+## AutoencoderKLKVAE
+
+[[autodoc]] AutoencoderKLKVAE
+- decode
+- all

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+<!-- Copyright 2025 The Kandinsky Team and The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. -->
+
+# AutoencoderKLKVAEVideo
+
+The 3D variational autoencoder (VAE) model with KL loss.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AutoencoderKLKVAEVideo
+
+vae = AutoencoderKLKVAEVideo.from_pretrained("kandinskylab/KVAE-3D-1.0", subfolder="diffusers", torch_dtype=torch.float16)
+```
+
+## AutoencoderKLKVAEVideo
+
+[[autodoc]] AutoencoderKLKVAEVideo
+- decode
+- all
+

docs/source/en/api/pipelines/cogvideox.md

Lines changed: 3 additions & 4 deletions
@@ -41,16 +41,15 @@ The quantized CogVideoX 5B model below requires ~16GB of VRAM.
 
 ```py
 import torch
-from diffusers import CogVideoXPipeline, AutoModel
+from diffusers import CogVideoXPipeline, AutoModel, TorchAoConfig
 from diffusers.quantizers import PipelineQuantizationConfig
 from diffusers.hooks import apply_group_offloading
 from diffusers.utils import export_to_video
+from torchao.quantization import Int8WeightOnlyConfig
 
 # quantize weights to int8 with torchao
 pipeline_quant_config = PipelineQuantizationConfig(
-    quant_backend="torchao",
-    quant_kwargs={"quant_type": "int8wo"},
-    components_to_quantize="transformer"
+    quant_mapping={"transformer": TorchAoConfig(Int8WeightOnlyConfig())}
 )
 
 # fp8 layerwise weight-casting

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# LLaDA2
+
+[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) is a family of discrete diffusion language models
+that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation,
+LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement
+steps.
+
+## Usage
+
+```py
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+from diffusers import BlockRefinementScheduler, LLaDA2Pipeline
+
+model_id = "inclusionAI/LLaDA2.1-mini"
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, trust_remote_code=True, dtype=torch.bfloat16, device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+scheduler = BlockRefinementScheduler()
+
+pipe = LLaDA2Pipeline(model=model, scheduler=scheduler, tokenizer=tokenizer)
+output = pipe(
+    prompt="Write a short poem about the ocean.",
+    gen_length=256,
+    block_length=32,
+    num_inference_steps=32,
+    threshold=0.7,
+    editing_threshold=0.5,
+    max_post_steps=16,
+    temperature=0.0,
+)
+print(output.texts[0])
+```
+
+## Callbacks
+
+Callbacks run after each refinement step. Pass `callback_on_step_end_tensor_inputs` to select which tensors are
+included in `callback_kwargs`. In the current implementation, `block_x` (the sequence window being refined) and
+`transfer_index` (mask-filling commit mask) are provided; return `{"block_x": ...}` from the callback to replace the
+window.
+
+```py
+def on_step_end(pipe, step, timestep, callback_kwargs):
+    block_x = callback_kwargs["block_x"]
+    # Inspect or modify `block_x` here.
+    return {"block_x": block_x}
+
+out = pipe(
+    prompt="Write a short poem.",
+    callback_on_step_end=on_step_end,
+    callback_on_step_end_tensor_inputs=["block_x"],
+)
+```
+
+## Recommended parameters
+
+LLaDA2.1 models support two modes:
+
+| Mode | `threshold` | `editing_threshold` | `max_post_steps` |
+|------|-------------|---------------------|------------------|
+| Quality | 0.7 | 0.5 | 16 |
+| Speed | 0.5 | `None` | 16 |
+
+Pass `editing_threshold=None`, `0.0`, or a negative value to turn off post-mask editing.
+
+For LLaDA2.0 models, disable editing by passing `editing_threshold=None` or `0.0`.
+
+For all models: `block_length=32`, `temperature=0.0`, `num_inference_steps=32`.
+
+## LLaDA2Pipeline
+[[autodoc]] LLaDA2Pipeline
+- all
+- __call__
+
+## LLaDA2PipelineOutput
+[[autodoc]] pipelines.LLaDA2PipelineOutput
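
The LLaDA2 page added in this commit describes generation as starting from a fully masked block and unmasking tokens by confidence. The toy sketch below illustrates that idea in plain Python under stated assumptions. It is not the `LLaDA2Pipeline` or `BlockRefinementScheduler` implementation. The `MASK` value, the `refine_step` function, and the fallback that commits the single most confident position when nothing clears the threshold are all hypothetical choices made for this example.

```python
MASK = -1  # hypothetical mask token id, chosen only for this sketch


def refine_step(block, proposals, confidences, threshold):
    """Fill masked positions whose proposed token has confidence >= threshold.

    `block`, `proposals` and `confidences` are equal-length lists. Positions
    that are already unmasked are left untouched.
    """
    masked = [i for i, tok in enumerate(block) if tok == MASK]
    if not masked:
        return list(block)

    chosen = [i for i in masked if confidences[i] >= threshold]
    if not chosen:
        # Assumption: commit the single most confident masked position so that
        # every step makes progress even when nothing clears the threshold.
        chosen = [max(masked, key=lambda i: confidences[i])]

    refined = list(block)
    for i in chosen:
        refined[i] = proposals[i]
    return refined


# Two refinement steps over a fully masked block of four positions.
block = [MASK] * 4
proposals = [11, 12, 13, 14]
step1 = refine_step(block, proposals, [0.9, 0.4, 0.75, 0.2], threshold=0.7)
step2 = refine_step(step1, proposals, [0.0, 0.6, 0.0, 0.3], threshold=0.7)
```

In this sketch a lower `threshold` commits more positions per step, which is consistent with the page listing 0.5 for the Speed mode and 0.7 for the Quality mode.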
