<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LLaDA2

[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) is a family of discrete diffusion language models
that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation,
LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement
steps.

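The unmask-by-confidence idea can be illustrated with a toy sketch. This is hypothetical code, not the pipeline's actual sampler: `toy_confidence_unmask` and `fake_logits` are made-up names, and `fake_logits` stands in for a real model, which would condition on the partially unmasked sequence.

```py
import torch

def toy_confidence_unmask(logits_fn, length, steps, mask_id=0):
    # Toy illustration of confidence-based unmasking: start fully
    # masked, then commit the single highest-confidence masked
    # position at each refinement step.
    seq = torch.full((length,), mask_id)
    masked = torch.ones(length, dtype=torch.bool)
    for _ in range(steps):
        if not masked.any():
            break
        probs = logits_fn(seq).softmax(dim=-1)  # (length, vocab)
        conf, tok = probs.max(dim=-1)           # per-position best token
        conf[~masked] = -1.0                    # never revisit committed slots
        i = int(conf.argmax())
        seq[i] = tok[i]
        masked[i] = False
    return seq

# A stand-in "model": position i always prefers token i + 1, with
# confidence growing along the sequence.
def fake_logits(_seq):
    logits = torch.zeros(4, 5)
    for i in range(4):
        logits[i, i + 1] = float(i + 1)
    return logits

print(toy_confidence_unmask(fake_logits, length=4, steps=4))
```

With fewer steps than positions, only the most confident positions are committed and the rest remain as the mask token; the real pipeline additionally works block by block and can re-mask low-confidence commitments.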
## Usage

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from diffusers import BlockRefinementScheduler, LLaDA2Pipeline

model_id = "inclusionAI/LLaDA2.1-mini"
# LLaDA2 checkpoints ship custom modeling code, so `trust_remote_code` is required.
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
scheduler = BlockRefinementScheduler()

pipe = LLaDA2Pipeline(model=model, scheduler=scheduler, tokenizer=tokenizer)
output = pipe(
    prompt="Write a short poem about the ocean.",
    gen_length=256,          # total number of tokens to generate
    block_length=32,         # tokens refined per block
    num_inference_steps=32,  # refinement steps per block
    threshold=0.7,           # confidence required to commit a token
    editing_threshold=0.5,   # confidence gate for post-mask editing
    max_post_steps=16,       # cap on post-editing steps
    temperature=0.0,         # greedy (deterministic) unmasking
)
print(output.texts[0])
```

## Callbacks

Callbacks run after each refinement step. Pass `callback_on_step_end_tensor_inputs` to select which tensors are
included in `callback_kwargs`. In the current implementation, `block_x` (the sequence window being refined) and
`transfer_index` (the mask-filling commit mask) are provided; return `{"block_x": ...}` from the callback to replace
the window.

```py
def on_step_end(pipe, step, timestep, callback_kwargs):
    block_x = callback_kwargs["block_x"]
    # Inspect or modify `block_x` here.
    return {"block_x": block_x}

out = pipe(
    prompt="Write a short poem.",
    callback_on_step_end=on_step_end,
    callback_on_step_end_tensor_inputs=["block_x"],
)
```

## Recommended parameters

LLaDA2.1 models support two modes:

| Mode | `threshold` | `editing_threshold` | `max_post_steps` |
|------|-------------|---------------------|------------------|
| Quality | 0.7 | 0.5 | 16 |
| Speed | 0.5 | `None` | 16 |

Pass `editing_threshold=None`, `0.0`, or a negative value to turn off post-mask editing.

For LLaDA2.0 models, disable editing by passing `editing_threshold=None` or `0.0`.

For all models: `block_length=32`, `temperature=0.0`, `num_inference_steps=32`.

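As a sketch, the presets from the table can be kept as plain dictionaries and spread into the call from the Usage section. The `QUALITY`, `SPEED`, and `COMMON` names are illustrative, not part of the API.

```py
# Presets from the table above; the dict names are illustrative only.
QUALITY = {"threshold": 0.7, "editing_threshold": 0.5, "max_post_steps": 16}
SPEED = {"threshold": 0.5, "editing_threshold": None, "max_post_steps": 16}
COMMON = {"block_length": 32, "temperature": 0.0, "num_inference_steps": 32}

params = {**COMMON, **QUALITY}  # or {**COMMON, **SPEED}
# With `pipe` constructed as in the Usage section:
# output = pipe(prompt="Write a short poem about the ocean.", gen_length=256, **params)
print(params)
```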
## LLaDA2Pipeline
[[autodoc]] LLaDA2Pipeline
  - all
  - __call__

## LLaDA2PipelineOutput
[[autodoc]] pipelines.LLaDA2PipelineOutput