
Conversation

@renovate renovate bot commented Jan 16, 2026

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package: cache_dit
Change: ==1.1.10 → ==1.2.2

Release Notes

vipshop/cache-dit (cache_dit)

v1.2.2

Compare Source

What's Changed

Full Changelog: vipshop/cache-dit@v1.2.1...v1.2.2

v1.2.1: USP, 2D/3D Parallel

Compare Source

🎉 The v1.2.1 release is ready. The major updates include: Ring Attention w/ batched P2P, USP (Hybrid Ring and Ulysses), Hybrid 2D and 3D Parallelism (💥USP + TP), and reduced VAE-P communication overhead.

# Hybrid 2D/3D Parallelism in Cache-DiT is fully compatible w/ torch.compile, 
# Cache Acceleration, Text Encoder Parallelism, VAE Parallelism and more.
torchrun --nproc_per_node=8 -m cache_dit.generate flux2 --config parallel_2d.yaml --compile
torchrun --nproc_per_node=8 -m cache_dit.generate flux2 --config parallel_3d.yaml --compile
torchrun --nproc_per_node=8 -m cache_dit.generate --parallel ulysses_tp --cache --compile
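
For reference, here is a minimal programmatic sketch of the same hybrid 2D (USP + TP) setup. It assumes that the ulysses_size and tp_size options shown in the v1.2.0 examples below can be combined in a single ParallelismConfig, and uses FLUX.1-dev plus default DBCache settings purely as illustrative choices:

# Hedged sketch: combine context (Ulysses) and tensor parallelism in one config.
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-dev")  # example model
cache_dit.enable_cache(
    pipe,
    cache_config=DBCacheConfig(),           # default DBCache settings (assumed valid)
    parallelism_config=ParallelismConfig(
        ulysses_size=4,                     # context parallel (Ulysses) degree
        tp_size=2,                          # tensor parallel degree
    ),
)
image = pipe("a photo of a cat").images[0]

# torchrun --nproc_per_node=8 parallel_2d.py  (4 x 2 = 8 processes)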

What's Changed

New Contributors

Full Changelog: vipshop/cache-dit@v1.2.0...v1.2.1

v1.2.0: Major Release: NPU, TE-P, VAE-P, CN-P, ...

Compare Source

v1.2.0 Major Release: NPU, TE-P, VAE-P, CN-P, ...

Overviews

v1.2.0 is a major release following v1.1.0. It introduces many updates that further improve the ease of use and performance of Cache-DiT. We sincerely thank the contributors of Cache-DiT. The main updates in this release are:

  • 🎉New Models Support
  • 🎉Request level cache context
  • 🎉HTTP Serving Support
  • 🎉Context Parallelism Optimization
  • 🎉Text Encoder Parallelism
  • 🎉Auto Encoder (VAE) Parallelism
  • 🎉ControlNet Parallelism
  • 🎉Ascend NPU Support
  • 🎉Community Integration
🔥New Models Support
  • Qwen-Image:
    • Image: Qwen-Image-2512, Qwen-Image-Layered
    • Edit: Qwen-Image-Edit-2511, Qwen-Image-Edit-2509
    • ControlNet: Qwen-Image-ControlNet, Qwen-Image-ControlNet-Inpainting
  • Qwen-Image-Lightning: Qwen-Image-Lightning series, Qwen-Image-Edit-Lightning series
  • Wan: Wan 2.1 VACE, Wan 2.2 VACE.
  • Z-Image: Z-Image-Turbo, Z-Image-Turbo-Fun-ControlNet-2.0, Z-Image-Turbo-Fun-ControlNet-2.1
  • FLUX.2: FLUX.2-dev, FLUX.2-Klein-4B, FLUX.2-Klein-base-4B, FLUX.2-Klein-9B, FLUX.2-Klein-base-9B
  • LTX-2: LTX-2-I2V, LTX-2-T2V by @​BBuf
  • Ovis-Image: Ovis-Image
  • LongCat-Image: LongCat-Image, LongCat-Image-Edit
  • Nunchaku INT4 Models: Z-Image-Turbo, Qwen-Image-Edit-2511
🔥Request level cache context

If you need a different num_inference_steps for each user request instead of a fixed value, use it in conjunction with the refresh_context API: before running inference for each user request, update the cache context with the actual number of steps. Please refer to 📚run_cache_refresh for an example.

import cache_dit
from cache_dit import DBCacheConfig
from diffusers import DiffusionPipeline

# Init cache context with num_inference_steps=None (default)
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")
pipe = cache_dit.enable_cache(pipe.transformer, cache_config=DBCacheConfig(num_inference_steps=None))

# Assume num_inference_steps is 28, and we want to refresh the context
cache_dit.refresh_context(pipe.transformer, num_inference_steps=28, verbose=True)
output = pipe(...) # Just call the pipe as normal.
stats = cache_dit.summary(pipe.transformer) # Then, get the summary

# Update the cache context with new num_inference_steps=50.
cache_dit.refresh_context(pipe.transformer, num_inference_steps=50, verbose=True)
output = pipe(...) # Just call the pipe as normal.
stats = cache_dit.summary(pipe.transformer) # Then, get the summary

# Update the cache context with new cache_config.
cache_dit.refresh_context(
    pipe.transformer,
    cache_config=DBCacheConfig(
        residual_diff_threshold=0.1,
        max_warmup_steps=10,
        max_cached_steps=20,
        max_continuous_cached_steps=4,
        # The cache settings should all be located in the cache config 
        # if cache config is provided. Otherwise, we will skip it.
        num_inference_steps=50,
    ),
    verbose=True,
)
output = pipe(...) # Just call the pipe as normal.
stats = cache_dit.summary(pipe.transformer) # Then, get the summary
🔥HTTP Serving Support
  • Built-in HTTP serving deployment support with simple REST APIs by @​BBuf: deploy cache-dit models behind an HTTP API for text-to-image, image editing, multi-image editing, and text/image-to-video generation (a client-side sketch follows below).
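
A minimal client-side sketch of calling such a deployment; the endpoint path, port, payload fields, and response shape below are assumptions for illustration only, not the documented cache-dit serving API:

# Hypothetical client for a cache-dit HTTP serving deployment.
# The URL, JSON fields, and response format are assumed, not taken from cache-dit docs.
import base64
import requests

resp = requests.post(
    "http://localhost:8000/generate",                                 # assumed endpoint
    json={"prompt": "a photo of a cat", "num_inference_steps": 28},   # assumed payload
    timeout=600,
)
resp.raise_for_status()
with open("output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["image"]))                   # assumed response field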
🔥Context Parallelism Optimization
🔥Text Encoder Parallelism

Currently, cache-dit supports text encoder parallelism for the T5Encoder, UMT5Encoder, Llama, Gemma 1/2/3, Mistral, Mistral-3, Qwen-3, Qwen-2.5 VL, Glm and Glm-4 model series, which covers almost 🔥ALL pipelines in diffusers.

Users can set the extra_parallel_modules parameter in parallelism_config (when using Tensor Parallelism or Context Parallelism) to specify additional modules to parallelize beyond the main transformer, e.g., the text_encoder in Flux2Pipeline. This can further reduce the per-GPU memory requirement and slightly improve the inference performance of the text encoder.

# pip3 install "cache-dit[parallelism]"
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig

# Transformer Tensor Parallelism + Text Encoder Tensor Parallelism
cache_dit.enable_cache(
    pipe, 
    cache_config=DBCacheConfig(...),
    parallelism_config=ParallelismConfig(
        tp_size=2,
        parallel_kwargs={
            "extra_parallel_modules": [pipe.text_encoder], # FLUX.2
        },
    ),
)
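
As with the ControlNet example below, this script would be launched with torchrun, e.g. torchrun --nproc_per_node=2 parallel_te.py, so that the number of processes matches tp_size=2 (the script name here is just a placeholder).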
🔥Auto Encoder (VAE) Parallelism

Currently, cache-dit supports auto encoder (VAE) parallelism for the AutoencoderKL, AutoencoderKLQwenImage, AutoencoderKLWan, and AutoencoderKLHunyuanVideo series, which covers almost 🔥ALL pipelines in diffusers. It can further reduce the per-GPU memory requirement and slightly improve the inference performance of the auto encoder. Users can enable it via the extra_parallel_modules parameter in parallelism_config, for example:

# pip3 install "cache-dit[parallelism]"
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig

# Transformer Context Parallelism + Text Encoder Tensor Parallelism + VAE Data Parallelism
cache_dit.enable_cache(
    pipe, 
    cache_config=DBCacheConfig(...),
    parallelism_config=ParallelismConfig(
        ulysses_size=2,
        parallel_kwargs={
            "extra_parallel_modules": [pipe.text_encoder, pipe.vae], # FLUX.1
        },
    ),
)
🔥ControlNet Parallelism

Further, cache-dit also supports ControlNet parallelism for specific models, such as Z-Image-Turbo with ControlNet. Users can enable it via the extra_parallel_modules parameter in parallelism_config, for example:

# pip3 install "cache-dit[parallelism]"
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig

# Transformer Context Parallelism + Text Encoder Tensor Parallelism
# + VAE Data Parallelism + ControlNet Context Parallelism
cache_dit.enable_cache(
    pipe, 
    cache_config=DBCacheConfig(...),
    parallelism_config=ParallelismConfig(
        ulysses_size=2,
        # case: Z-Image-Turbo-Fun-ControlNet-2.1
        parallel_kwargs={
            "extra_parallel_modules": [pipe.text_encoder, pipe.vae, pipe.controlnet],
        },
    ),
)

# torchrun --nproc_per_node=2 parallel_cache.py
🔥Ascend NPU Support

Cache-DiT now provides native support for Ascend NPU (by @​gameofdimension @​luren55 @​DefTruth). Theoretically, nearly all models supported by Cache-DiT can run on Ascend NPU with most of Cache-DiT’s optimization technologies, including:

  • Hybrid Cache Acceleration (DBCache, DBPrune, TaylorSeer, SCM and more)
  • Context Parallelism (w/ Extended Diffusers' CP APIs, UAA, Async Ulysses, ...)
  • Tensor Parallelism (w/ PyTorch native DTensor and Tensor Parallelism APIs)
  • Text Encoder Parallelism (w/ PyTorch native DTensor and Tensor Parallelism APIs)
  • Auto Encoder (VAE) Parallelism (w/ Data or Tile Parallelism, avoid OOM)
  • ControlNet Parallelism (w/ Context Parallelism for ControlNet module)
  • Built-in HTTP serving deployment support with simple REST APIs

Please refer to the Ascend NPU Supported Matrix for more details.
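
As a minimal sketch of what NPU usage might look like, assuming torch_npu is installed and registers the "npu" device, and that default DBCache settings are acceptable (the model choice is only an example):

import torch
import torch_npu  # assumption: Ascend CANN toolkit + torch_npu are installed and register "npu"
import cache_dit
from cache_dit import DBCacheConfig
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("npu")  # place the pipeline on the Ascend NPU instead of a CUDA device
cache_dit.enable_cache(pipe, cache_config=DBCacheConfig())  # hybrid cache acceleration as on GPU
image = pipe("a photo of a cat").images[0]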

🔥Community Integration

Full Changelogs


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot force-pushed the renovate/cache_dit-1.x branch from 9463dfe to fd0ef19 on February 2, 2026 05:23
@renovate renovate bot changed the title from "Update dependency cache_dit to v1.2.0" to "Update dependency cache_dit to v1.2.1" on Feb 2, 2026
@renovate renovate bot force-pushed the renovate/cache_dit-1.x branch from fd0ef19 to c65665d on February 10, 2026 08:50
@renovate renovate bot changed the title from "Update dependency cache_dit to v1.2.1" to "Update dependency cache_dit to v1.2.2" on Feb 10, 2026
