
NJU-PCALab/UltraHR-100k


UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

UltraHR-100K

Ultra-high-resolution (UHR) text-to-image (T2I) generation has seen notable progress. However, two key challenges remain: (1) the absence of a large-scale, high-quality UHR T2I dataset, and (2) the neglect of tailored training strategies for fine-grained detail synthesis in UHR scenarios. To tackle the first challenge, we introduce UltraHR-100K, a high-quality dataset of 100K UHR images with rich captions, offering diverse content and strong visual fidelity. Each image exceeds 3K resolution and is rigorously curated based on detail richness, content complexity, and aesthetic quality. To tackle the second challenge, we propose a frequency-aware post-training method that enhances fine-detail generation in T2I diffusion models. Specifically, we design (i) Detail-Oriented Timestep Sampling (DOTS), which focuses learning on detail-critical denoising steps, and (ii) Soft-Weighting Frequency Regularization (SWFR), which leverages the Discrete Fourier Transform (DFT) to softly constrain frequency components, encouraging high-frequency detail preservation. Extensive experiments on our proposed UltraHR-eval4K benchmark demonstrate that our approach significantly improves the fine-grained detail quality and overall fidelity of UHR image generation.
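To give a feel for what non-uniform timestep sampling can look like, here is a minimal sketch. It is not the DOTS formulation from the paper: this README does not specify the sampling distribution, so the function name `detail_biased_timesteps`, the power-law skew, and the premise that low-noise steps (small t) are the detail-critical ones are all assumptions made for illustration.

```python
import random


def detail_biased_timesteps(batch_size, num_steps=1000, power=2.0, rng=random):
    """Sample integer timesteps in [0, num_steps) with extra mass near t = 0.

    ASSUMPTION: small t means low noise, and low-noise steps are where fine
    detail is resolved. The power-law skew is a stand-in, not the paper's rule.
    With u ~ Uniform(0, 1) and power > 1, u ** power is skewed toward 0;
    power = 1.0 recovers ordinary uniform sampling.
    """
    samples = []
    for _ in range(batch_size):
        u = rng.random()  # u lies in [0, 1)
        t = int((u ** power) * num_steps)
        samples.append(min(t, num_steps - 1))  # guard against the upper edge
    return samples


if __name__ == "__main__":
    print(detail_biased_timesteps(8, rng=random.Random(0)))
```

Setting `power` closer to 1.0 moves the sampler back toward the uniform schedule commonly used in diffusion training, which makes the bias easy to ablate.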

News 🚀🚀🚀

  • [2025.11.21] We have uploaded the data filtering code, which is provided in Fliter.zip.
  • [2025.09.18] 🏆 UltraHR-100K has been accepted to NeurIPS 2025!

Preparation

Environment

conda create -n ultrahr python=3.10
conda activate ultrahr
pip install torch torchvision
pip install packaging ninja
pip install flash-attn --no-build-isolation
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
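After installation, a quick check can confirm that the packages are visible to the interpreter. The sketch below is not part of the repository; the helper name `missing_packages` is hypothetical, and it only tests that each module can be located, not that its CUDA extensions were built correctly.

```python
import importlib.util

# Import names of the packages installed above
# (the flash-attn distribution is imported as flash_attn).
REQUIRED = ["torch", "torchvision", "flash_attn", "apex", "xformers"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all packages found")
```

Run it inside the activated `ultrahr` environment; any name it reports as missing points at the corresponding `pip install` step.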

Dataset

Download the UltraHR-100K dataset. Note: we have re-uploaded the complete dataset as a ZIP archive split into fragments. Merge the downloaded fragments before use with the following command:

cat part_* > ultrahr100k.zip
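On systems without `cat`, or to verify the archive after merging, the same step can be sketched in Python. This assumes the fragments match `part_*` and that sorted name order is the correct concatenation order, as it is for the shell glob above; the function name `merge_parts` is hypothetical and not part of the repository.

```python
import glob
import shutil
import zipfile


def merge_parts(pattern="part_*", output="ultrahr100k.zip"):
    """Concatenate fragments in sorted name order, then verify the ZIP.

    Returns the number of entries in the merged archive.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no fragments match {pattern!r}")
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks, low memory use
    with zipfile.ZipFile(output) as zf:
        bad = zf.testzip()  # reads every member and checks its CRC
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")
        return len(zf.namelist())


if __name__ == "__main__":
    print(merge_parts())
```

The CRC check reads the entire archive, so expect it to take a while on a dataset of this size. A failure usually means a fragment is missing or was only partially downloaded.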

Benchmark

Download the UltraHR-eval4K dataset.

License

This dataset is available under a Creative Commons BY-NC 4.0 license. For detailed information, please refer to the license_UltraHR100K.txt document.

You CAN: Use, modify, and share the data for non-commercial research purposes ONLY. Train models and publish research papers (Results), as long as they don't contain a significant portion of the original data.

You CANNOT: Use the data for harmful, unethical, or misleading purposes. Attempt to re-identify any individuals.

You MUST: Give appropriate credit (attribution) to the UltraHR-100K dataset and any upstream sources. Ensure anyone you share the data with also agrees to these same terms.

Disclaimer: The data is provided "AS IS." The creators are not liable for any misuse. By using this dataset, you agree to the full License Agreement.

Citation

@article{zhao2025ultrahr,
  title={UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset},
  author={Zhao, Chen and Ci, En and Xu, Yunzhe and Fan, Tiehan and Guan, Shanyan and Ge, Yanhao and Yang, Jian and Tai, Ying},
  journal={Advances in Neural Information Processing Systems},
  year={2025}
}

About

This is the official repository of UltraHR-100K.
