The Visual Computing Lab at The Hong Kong Polytechnic University, led by Prof. Lei Zhang, works on image/video restoration and quality assessment, multimodal perception and reasoning, image/video synthesis, 3D perception and generation, efficient architectures and training, as well as benchmarks and datasets.
Here we share our projects, code, models, benchmarks, datasets, and demos.
𝕏 X • 🤗 Hugging Face • 📕 Xiaohongshu
🖼️ Image / Video Restoration, Enhancement and Quality Assessment
Real-world image and video restoration, enhancement, super-resolution, and perceptual quality assessment.
- VOSR (CVPR 2026) — Paper | Code
- GDPO-SR (CVPR 2026) — Paper | Code
- Flickerformer (CVPR 2026) — Paper | Code
- RASS (IJCV 2026) — Paper | Code
- NSARM (preprint) — Paper | Code
- CCSR (TIP 2025) — Paper | Code
- InstructRestore (NeurIPS 2025) — Paper | Code | HF Paper
- VisualQuality-R1 (NeurIPS 2025) — Paper | Code
- DP²O-SR (NeurIPS 2025) — Paper | Code
- DLoRAL (NeurIPS 2025) — Paper | Code
- PURE (ICCV 2025) — Paper | Code | HF Paper
- TVT (ICCV 2025) — Paper | Code
- GSASR (ICCV 2025) — Paper | Code | HF Paper
- A-FINE (CVPR 2025) — Paper | Code
- PiSA-SR (CVPR 2025) — Paper | Code
- OSEDiff (NeurIPS 2024) — Paper | Code
- EA-Adam (TIP 2024) — Paper | Code | HF Paper
- SSL (ACM MM 2024) — Paper | Code | HF Paper
- PASD (ECCV 2024) — Paper | Code
- MGLD (ECCV 2024) — Paper | Code
- SeeSR (CVPR 2024) — Paper | Code
- TMP (TIP 2024) — Paper | Code | HF Paper
- Joint-HDRDN (CVPR 2023) — Paper | Code
- HGGT (CVPR 2023) — Paper | Code
- TPGSR (TIP 2023) — Paper | Code
- ELAN (ECCV 2022) — Paper | Code
- DASR (ECCV 2022) — Paper | Code
- UDKE (ECCV 2022) — Paper | Code
- LDL (CVPR 2022) — Paper | Code
- TATT (CVPR 2022) — Paper | Code
- Image-Adaptive-3DLUT (TPAMI 2022) — Paper | Code
- ECBSR (ACM MM 2021) — Paper | Code
- DCDicL (CVPR 2021) — Paper | Code
- LPTN (CVPR 2021) — Paper | Code
🤖 Multimodal Perception, Understanding and Reasoning
MLLM-based visual perception, grounding, OOD detection, dense understanding, and multimodal reasoning.
🎨 Image and Video Synthesis and Generation
Efficient, controllable, and high-quality generative models for image synthesis, editing, and video generation.
- Memorize When Needed (arXiv 2026) — Paper | Code | HF Model | HF Paper
- DP-DMD (ICML 2026) — Paper | Code
- CoCoEdit (ICML 2026) — Paper | Code
- Hybrid Forcing (preprint) — Paper / Code coming soon
- Many-for-Many (ICLR 2026) — Paper | Code
🌍 3D Perception, Reconstruction and Generation
3D reconstruction, scene generation, and versatile 3D editing from images, videos, and language prompts.
- Omni-3DEdit (CVPR 2026) — Paper | Code
- Photo3D (CVPR 2026) — Paper | Code
- One2Scene (ICLR 2026) — Paper | Code
- ViP3DE (AAAI 2026) — Paper | Code
- BEVDilation (AAAI 2026) — Paper | Code
- AlignCVC (AAAI 2026) — Paper | Code
⚡ Architecture and Training Paradigms
New model architectures and efficient training paradigms for vision models, diffusion transformers, LLMs, and VLMs.
- SPES (preprint) — Paper | Code
- Self-transcendence (preprint) — Paper | Code
- BinaryAttention (CVPR 2026) — Paper | Code
📊 Benchmarks and Datasets
Benchmarks and datasets for rigorous evaluation and reproducible progress in visual computing.
For a broader list of our papers, models, and datasets, please visit our Hugging Face Collections.
If you are interested in our work, please follow the organization and star our repositories ⭐