Cumulus is a real-time, volumetric cloud rendering engine built from scratch in DirectX 12.
It extends the architectural principles of Guerrilla Games' Nubis 3 technology by introducing a fully procedural, GPU-driven generation pipeline. Unlike static implementations, Cumulus supports dynamic object interaction (collisions), time-of-day transitions, and variable density modeling without offline pre-computation.
Figure panels: SDF Field; NVDF: Density; NVDF: Detail Type; NVDF: Scale.
Based on Guerrilla Games' Nubis 3, the renderer uses a dual-texture approach to decouple macro shapes from micro details while maximizing performance:
- NVDF (Noise-Voxel Density Field): A 3D texture defining local material properties:
  - Density: Base shape and opacity.
  - Detail Type: Noise pattern selector (e.g., billow vs. wispy).
  - Scale: Feature size control (e.g., fluffy tops vs. flat bottoms).
- SDF (Signed Distance Field): A low-res distance map used for empty-space skipping. Rays take large steps through empty air and switch to fine integration only when the SDF indicates proximity to the cloud surface.
Figure panels: Direct Light; Multi-Scattering; Ambient; Combined Beauty.
The lighting model integrates three components based on the Nubis 3 architecture:
- Direct Lighting: Uses Beer’s Law for transmittance and a dual-lobe Henyey-Greenstein phase function to create intense forward scattering ("silver lining").
- Multi-Scattering: Approximates internal light diffusion and the "powder effect" (dark edges) using a probability function rather than expensive path tracing.
- Ambient Lighting: Applies a height-based gradient that blends sky color at the top with ground albedo at the bottom to ground the volume in the scene.
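The direct and ambient terms above map onto a few standard formulas. The sketch below shows Beer's Law, a Henyey-Greenstein phase function, a dual-lobe blend, and a linear height gradient; the lobe parameters (`gForward`, `gBackward`, `forwardWeight`) are illustrative assumptions rather than Cumulus' tuned values, and the multi-scattering approximation is not reproduced.

```cpp
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// Beer-Lambert transmittance for a given optical depth
// (extinction coefficient times distance, or accumulated density * step).
inline float beerTransmittance(float opticalDepth) {
    return std::exp(-opticalDepth);
}

// Henyey-Greenstein phase function; g > 0 favours forward scattering.
inline float henyeyGreenstein(float cosTheta, float g) {
    float g2 = g * g;
    float denom = 1.0f + g2 - 2.0f * g * cosTheta;
    return (1.0f - g2) / (4.0f * kPi * denom * std::sqrt(denom));
}

// Dual-lobe HG: a strong forward lobe (silver lining) blended with a weak
// backward lobe. The default parameters are assumptions for illustration.
inline float dualLobeHG(float cosTheta, float gForward = 0.8f,
                        float gBackward = -0.3f, float forwardWeight = 0.7f) {
    return forwardWeight * henyeyGreenstein(cosTheta, gForward) +
           (1.0f - forwardWeight) * henyeyGreenstein(cosTheta, gBackward);
}

struct Rgb { float r, g, b; };

// Height-based ambient: ground albedo at the cloud base, sky colour at the top.
// heightFraction is 0 at the base and 1 at the top of the cloud layer.
inline Rgb ambientGradient(Rgb groundAlbedo, Rgb skyColor, float heightFraction) {
    float h = std::fmin(std::fmax(heightFraction, 0.0f), 1.0f);
    return {groundAlbedo.r + (skyColor.r - groundAlbedo.r) * h,
            groundAlbedo.g + (skyColor.g - groundAlbedo.g) * h,
            groundAlbedo.b + (skyColor.b - groundAlbedo.b) * h};
}
```

Blending two lobes lets the forward peak stay sharp while still leaving some light scattered back toward a viewer facing away from the sun.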
To decouple the expensive lighting calculation from the view ray march, the engine implements Light Ray Caching. Lighting is pre-computed for each voxel in a separate compute pass before the main render. This prevents the nested loop nightmare of marching toward the sun at every view sample, allowing the primary ray to simply look up the incoming light energy cheaply.
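The caching idea can be sketched on the CPU. Without a cache, N view samples each march M steps toward the sun (N × M density reads per frame); with a cache, each of V voxels marches once (V × M) and every view sample does a single lookup. The grid layout, unit voxel size, nearest-voxel sampling, and function names below are assumptions for illustration; Cumulus performs this pass in a compute shader.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense n^3 grid with unit-sized voxels.
struct VoxelGrid {
    int n;
    std::vector<float> data;
    explicit VoxelGrid(int size) : n(size), data(std::size_t(size) * size * size, 0.0f) {}
    bool inside(int x, int y, int z) const {
        return x >= 0 && y >= 0 && z >= 0 && x < n && y < n && z < n;
    }
    float& at(int x, int y, int z) { return data[(std::size_t(z) * n + y) * n + x]; }
    float at(int x, int y, int z) const { return data[(std::size_t(z) * n + y) * n + x]; }
};

// Pass 1 (once per lighting change): march from each voxel centre toward the sun
// and store Beer's-law transmittance. (sx, sy, sz) must be a normalised direction.
VoxelGrid buildLightCache(const VoxelGrid& density, float sx, float sy, float sz,
                          float extinction) {
    VoxelGrid cache(density.n);
    for (int z = 0; z < density.n; ++z)
        for (int y = 0; y < density.n; ++y)
            for (int x = 0; x < density.n; ++x) {
                float depth = 0.0f;
                // Unit-length steps, starting one voxel toward the sun.
                for (float t = 1.0f;; t += 1.0f) {
                    int px = int(std::floor(x + 0.5f + sx * t));
                    int py = int(std::floor(y + 0.5f + sy * t));
                    int pz = int(std::floor(z + 0.5f + sz * t));
                    if (!density.inside(px, py, pz)) break; // left the volume
                    depth += density.at(px, py, pz) * extinction;
                }
                cache.at(x, y, z) = std::exp(-depth);
            }
    return cache;
}

// Pass 2 (per view sample): one lookup replaces the inner march toward the sun.
inline float cachedSunTransmittance(const VoxelGrid& cache, int x, int y, int z) {
    return cache.inside(x, y, z) ? cache.at(x, y, z) : 1.0f;
}
```

In a fully dense 4×4×4 grid lit from straight above, the top layer is unshadowed and the bottom layer sees three voxels of density above it.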
Cloud placement is procedurally driven by an "SDF Path" system. An event system instantiates clouds along guided paths defined by Signed Distance Fields. Each cloud instance maintains unique parameters for density decay and detail type, allowing for art-directable variations within a procedurally generated sky.
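As a rough illustration of spawning clouds along an SDF-defined path, the sketch below uses a single capsule segment as the guide path and gives each instance its own density decay and detail type. `PathSegment`, `CloudInstance`, `spawnCloudsAlongPath`, and the parameter ranges are all hypothetical; the actual Cumulus event system and path representation may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical per-cloud parameters.
struct CloudInstance {
    Vec3     position;
    float    densityDecay; // how quickly density falls off
    uint32_t detailType;   // noise pattern index, e.g. 0 = billow, 1 = wispy
};

// Guide path segment from a to b; its SDF is a capsule of the given radius.
struct PathSegment { Vec3 a, b; float radius; };

float pathSdf(const PathSegment& s, Vec3 p) {
    Vec3 pa{p.x - s.a.x, p.y - s.a.y, p.z - s.a.z};
    Vec3 ba{s.b.x - s.a.x, s.b.y - s.a.y, s.b.z - s.a.z};
    float h = std::clamp(dot(pa, ba) / dot(ba, ba), 0.0f, 1.0f);
    Vec3 d{pa.x - ba.x * h, pa.y - ba.y * h, pa.z - ba.z * h};
    return std::sqrt(dot(d, d)) - s.radius;
}

// Spawn event: place clouds at random points inside the path's SDF and give each
// one unique parameters. The per-axis jitter is at most radius/2, so the offset
// length is below radius and every spawn point satisfies pathSdf <= 0.
std::vector<CloudInstance> spawnCloudsAlongPath(const PathSegment& s, int count,
                                                uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> along(0.0f, 1.0f);
    std::uniform_real_distribution<float> jitter(-0.5f * s.radius, 0.5f * s.radius);
    std::uniform_real_distribution<float> decay(0.5f, 1.5f);
    std::uniform_int_distribution<uint32_t> detail(0u, 1u);
    std::vector<CloudInstance> out;
    out.reserve(count);
    for (int i = 0; i < count; ++i) {
        float u = along(rng);
        Vec3 p{s.a.x + (s.b.x - s.a.x) * u + jitter(rng),
               s.a.y + (s.b.y - s.a.y) * u + jitter(rng),
               s.a.z + (s.b.z - s.a.z) * u + jitter(rng)};
        out.push_back({p, decay(rng), detail(rng)});
    }
    return out;
}
```

Keeping placement in SDF space means an artist can reshape the path and the same seed still yields clouds that hug the new route.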
The engine supports real-time volumetric destruction. Interaction is handled by checking convex hull collisions against the cloud's density voxels. Those checks are accelerated via a compute shader. Collision data is packed per mesh instance, rather than entity instance, to minimize memory overhead during the physics pass.
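A convex hull can be stored as a set of outward-facing planes, so testing a voxel center costs one dot product per hull face with an early out, instead of one test per mesh triangle. The sketch below assumes this plane representation and a unit-voxel density grid; the names are hypothetical, and Cumulus runs the equivalent checks in a compute shader.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Outward-facing plane: points with dot(normal, p) <= offset are on the inner side.
struct Plane { Vec3 normal; float offset; };
using ConvexHull = std::vector<Plane>;

// A point is inside a convex hull iff it is on the inner side of every face.
bool insideHull(const ConvexHull& hull, Vec3 p) {
    for (const Plane& pl : hull)
        if (dot(pl.normal, p) > pl.offset) return false; // early out on first face
    return true;
}

// Carve the hull out of an n^3 density grid (unit voxels, origin at 0).
// Returns how many non-empty voxels were cleared.
int carveHull(std::vector<float>& density, int n, const ConvexHull& hull) {
    int cleared = 0;
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x) {
                std::size_t i = (std::size_t(z) * n + y) * n + x;
                Vec3 center{x + 0.5f, y + 0.5f, z + 0.5f};
                if (density[i] > 0.0f && insideHull(hull, center)) {
                    density[i] = 0.0f;
                    ++cleared;
                }
            }
    return cleared;
}
```

For example, an axis-aligned box spanning [1, 3] on each axis, carved from a filled 4×4×4 grid, clears the 2×2×2 block of voxels whose centers fall inside it.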
Clouds created in real-time in a compute shader
Cloud formation is fully procedural and controllable in real-time via ImGUI (e.g., cloud count, scale multiplier). The generation pipeline operates in two stages:
- CPU Seeding: "Seeds" are initialized as world-space coordinates to track cloud position, movement, and formation over time.
- GPU Shaping: For each seed, a compute shader generates a base SDF shape using Inigo Quilez’s primitive distance functions. The base form is a round cone surrounded by "Vesica Segments" (football-like shapes), where orientation, count, and size are driven by noise and the input scale factor.
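The two primitives named above can be written following Inigo Quilez's published distance functions (`sdRoundCone` in its axis-aligned form and `sdVesicaSegment`), ported here to plain C++. Combining them with a hard `min` union in `cloudBaseSdf` is an assumption for illustration; the noise-driven orientation, count, and sizing of the vesicas are not reproduced.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

inline float len2(Vec2 v) { return std::sqrt(v.x * v.x + v.y * v.y); }
inline float len3(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }
inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Round cone along +y: sphere of radius r1 at the origin, sphere of radius r2 at
// height h, joined by a tangent cone. Requires |r1 - r2| < h.
float sdRoundCone(Vec3 p, float r1, float r2, float h) {
    float b = (r1 - r2) / h;
    float a = std::sqrt(1.0f - b * b);
    Vec2 q{std::sqrt(p.x * p.x + p.z * p.z), p.y};
    float k = -b * q.x + a * q.y;
    if (k < 0.0f) return len2(q) - r1;                   // bottom sphere region
    if (k > a * h) return len2({q.x, q.y - h}) - r2;     // top sphere region
    return a * q.x + b * q.y - r1;                       // cone side
}

// Vesica ("football") between a and b with half-width w at its middle.
float sdVesicaSegment(Vec3 p, Vec3 a, Vec3 b, float w) {
    Vec3 ab = sub(b, a);
    float l = len3(ab);
    Vec3 v{ab.x / l, ab.y / l, ab.z / l};
    Vec3 c{(a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f, (a.z + b.z) * 0.5f};
    Vec3 pc = sub(p, c);
    float y = dot3(pc, v);
    Vec3 radial{pc.x - y * v.x, pc.y - y * v.y, pc.z - y * v.z};
    Vec2 q{len3(radial), std::abs(y)};
    float r = 0.5f * l;
    float d = 0.5f * (r * r - w * w) / w; // offset of the generating circle
    // Closest feature is either the pointed tip or the circular arc.
    if (r * q.x < d * (q.y - r)) return len2({q.x, q.y - r});
    return len2({q.x + d, q.y}) - (d + w);
}

struct Vesica { Vec3 a, b; float w; };

// Hypothetical base shape: hard union (min) of the round cone and its vesicas.
float cloudBaseSdf(Vec3 p, float r1, float r2, float h, const std::vector<Vesica>& vesicas) {
    float d = sdRoundCone(p, r1, r2, h);
    for (const Vesica& v : vesicas) d = std::min(d, sdVesicaSegment(p, v.a, v.b, v.w));
    return d;
}
```

A smooth minimum would blend the vesicas into the cone instead of leaving creases; the hard `min` keeps the sketch minimal.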
Figure panels: Baked Noise: Dimensional Profile; Baked Noise: Detail Type.
To ensure runtime performance, complex noise functions are pre-baked into static 3D textures rather than calculated per-frame:
- Billow Noise: Based on the psrdnoise implementation by Stefan Gustavson and Ian MacEwan, this modified Perlin noise uses rotated cells to create the characteristic "puffiness" of cumulus clouds.
- Fractal Sum & Easing: Noise values are eased out near SDF boundaries and accumulated using fractal sums to eliminate voxel-like artifacts.
- Detail & Density: High-frequency detail is driven by scaled billow noise that intensifies with altitude (mimicking wispy cloud tops), while density scale remains relatively uniform across the profile.
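The baking step can be sketched as: evaluate a fractal sum of billow octaves at every voxel once, ease it toward zero near the SDF boundary, and store the result in a 3D array. To stay self-contained, this sketch substitutes a simple hashed value noise for the psrdnoise-based billow noise Cumulus uses; `hash3`, `valueNoise`, `billowFbm`, `bakeNoise`, and the octave count are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in lattice hash in [0, 1). NOT psrdnoise; used only to keep this runnable.
inline float hash3(int x, int y, int z) {
    uint32_t h = (uint32_t(x) * 73856093u) ^ (uint32_t(y) * 19349663u) ^ (uint32_t(z) * 83492791u);
    h ^= h >> 13; h *= 0x5bd1e995u; h ^= h >> 15;
    return float(h & 0xFFFFFFu) / float(0x1000000);
}

inline float fade(float t) { return t * t * (3.0f - 2.0f * t); }
inline float mixf(float a, float b, float t) { return a + (b - a) * t; }

// Trilinearly interpolated value noise with smoothstep fading, in [0, 1).
float valueNoise(float x, float y, float z) {
    int xi = int(std::floor(x)), yi = int(std::floor(y)), zi = int(std::floor(z));
    float tx = fade(x - xi), ty = fade(y - yi), tz = fade(z - zi);
    float c[2][2][2];
    for (int dz = 0; dz < 2; ++dz)
        for (int dy = 0; dy < 2; ++dy)
            for (int dx = 0; dx < 2; ++dx) c[dz][dy][dx] = hash3(xi + dx, yi + dy, zi + dz);
    float x00 = mixf(c[0][0][0], c[0][0][1], tx), x10 = mixf(c[0][1][0], c[0][1][1], tx);
    float x01 = mixf(c[1][0][0], c[1][0][1], tx), x11 = mixf(c[1][1][0], c[1][1][1], tx);
    return mixf(mixf(x00, x10, ty), mixf(x01, x11, ty), tz);
}

// Fractal sum of billow octaves; |2n - 1| produces the rounded, puffy look.
float billowFbm(float x, float y, float z, int octaves) {
    float sum = 0.0f, amp = 0.5f, freq = 1.0f, norm = 0.0f;
    for (int i = 0; i < octaves; ++i) {
        sum  += amp * std::fabs(2.0f * valueNoise(x * freq, y * freq, z * freq) - 1.0f);
        norm += amp;
        amp  *= 0.5f;
        freq *= 2.0f;
    }
    return sum / norm; // normalised to [0, 1]
}

// Bake an n^3 texture over the unit cube. sdf(x, y, z) is negative inside the cloud;
// noise is eased to zero over `falloff` units near the boundary to hide voxel edges.
template <typename Sdf>
std::vector<float> bakeNoise(int n, float frequency, float falloff, Sdf sdf) {
    std::vector<float> tex(std::size_t(n) * n * n);
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x) {
                float px = (x + 0.5f) / n, py = (y + 0.5f) / n, pz = (z + 0.5f) / n;
                float t = std::fmin(std::fmax(-sdf(px, py, pz) / falloff, 0.0f), 1.0f);
                float ease = t * t * (3.0f - 2.0f * t);
                tex[(std::size_t(z) * n + y) * n + x] =
                    ease * billowFbm(px * frequency, py * frequency, pz * frequency, 4);
            }
    return tex;
}
```

Because the result is a plain array, it can be uploaded once as a static 3D texture, so per-frame shaders only pay for a texture fetch.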
Sunrise, daytime, sunset, night time, with the ImGUI controls
The atmospheric rendering system implements Eric Bruneton's Precomputed Atmospheric Scattering model. To maximize performance, the engine skips the precomputation step at startup and instead loads pre-baked irradiance, scattering, and transmittance textures.
The sky is rendered in a raycasting pre-pass that seamlessly blends Polar and Cartesian camera models to support a fully dynamic day/night cycle, complete with UI-controllable sun positioning and a custom moon and night sky implementation.
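UI-controllable sun positioning typically reduces to converting two angles from the UI into a Cartesian light direction. The sketch below assumes a y-up world, angles in degrees, and azimuth measured from +x toward +z; these conventions and the function name are assumptions, not necessarily the ones Cumulus uses.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Convert UI angles to a unit sun direction (y-up).
// elevation: 0 = horizon, 90 = zenith, negative = below the horizon (night).
// azimuth: rotation about +y, measured from +x toward +z.
Vec3 sunDirectionFromAngles(float elevationDeg, float azimuthDeg) {
    constexpr float kDegToRad = 3.14159265358979f / 180.0f;
    float el = elevationDeg * kDegToRad;
    float az = azimuthDeg * kDegToRad;
    return {std::cos(el) * std::cos(az), std::sin(el), std::cos(el) * std::sin(az)};
}
```

Sweeping the elevation from negative values through 90 degrees and back gives a simple sunrise, daytime, sunset, and night cycle driven by a single slider.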
Figure panels: Engine overview diagram; Full render pipeline.
- Volumetric Ray-Marching: Compute-driven pipeline for handling density integration, light caching, and SDF stepping.
- Atmospheric Scattering: Dedicated pre-pass for sky, sun, and moon rendering based on precomputed LUTs.
- Simulation & Physics: Compute shaders for procedural cloud generation and convex hull collision detection.
- Post-Processing: Full-screen pass system for tone mapping and final compositing.
- Asset Management: Automated loading of 3D models (Assimp) and texture construction (DirectXTex) for NVDF, SDF, and Noise volumes.
- Shader-Driven Reflection: Resource binding is automated via `ID3D12ShaderReflection`, allowing for string-based parameter setting without manual root signature matching.
- "Pass" Framework: A high-level abstraction that automatically generates Root Signatures and Pipeline State Objects (PSOs) based on shader requirements.
- D3D12 Abstractions: User-friendly wrappers for complex DirectX 12 objects, including `Texture`, `UploadBuffer` (staging), and `FrameResource` management.
- Diagnostics: Integrated ImGUI for runtime controls, plus automatic lifetime reporting and strict error logging to catch memory leaks in Debug mode.
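String-based parameter setting boils down to building a name-to-root-parameter map from reflection data. In D3D12, the per-resource data comes from `ID3D12ShaderReflection::GetResourceBindingDesc`, which fills a `D3D12_SHADER_INPUT_BIND_DESC` (`Name`, `Type`, `BindPoint`, `Space`, ...). The sketch below mirrors only those fields in plain C++ so it runs without the D3D12 headers; the one-root-parameter-per-resource layout, the skipped samplers, and all type names are hypothetical simplifications, not the Cumulus "Pass" framework itself.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Plain mirror of the reflection fields this sketch needs.
enum class ResourceKind { ConstantBuffer, Srv, Uav, Sampler };

struct ReflectedResource {
    std::string  name;
    ResourceKind kind;
    uint32_t     bindPoint; // register index, e.g. t3 -> 3
    uint32_t     space;     // register space
};

// Hypothetical layout: one root parameter per non-sampler resource, in reflection
// order. Samplers would usually become static samplers or a sampler table.
class RootLayout {
public:
    explicit RootLayout(const std::vector<ReflectedResource>& resources) {
        uint32_t next = 0;
        for (const ReflectedResource& r : resources) {
            if (r.kind == ResourceKind::Sampler) continue;
            indexByName_[r.name] = next++;
        }
        count_ = next;
    }

    // Lookup behind a SetParameter("Name", ...)-style call.
    uint32_t rootIndex(const std::string& name) const {
        auto it = indexByName_.find(name);
        if (it == indexByName_.end())
            throw std::out_of_range("unknown shader parameter: " + name);
        return it->second;
    }

    uint32_t parameterCount() const { return count_; }

private:
    std::unordered_map<std::string, uint32_t> indexByName_;
    uint32_t count_ = 0;
};
```

Failing loudly on an unknown name catches typos and shader edits that remove a binding, which is where manually maintained root signatures usually drift out of sync.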
To measure performance, we captured several setups on an NVIDIA GeForce RTX 4070 Laptop GPU.
We evaluated three of the performance techniques we implemented: first, how pre-baking the procedural noise textures affects performance as the number and scale of clouds vary; second, how the distance to a cloud affects ray-marching cost; and third, how the convex hull collision algorithm compares to a naive per-triangle intersection check.
The scenes tested for the first scenario are:
- Four Clouds at 1.0 Scale
- Four Big Clouds
- Eight Clouds at 8.0 Scale
- Sixteen Clouds at Maximum Scale
Procedural compute pass timings, in milliseconds, with offline (pre-baked) versus online (computed at runtime) noise textures:
| Scene | Offline noise texture (ms) | Online noise texture (ms) |
|---|---|---|
| Four clouds at 1.0 scale | 6.21 | 6.51 |
| Four big clouds | 7.32 | 9.79 |
| Eight clouds at 8.0 scale | 13.15 | 14.76 |
| Sixteen clouds at maximum scale | 24.55 | 24.61 |
The scenes tested for the second scenario are:
- A cloud at a distance
- A cloud up close
- On the edge of a cloud
- Inside a cloud
Timings of the light cache and ray-march compute passes, in milliseconds:
| Scene | Light Cache (ms) | Raymarch (ms) |
|---|---|---|
| Far Cloud | 1.62 | 2.1 |
| Close Cloud | 2.21 | 20.8 |
| On the Edge | 1.5 | 20.3 |
| Inside Cloud | 1.77 | 9.62 |
Lastly, we measured the convex hull collision checks. Every test object is the arm model, our highest-polygon OBJ at roughly 900 triangles.
| Object count | Convex hull (ms) | Naive per-triangle (ms) |
|---|---|---|
| 0 | 2.20 | 2.11 |
| 5 | 3.31 | 30.8 |
| 30 | 9.50 | 105.5 |
| 60 | 20.32 | 180.1 |
The performance gains from these optimizations are clear. Without an optimized collision structure, collision checks are essentially unusable in a real-time context: with 60 objects, the naive per-triangle check takes 180.1 ms versus 20.32 ms with convex hulls. Likewise, the light cache provided large gains when the camera is near a cloud. Interestingly, offline and online noise textures perform similarly (the largest gap is about 2.5 ms, in the Four Big Clouds scene). This points to the true bottleneck: the procedural cloud SDF calculations.
This project uses the Premake 5 build system (bundled in ./external/) to automate project configuration.
To build the project:
- Run `generate_vs2022.bat` on Windows.
- Open the generated `Cumulus.sln` in Visual Studio 2022.
- Build and run.
Note: The Premake script (premake5.lua) automatically detects and adds new source/header files in the source directories, so manual project updates are not required when adding files.
- OS: Windows 10/11
- IDE: Visual Studio 2022 (MSVC v143 toolset)
- Language: C++17
- GPU: DirectX 12 compatible hardware
- DirectXTex: Reading image files for texture generation
- Assimp 3.0.0: Loading 3D Models
- Nubis 3, the presentation behind this whole project
- Special thanks to Di Lu for helping us debug our shaders!
- Stefan Gustavson and Ian MacEwan for making billowy noise, and Stefan and Ashima Arts for fast Perlin noise as well
- Domenic Portera for the HLSL port of the billowy noise.
- Eric Bruneton's Precomputed Atmospheric Scattering
- Inigo Quilez's blog on SDFs
- Björn Ottosson's Oklab Conversion
Broken Camera Matrix while working on Atmospheric Scattering