rubenaryo/Cumulus

Cumulus

C++17 DirectX 12 Windows License

A DirectX 12 implementation of real-time, interactive volumetric clouds.


Real-time Flythrough

Overview

Cumulus is a real-time, volumetric cloud rendering engine built from scratch in DirectX 12.

It extends the architectural principles of Guerrilla Games' Nubis 3 technology by introducing a fully procedural, GPU-driven generation pipeline. Unlike static implementations, Cumulus supports dynamic object interaction (collisions), time-of-day transitions, and variable density modeling without offline pre-computation.

Features

Volumetric Cloud Rendering

A Good Morning Type of Cloud

Voxel-Based Ray Marching & Data Structures


SDF Field

NVDF: Density

NVDF: Detail Type

NVDF: Scale

Based on Guerrilla Games' Nubis 3, the renderer uses a dual-texture approach to decouple macro shapes from micro details while maximizing performance:

  • NVDF (Noise-Voxel Density Field): A 3D texture defining local material properties:
    • Density: Base shape and opacity.
    • Detail Type: Noise pattern selector (e.g., billow vs. wispy).
    • Scale: Feature size control (e.g., fluffy tops vs. flat bottoms).
  • SDF (Signed Distance Field): A low-res distance map used for empty-space skipping. Rays take large steps through empty air and switch to fine integration only when the SDF indicates proximity to the cloud surface.
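
The empty-space skipping described above can be sketched on the CPU as follows. This is a minimal illustration, not the engine's shader: `MarchRay`, `MarchResult`, the `fineStep` default, and the two callbacks standing in for SDF and NVDF texture lookups are all hypothetical names chosen for this example.

```cpp
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

struct MarchResult {
    float transmittance;  // fraction of light surviving along the ray
    int   steps;          // loop iterations taken (coarse and fine combined)
};

// sdf(p):     signed distance to the cloud surface (negative inside).
// density(p): extinction density at p (stand-in for the NVDF lookup).
// dir is assumed to be normalized.
inline MarchResult MarchRay(Vec3 origin, Vec3 dir, float maxDist,
                            const std::function<float(Vec3)>& sdf,
                            const std::function<float(Vec3)>& density,
                            float fineStep = 0.05f)
{
    float t = 0.0f;
    float transmittance = 1.0f;
    int steps = 0;
    while (t < maxDist && transmittance > 0.01f) {
        ++steps;
        const Vec3 p = origin + dir * t;
        const float d = sdf(p);
        if (d > fineStep) {
            // Empty air: the SDF guarantees no cloud within distance d,
            // so the ray can safely jump that far in one step.
            t += d;
            continue;
        }
        // Near or inside the cloud: integrate density with small fixed steps.
        transmittance *= std::exp(-density(p) * fineStep);
        t += fineStep;
    }
    return {transmittance, steps};
}
```

For a unit sphere of constant density ten units away, this loop reaches the surface in a single jump and spends nearly all of its iterations inside the volume, instead of the 400 uniform steps a fixed 0.05 step would need over a 20-unit ray.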

Lighting Components


Direct Light

Multi-Scattering

Ambient

Combined Beauty

The lighting model integrates three components based on the Nubis 3 architecture:

  1. Direct Lighting: Uses Beer’s Law for transmittance and a dual-lobe Henyey-Greenstein phase function to create intense forward scattering ("silver lining").
  2. Multi-Scattering: Approximates internal light diffusion and the "powder effect" (dark edges) using a probability function rather than expensive path tracing.
  3. Ambient Lighting: Applies a height-based gradient that blends sky color at the top with ground albedo at the bottom to ground the volume in the scene.
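
As a rough sketch of the direct-lighting term, the two standard formulas named above can be written in a few lines. The function names, the lobe parameters, and the way the terms are multiplied together are assumptions made for illustration; the engine's shaders may weight or combine them differently.

```cpp
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// Beer's Law: fraction of light that survives a given optical depth.
inline float BeerLambert(float opticalDepth) {
    return std::exp(-opticalDepth);
}

// Henyey-Greenstein phase function. g in (-1, 1): g > 0 favors forward
// scattering, g < 0 favors back scattering, g == 0 is isotropic.
inline float HenyeyGreenstein(float cosTheta, float g) {
    const float g2 = g * g;
    const float denom = 1.0f + g2 - 2.0f * g * cosTheta;
    return (1.0f - g2) / (4.0f * kPi * denom * std::sqrt(denom));
}

// Dual-lobe variant: a weighted blend of a forward and a backward lobe.
// blend = 0 uses only the forward lobe, blend = 1 only the backward lobe.
inline float DualLobeHG(float cosTheta, float gForward, float gBack, float blend) {
    return (1.0f - blend) * HenyeyGreenstein(cosTheta, gForward)
         + blend * HenyeyGreenstein(cosTheta, gBack);
}

// Hypothetical combination: sun energy attenuated toward the sample,
// then scaled by the phase function for the view/sun angle.
inline float DirectLight(float sunIntensity, float opticalDepthToSun, float cosTheta) {
    return sunIntensity * BeerLambert(opticalDepthToSun)
         * DualLobeHG(cosTheta, 0.8f, -0.3f, 0.5f);
}
```

With a strongly positive forward lobe, the phase value spikes when the view direction lines up with the sun, which is what produces the bright rim on backlit clouds.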

Light Caching Optimization


Visualizing the cached light volume

To decouple the expensive lighting calculation from the view ray march, the engine implements Light Ray Caching. Lighting is pre-computed for each voxel in a separate compute pass before the main render. This prevents the nested loop nightmare of marching toward the sun at every view sample, allowing the primary ray to simply look up the incoming light energy cheaply.
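
The idea can be illustrated with a small CPU sketch. It assumes, purely for brevity, that the sun is directly overhead (+Y), so each column of voxels can be accumulated in one top-down sweep; the real pass runs on the GPU and presumably handles arbitrary sun directions. `VoxelGrid`, `BuildLightCache`, and `SampleLight` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct VoxelGrid {
    int nx, ny, nz;
    std::vector<float> data;  // nx * ny * nz values, x varies fastest

    VoxelGrid(int x, int y, int z, float fill = 0.0f)
        : nx(x), ny(y), nz(z),
          data(static_cast<std::size_t>(x) * y * z, fill) {}

    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * ny + y) * nx + x;
    }
    float& at(int x, int y, int z) { return data[index(x, y, z)]; }
    float  at(int x, int y, int z) const { return data[index(x, y, z)]; }
};

// Pre-pass: for every voxel, store how much sunlight reaches it.
// Assumption: sun straight overhead, so light only travels down columns.
inline VoxelGrid BuildLightCache(const VoxelGrid& density, float voxelSize) {
    VoxelGrid light(density.nx, density.ny, density.nz, 1.0f);
    for (int z = 0; z < density.nz; ++z) {
        for (int x = 0; x < density.nx; ++x) {
            float opticalDepth = 0.0f;  // accumulated from the top of the column
            for (int y = density.ny - 1; y >= 0; --y) {
                light.at(x, y, z) = std::exp(-opticalDepth);
                opticalDepth += density.at(x, y, z) * voxelSize;
            }
        }
    }
    return light;
}

// View march: a single clamped lookup replaces the inner march toward the sun.
inline float SampleLight(const VoxelGrid& light, int x, int y, int z) {
    x = std::clamp(x, 0, light.nx - 1);
    y = std::clamp(y, 0, light.ny - 1);
    z = std::clamp(z, 0, light.nz - 1);
    return light.at(x, y, z);
}
```

The cost of the pre-pass scales with the number of voxels, while the view march pays only one lookup per sample regardless of how deep the sample sits in the cloud.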

Real-Time Volumetric Interactions

Real-time object collision

Script-Directed Cloud Instantiation

Jet-trails!

Cloud placement is procedurally driven by an "SDF Path" system. An event system instantiates clouds along guided paths defined by Signed Distance Fields. Each cloud instance maintains unique parameters for density decay and detail type, allowing for art-directable variations within a procedurally generated sky.

Novel Cloud Destruction

Convex-hull Visualization

The engine supports real-time volumetric destruction. Interaction is handled by checking convex hull collisions against the cloud's density voxels. Those checks are accelerated via a compute shader. Collision data is packed per mesh instance, rather than entity instance, to minimize memory overhead during the physics pass.
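
One common way to test voxels against a convex hull is to store the hull as a set of outward-facing planes and check each voxel center against all of them. The sketch below assumes that representation and simply zeroes the density of covered voxels; the `Plane` layout, `InsideHull`, and `CarveDensity` are hypothetical, and the engine's compute shader may use a different hull encoding or carving rule.

```cpp
#include <cstddef>
#include <vector>

// Plane with outward normal (nx, ny, nz): points with
// nx*x + ny*y + nz*z + d <= 0 lie on the inner side.
struct Plane { float nx, ny, nz, d; };

// A point is inside a convex hull when it is behind every face plane.
inline bool InsideHull(const std::vector<Plane>& hull, float x, float y, float z) {
    for (const Plane& p : hull) {
        if (p.nx * x + p.ny * y + p.nz * z + p.d > 0.0f) {
            return false;  // in front of one face is enough to reject
        }
    }
    return true;
}

// Clears the density of every voxel whose center lies inside the hull.
// Returns the number of voxels carved. density is nx*ny*nz, x fastest.
inline int CarveDensity(std::vector<float>& density, int nx, int ny, int nz,
                        float voxelSize, const std::vector<Plane>& hull) {
    int carved = 0;
    for (int z = 0; z < nz; ++z) {
        for (int y = 0; y < ny; ++y) {
            for (int x = 0; x < nx; ++x) {
                const float cx = (x + 0.5f) * voxelSize;
                const float cy = (y + 0.5f) * voxelSize;
                const float cz = (z + 0.5f) * voxelSize;
                if (InsideHull(hull, cx, cy, cz)) {
                    density[(static_cast<std::size_t>(z) * ny + y) * nx + x] = 0.0f;
                    ++carved;
                }
            }
        }
    }
    return carved;
}
```

Each voxel costs one dot product per hull face, so the work depends on the face count of the hull rather than the triangle count of the source mesh.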

Procedural Cloud Generation

Clouds created in real-time in a compute shader

SDF-Based Random Cloud Generation


Visualization of the base SDF shapes

Cloud formation is fully procedural and controllable in real-time via ImGUI (e.g., cloud count, scale multiplier). The generation pipeline operates in two stages:

  1. CPU Seeding: "Seeds" are initialized as world-space coordinates to track cloud position, movement, and formation over time.
  2. GPU Shaping: For each seed, a compute shader generates a base SDF shape using Inigo Quilez’s primitive distance functions. The base form is a round cone surrounded by "Vesica Segments" (football-like shapes), where orientation, count, and size are driven by noise and the input scale factor.
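
For reference, the two primitives named in the GPU shaping stage can be ported to C++ roughly as follows. These follow Inigo Quilez's published round cone and vesica segment distance functions; the `Vec3` helpers and function names are local to this sketch, and how the engine combines or perturbs the primitives is not shown.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline float Length(Vec3 a) { return std::sqrt(Dot(a, a)); }

// Round cone along +Y: sphere of radius r1 at the origin, sphere of
// radius r2 at height h, joined by their tangent cone.
inline float SdRoundCone(Vec3 p, float r1, float r2, float h) {
    const float b = (r1 - r2) / h;
    const float a = std::sqrt(1.0f - b * b);
    const float qx = std::sqrt(p.x * p.x + p.z * p.z);  // radial distance
    const float qy = p.y;
    const float k = -b * qx + a * qy;
    if (k < 0.0f)  return std::hypot(qx, qy) - r1;      // nearest to bottom sphere
    if (k > a * h) return std::hypot(qx, qy - h) - r2;  // nearest to top sphere
    return a * qx + b * qy - r1;                        // nearest to the cone side
}

// Vesica segment: football-like solid with tips at a and b and
// half-width w at the midpoint (requires w < |b - a| / 2).
inline float SdVesicaSegment(Vec3 p, Vec3 a, Vec3 b, float w) {
    const Vec3  c = (a + b) * 0.5f;
    const float l = Length(b - a);
    const Vec3  v = (b - a) * (1.0f / l);
    const float y = Dot(p - c, v);
    const float qx = Length(p - c - v * y);  // distance from the axis
    const float qy = std::fabs(y);           // distance along the axis
    const float r = 0.5f * l;
    const float d = 0.5f * (r * r - w * w) / w;
    // Either the tip is the closest feature, or the circular arc is.
    const bool nearTip = r * qx < d * (qy - r);
    const float hx = nearTip ? 0.0f : -d;
    const float hy = nearTip ? r : 0.0f;
    const float hr = nearTip ? 0.0f : d + w;
    return std::hypot(qx - hx, qy - hy) - hr;
}
```

Both functions return negative values inside the shape and positive values outside, so several primitives can be merged by taking the minimum of their distances.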

Noise Baking Optimization


Baked Noise: Dimensional Profile

Baked Noise: Detail Type

To ensure runtime performance, complex noise functions are pre-baked into static 3D textures rather than calculated per-frame:

  • Billow Noise: Based on the psrdnoise implementation by Stefan Gustavson and Ian McEwan, this modified Perlin noise uses rotated cells to create the characteristic "puffiness" of cumulus clouds.
  • Fractal Sum & Easing: Noise values are eased out near SDF boundaries and accumulated using fractal sums to eliminate voxel-like artifacts.
  • Detail & Density: High-frequency detail is driven by scaled billow noise that intensifies with altitude (mimicking wispy cloud tops), while density scale remains relatively uniform across the profile.
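
The fractal sum and boundary easing can be sketched as below. A simple hash-based value noise is swapped in for psrdnoise to keep the example self-contained, so the pattern it produces is not the one baked by the engine; the octave weights, the smoothstep easing curve, and all function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Integer lattice hash mapped to [-1, 1). Stand-in for a real gradient noise.
inline float HashToUnit(int x, int y, int z) {
    std::uint32_t h = static_cast<std::uint32_t>(x) * 374761393u
                    + static_cast<std::uint32_t>(y) * 668265263u
                    + static_cast<std::uint32_t>(z) * 2147483647u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return static_cast<float>(h & 0xFFFFFFu) / 16777216.0f * 2.0f - 1.0f;
}

// Trilinearly interpolated value noise in [-1, 1].
inline float ValueNoise(float x, float y, float z) {
    const int xi = static_cast<int>(std::floor(x));
    const int yi = static_cast<int>(std::floor(y));
    const int zi = static_cast<int>(std::floor(z));
    auto fade = [](float t) { return t * t * (3.0f - 2.0f * t); };
    auto lerp = [](float a, float b, float t) { return a + (b - a) * t; };
    const float u = fade(x - xi), v = fade(y - yi), w = fade(z - zi);
    const float x00 = lerp(HashToUnit(xi, yi, zi),         HashToUnit(xi + 1, yi, zi),         u);
    const float x10 = lerp(HashToUnit(xi, yi + 1, zi),     HashToUnit(xi + 1, yi + 1, zi),     u);
    const float x01 = lerp(HashToUnit(xi, yi, zi + 1),     HashToUnit(xi + 1, yi, zi + 1),     u);
    const float x11 = lerp(HashToUnit(xi, yi + 1, zi + 1), HashToUnit(xi + 1, yi + 1, zi + 1), u);
    return lerp(lerp(x00, x10, v), lerp(x01, x11, v), w);
}

// Fractal sum of billow octaves (|noise|), normalized to [0, 1].
inline float BillowFractalSum(float x, float y, float z, int octaves) {
    float sum = 0.0f, total = 0.0f, amplitude = 1.0f, frequency = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        sum += amplitude * std::fabs(ValueNoise(x * frequency, y * frequency, z * frequency));
        total += amplitude;
        amplitude *= 0.5f;   // each octave contributes half as much
        frequency *= 2.0f;   // and is twice as fine
    }
    return sum / total;
}

// Fades noise to zero at the SDF boundary (sdf >= 0) and back to full
// strength once the sample is fadeWidth deep inside the shape.
inline float EasedDensity(float noise, float sdf, float fadeWidth) {
    const float t = std::clamp(-sdf / fadeWidth, 0.0f, 1.0f);
    return noise * (t * t * (3.0f - 2.0f * t));
}
```

Because the result depends only on position, a function like this can be evaluated once per texel and stored in a 3D texture, which is the point of the baking step.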

Physically-Based Atmosphere

Sunrise, daytime, sunset, night time, with the ImGUI controls

The atmospheric rendering system implements Eric Bruneton's Precomputed Atmospheric Scattering model. To maximize performance, the engine bypasses runtime initialization by loading pre-baked Irradiance, Scattering, and Transmittance textures.

The sky is rendered in a raycasting pre-pass that seamlessly blends Polar and Cartesian camera models to support a fully dynamic day/night cycle, complete with UI-controllable sun positioning and a custom moon and night sky implementation.

Muon: A DirectX 12 Engine


Engine overview diagram

Full render pipeline

Rendering Pipeline

  • Volumetric Ray-Marching: Compute-driven pipeline for handling density integration, light caching, and SDF stepping.
  • Atmospheric Scattering: Dedicated pre-pass for sky, sun, and moon rendering based on precomputed LUTs.
  • Simulation & Physics: Compute shaders for procedural cloud generation and convex hull collision detection.
  • Post-Processing: Full-screen pass system for tone mapping and final compositing.
  • Asset Management: Automated loading of 3D models (Assimp) and texture construction (DirectXTex) for NVDF, SDF, and Noise volumes.

Architecture & Tooling

  • Shader-Driven Reflection: Resource binding is automated via ID3D12ShaderReflection, allowing for string-based parameter setting without manual root signature matching.
  • "Pass" Framework: A high-level abstraction that automatically generates Root Signatures and Pipeline State Objects (PSOs) based on shader requirements.
  • D3D12 Abstractions: User-friendly wrappers for complex DirectX 12 objects including Texture, UploadBuffer (staging), and FrameResource management.
  • Diagnostics: Integrated ImGUI for runtime controls, plus automatic lifetime reporting and strict error logging to catch memory leaks in Debug mode.

Performance Analysis

To measure performance, we captured several test setups on an NVIDIA RTX 4070 (Laptop) GPU.

We evaluated three of the optimizations we implemented: first, how pre-baking the procedural noise textures affects performance as the number or scale of clouds varies; second, how the distance to a cloud affects ray-marching cost; and third, how the convex hull algorithm compares to a naive triangle intersection check.

The scenes tested for the first scenario are:


Four Clouds at 1.0 Scale

Four Big Clouds

Eight Clouds at 8.0 Scale

Sixteen Clouds at Maximum Scale

Timings of the procedural compute pass, in milliseconds:

| Scene | Offline Texture (ms) | Online Texture (ms) |
|---|---|---|
| Four Clouds (1.0 scale) | 6.21 | 6.51 |
| Four Big Clouds | 7.32 | 9.79 |
| Eight Clouds | 13.15 | 14.76 |
| Sixteen Clouds | 24.55 | 24.61 |

The scenes tested for the second scenario are:


A cloud at a distance

A cloud up close

On the edge of a cloud

Inside a cloud

Timings of the light cache and ray march compute passes, in milliseconds:

| Scene | Light Cache (ms) | Raymarch (ms) |
|---|---|---|
| Far Cloud | 1.62 | 2.1 |
| Close Cloud | 2.21 | 20.8 |
| On the Edge | 1.5 | 20.3 |
| Inside Cloud | 1.77 | 9.62 |

Lastly, we measured the convex hull collision checks. Every object in this test is the arm model, our highest-polygon OBJ at ~900 triangles.

| OBJ Count | Hull (ms) | Naive Triangles (ms) |
|---|---|---|
| 0 | 2.20 | 2.11 |
| 5 | 3.31 | 30.8 |
| 30 | 9.50 | 105.5 |
| 60 | 20.32 | 180.1 |

The performance gains from our enhancements are largely self-evident. Without an optimized collision structure, collision checks are essentially unusable in a real-time context: at 60 objects the naive triangle test takes 180.1 ms, versus 20.32 ms with convex hulls. Likewise, the light cache provided huge gains in near-cloud contexts. Interestingly, there isn't much of a difference between offline and online textures. This points to the true bottleneck: the procedural cloud SDF calculations.

Setup & Development

Building

This project uses the Premake 5 build system (bundled in ./external/) to automate project configuration.

To build the project:

  1. Run generate_vs2022.bat on Windows.
  2. Open the generated Cumulus.sln in Visual Studio 2022.
  3. Build and run.

Note: The Premake script (premake5.lua) automatically detects and adds new source/header files in the source directories, so manual project updates are not required when adding files.

Requirements

  • OS: Windows 10/11
  • IDE: Visual Studio 2022 (MSVC v143 toolset)
  • Language: C++17
  • GPU: DirectX 12 compatible hardware

Dependencies

Appendices

External Credits

Related Presentations

Bloopers

Broken Atmospheric Scattering

Broken Camera Matrix while working on Atmospheric Scattering
