lobizhao/CUDA-Path-Tracer

 
 


CUDA Path Tracer

Unreal Engine animation showcase: 'SKM_Manny.obj'

Part 1

Basic BSDF

Part 1 Reflection
Suzanne.obj

Implemented a unified shading kernel supporting multiple material types.

  • BSDF Evaluation Shading Kernel
__global__ void shadeMaterial_with_BSDF()
...
__host__ __device__ void scatterRay()

Material Sorting

  • Material-based Memory Contiguity
Purpose: Reduce GPU warp divergence by grouping rays that hit the same material type, so threads in a warp follow the same shading branch.

Controlled by #define SORT_MATERIAL for easy performance comparison.

Implementation: thrust::stable_sort_by_key sorts paths by material ID before shading.
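
As a CPU-side illustration of the sort-by-key step, the sketch below reorders paths so that entries with the same material ID become contiguous. It is a minimal host analogue rather than the project's kernel code: `PathRef`, `sortPathsByMaterial`, and the separate key vector are hypothetical names. On the GPU the same effect comes from `thrust::stable_sort_by_key` applied to device pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a path record; only the pixel index matters here.
struct PathRef {
    int pixelIndex;
};

// Stable sort-by-key on the host: reorders `materialIds` (keys) and `paths`
// (values) together so equal material IDs end up adjacent, preserving the
// original relative order within each material.
inline void sortPathsByMaterial(std::vector<int>& materialIds,
                                std::vector<PathRef>& paths) {
    assert(materialIds.size() == paths.size());

    // Sort a permutation instead of the records themselves.
    std::vector<std::size_t> order(materialIds.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return materialIds[a] < materialIds[b];
                     });

    // Apply the permutation to both arrays.
    std::vector<int> sortedIds;
    std::vector<PathRef> sortedPaths;
    sortedIds.reserve(order.size());
    sortedPaths.reserve(order.size());
    for (std::size_t i : order) {
        sortedIds.push_back(materialIds[i]);
        sortedPaths.push_back(paths[i]);
    }
    materialIds.swap(sortedIds);
    paths.swap(sortedPaths);
}
```

A stable sort keeps paths of one material in their original order, which makes results easier to compare with the unsorted run when toggling SORT_MATERIAL.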

Stochastic Antialiasing

block size 128 chart 2
  • Implemented sub-pixel sampling for edge smoothing
Control: Enabled via #define ANTI_ALIASING 1

Implementation: u01(rng) draws random offsets in the [0,1) range, so each iteration generates its camera ray through a different sub-pixel position.
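
The jittering idea can be sketched on the host as follows. This is an illustration under assumptions, not the project's ray generation kernel: `jitteredPixelSample` is a hypothetical helper, and `std::mt19937` stands in for the Thrust random engine used on the device.

```cpp
#include <cassert>
#include <random>
#include <utility>

// Returns a sample position inside pixel (x, y) by adding an independent
// random offset in [0, 1) to each pixel coordinate. Averaging many such
// samples over iterations smooths geometric edges.
inline std::pair<float, float> jitteredPixelSample(int x, int y,
                                                   std::mt19937& rng) {
    assert(x >= 0 && y >= 0);
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float sx = static_cast<float>(x) + u01(rng);  // horizontal sub-pixel offset
    float sy = static_cast<float>(y) + u01(rng);  // vertical sub-pixel offset
    return {sx, sy};
}
```

With ANTI_ALIASING disabled, the equivalent would be to shoot every ray through the same fixed point of the pixel.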


Part 2

Refraction

cornell_suzanne.json
  • Implemented physically accurate refraction for transparent materials like glass
 Entry/Exit Detection: bool entering = cosTheta > 0
 Schlick Fresnel Approximation: Implemented schlickFresnel() function
 Total Internal Reflection: if (glm::length(refracted) < 0.001f)
 Material Configuration: JSON "TYPE": "Refractive" support
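
The refraction steps listed above can be sketched in plain C++ as follows. This is a minimal illustration, not the project's `schlickFresnel()` or `scatterRay()` code: `Vec3`, `schlickReflectance`, and `refractDir` are hypothetical names, and `refractDir` is written to follow the same convention as `glm::refract` (it returns the zero vector on total internal reflection), which is what makes the length check above work.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
inline float length(const Vec3& v) { return std::sqrt(dot(v, v)); }

// Schlick's approximation of the Fresnel reflectance for a ray travelling
// from a medium with index etaI into a medium with index etaT.
inline float schlickReflectance(float cosTheta, float etaI, float etaT) {
    assert(cosTheta >= 0.0f && cosTheta <= 1.0f);
    float r0 = (etaI - etaT) / (etaI + etaT);
    r0 *= r0;
    float m = 1.0f - cosTheta;
    return r0 + (1.0f - r0) * m * m * m * m * m;
}

// Snell refraction of unit incident direction I about unit normal N, with
// eta = etaI / etaT. Returns the zero vector on total internal reflection.
inline Vec3 refractDir(const Vec3& I, const Vec3& N, float eta) {
    float d = dot(N, I);
    float k = 1.0f - eta * eta * (1.0f - d * d);
    if (k < 0.0f) return {0.0f, 0.0f, 0.0f};  // total internal reflection
    float s = eta * d + std::sqrt(k);
    return {eta * I.x - s * N.x, eta * I.y - s * N.y, eta * I.z - s * N.z};
}
```

A caller would typically compare a random number against the Schlick reflectance to choose between reflecting and refracting, and fall back to reflection whenever `refractDir` returns a near-zero vector.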

Camera

  • Depth of field
cornell_suzanne.json "LENS_RADIUS" and "FOCAL_DISTANCE"
  • Simulates real camera lens with configurable aperture size and focal distance. Objects at focal distance appear sharp while foreground and background objects blur naturally based on distance from focal plane.

  • Samples a random point on the circular lens aperture for each ray using polar coordinates; taking the square root of the radial random number keeps the samples uniform over the disk area. Generates realistic bokeh effects through Monte Carlo convergence over multiple iterations.

#if DEPTH_OF_FIELD
    // Point in focus that the original pinhole ray passes through
    glm::vec3 focalPoint = cam.position + cam.focalDistance * rayDir;

    // Uniformly sample a point on the lens disk (sqrt keeps the area density uniform)
    float theta = u01(rng) * 2.0f * 3.14159265f;
    float r = cam.lensRadius * sqrtf(u01(rng));
    glm::vec3 lensOffset = r * (cosf(theta) * cam.right + sinf(theta) * cam.up);

    // Shoot the ray from the lens sample toward the focal point
    segment.ray.origin = cam.position + lensOffset;
    segment.ray.direction = glm::normalize(focalPoint - segment.ray.origin);
#endif

Load Mesh & Env

suzanne.json & Blue_stripe.hdr
  • Mesh Loading Workflow (Custom OBJ Loader - src/objLoader.cpp, Scene Integration - src/scene.cpp)

    Custom OBJ Parser: Implemented lightweight OBJ loader supporting vertices, normals, and faces with v//vn format. Parses geometry data and applies world transformations including translation, rotation, and scaling. Each mesh is assigned a single material ID and integrated into the scene's triangle array.

    Scene Integration: Meshes are loaded via JSON configuration and transformed to world coordinates using transformation matrices. Normal vectors are properly transformed using inverse transpose matrices to maintain correct lighting calculations.

  • BVH Acceleration Structure (BVH Construction - src/bvh.cpp)

    suzanne.obj - 16689 triangles

    Manny_Skm.obj - 73184 triangles

    Built using Surface Area Heuristic for optimal partitioning. Combines both primitive geometry (spheres, cubes) and triangle meshes into a unified acceleration structure. Uses 12-bucket SAH evaluation to minimize intersection cost.

    GPU Traversal: Implements stack-based iterative traversal optimized for GPU execution. Uses linear memory layout for cache efficiency and supports both geometry primitives and triangle meshes in the same BVH tree.

    Performance: Reduces the expected per-ray intersection cost from O(n) to roughly O(log n) in the number of primitives, enabling efficient rendering of complex meshes with tens of thousands of triangles. Build statistics report construction time and node count for performance analysis.

    BVHAccel::BVHAccel(std::vector<std::shared_ptr<Primitive>>& prims, int maxPrimsInNode)
    ...
    BVHBuildNode* BVHAccel::recursiveBuild(
    std::vector<Primitive>& primitiveInfo,
    int start, int end, int* totalNodes,
    std::vector<std::shared_ptr<Primitive>>& orderedPrims)
    ...
    __global__ void computeIntersectionsBVH(
    int depth, int num_paths,
    PathSegment* pathSegments,
    Geom* geoms, int geoms_size,
    Triangle* triangles, int triangles_size,
    LinearBVHNode* bvhNodes,
    ShadeableIntersection* intersections)
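
The 12-bucket SAH evaluation described above can be illustrated with the small host-side sketch below. It is not the code in src/bvh.cpp: `Bounds3`, `bucketIndex`, and `sahSplitCost` are hypothetical names, and the default traversal cost of 0.125 (relative to an intersection cost of 1) is an assumed value chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned bounding box (hypothetical minimal type).
struct Bounds3 {
    float min[3];
    float max[3];
};

inline float surfaceArea(const Bounds3& b) {
    float dx = b.max[0] - b.min[0];
    float dy = b.max[1] - b.min[1];
    float dz = b.max[2] - b.min[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// Maps a primitive centroid coordinate to one of nBuckets equal-width
// buckets spanning [axisMin, axisMax] along the chosen split axis.
inline int bucketIndex(float centroid, float axisMin, float axisMax,
                       int nBuckets = 12) {
    assert(axisMax > axisMin && nBuckets > 0);
    float t = (centroid - axisMin) / (axisMax - axisMin);
    int b = static_cast<int>(nBuckets * t);
    return std::max(0, std::min(b, nBuckets - 1));
}

// SAH estimate for splitting `parent` into two children: one traversal step
// plus each child's primitive count weighted by the probability of hitting
// that child (ratio of surface areas).
inline float sahSplitCost(const Bounds3& parent,
                          const Bounds3& left, int nLeft,
                          const Bounds3& right, int nRight,
                          float traversalCost = 0.125f) {
    float parentArea = surfaceArea(parent);
    assert(parentArea > 0.0f);
    return traversalCost +
           (surfaceArea(left) * nLeft + surfaceArea(right) * nRight) / parentArea;
}
```

A builder would evaluate this cost at each of the 11 boundaries between buckets and keep the cheapest split, or create a leaf when no split beats the cost of intersecting all primitives directly.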
    
    
  • HDR Environment Map Loading (src/texture.cpp)

    HDR data is transferred to CUDA texture objects for hardware-accelerated sampling. Creates cudaTextureObject_t with linear filtering and wrap addressing modes. Supports both HDR (float4) and standard (uchar4) texture formats with automatic format detection.

    // Loading Process
    envMap.loadToCPU(fullenvpath);           // CPU loading
    envmapHandle = scene->envMap.loadToCuda(); // GPU transfer
    
    // JSON Configuration
    "EnvMap": {
      "PATH": "../scenes/Blue_stripe.hdr"
    }
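
For rays that miss all geometry, the environment map has to be indexed by ray direction. The sketch below shows one common way to do that, assuming an equirectangular (latitude-longitude) HDR image with +Y as the up axis. The README does not state which mapping the project uses, so both the layout and the `directionToEquirectUV` helper are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct UV {
    float u;
    float v;
};

// Converts a direction to normalized texture coordinates on an
// equirectangular map: u follows the azimuth around the Y axis,
// v runs from 0 at straight up (+Y) to 1 at straight down (-Y).
inline UV directionToEquirectUV(float x, float y, float z) {
    const float kPi = 3.14159265358979f;
    float len = std::sqrt(x * x + y * y + z * z);
    assert(len > 0.0f);
    x /= len;
    y /= len;
    z /= len;
    float u = 0.5f + std::atan2(z, x) / (2.0f * kPi);
    float v = std::acos(std::max(-1.0f, std::min(1.0f, y))) / kPi;
    return {u, v};
}
```

On the device, coordinates like these would be passed to a texture fetch on the `cudaTextureObject_t`; the wrap addressing mode mentioned above handles the seam where u crosses from 1 back to 0.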
    

Performance

  • Stream Compaction
block size 128 chart 2
suzanne.json & Blue_stripe.hdr

Path Termination Detection

  • Condition: remainingBounces > 0
  • Purpose: To identify paths that still need to be traced.

Memory Compaction

  • Uses thrust::stable_partition.
  • Moves the active paths to the beginning of the array.
#if COMPACTION
if (depth % 2 == 1 || depth == traceDepth - 1) {
    auto lastPath = dev_paths + num_paths;
    auto mid = thrust::stable_partition(thrust::device, 
    dev_paths, lastPath, IsAlive{});
    num_paths = mid - dev_paths; 
}
#endif
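
The partition step can be tried on the CPU with `std::stable_partition`, which has the same semantics as the Thrust call above. In this sketch the `PathSegment` fields and the body of `IsAlive` are assumptions (the README only shows the predicate being used), and `compactPaths` is a hypothetical wrapper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed minimal layout; the real PathSegment carries more state.
struct PathSegment {
    int pixelIndex;
    int remainingBounces;
};

// Assumed predicate body: a path is alive while it has bounces left.
struct IsAlive {
    bool operator()(const PathSegment& p) const {
        return p.remainingBounces > 0;
    }
};

// Moves live paths to the front of the first numPaths entries and returns
// how many are still alive. Terminated paths stay in the tail of the range,
// in their original relative order.
inline int compactPaths(std::vector<PathSegment>& paths, int numPaths) {
    assert(numPaths >= 0 && numPaths <= static_cast<int>(paths.size()));
    auto first = paths.begin();
    auto mid = std::stable_partition(first, first + numPaths, IsAlive{});
    return static_cast<int>(mid - first);
}
```

The returned count plays the role of `num_paths` in the snippet above: later kernels launch only over the live prefix of the array.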

BETTER_RANDOM

  • Alternative seeding for the per-thread random number generator, intended to improve distribution quality and reduce hashing cost
Control: #define BETTER_RANDOM 1 (currently enabled)

Original Method: utilhash() - Simple bit-manipulation hash function
int h = utilhash((1 << 31) | (depth << 22) | iter) ^ utilhash(index);

Improved Method: fastHash() - Optimized 32-bit hash algorithm
uint32_t seed = index + (iter << 16) + (depth << 8);
return thrust::default_random_engine(fastHash(seed));

Expected Gains: Fewer hash collisions and a more uniform random distribution
Applications: Ray generation, material sampling, antialiasing, depth of field effects
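
The body of `fastHash()` is not shown here, so the sketch below uses the MurmurHash3 32-bit finalizer as a stand-in to illustrate the seed-then-hash pattern. `mixHash32` and `makeSeed` are hypothetical names; only the seed formula is taken from the snippet above.

```cpp
#include <cassert>
#include <cstdint>

// MurmurHash3 32-bit finalizer: a bijective integer mixer, used here only
// as a stand-in for the project's fastHash().
inline std::uint32_t mixHash32(std::uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Seed packing from the snippet above: pixel index in the low bits,
// depth shifted by 8 and iteration shifted by 16.
inline std::uint32_t makeSeed(std::uint32_t index, std::uint32_t iter,
                              std::uint32_t depth) {
    // Keeps the depth field from spilling into the iteration bits.
    assert(depth < 256u);
    return index + (iter << 16) + (depth << 8);
}
```

Because the mixer is bijective, two seeds collide after hashing only if they were already equal. With this packing that can happen once the pixel index exceeds 255, for example index 256 at depth 0 and index 0 at depth 1 produce the same seed.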

Russian Roulette Ray Termination

  • Implemented probabilistic path termination optimization to reduce computational overhead for low-contribution rays
Termination Threshold: Applied when remainingBounces < 3

Survival Probability: 80% chance to continue (20% termination)

Importance Weighting: Surviving rays are scaled by 1.25f (1 / 0.8, the reciprocal of the survival probability) so the estimate stays unbiased

Control: Toggleable via #define RUSSIAN_ROULETTE 0 (currently disabled)
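
A minimal host-side sketch of the roulette decision, using the parameters listed above. `RouletteResult` and `russianRoulette` are hypothetical names, and a scalar throughput stands in for the RGB path color.

```cpp
#include <cassert>

struct RouletteResult {
    bool survived;
    float throughput;
};

// Applies Russian roulette to a path. `u` is a uniform random number in
// [0, 1). Paths with at least `bounceThreshold` bounces left are never
// terminated; below that, a path survives with probability `survivalProb`
// and its throughput is divided by that probability.
inline RouletteResult russianRoulette(float throughput, int remainingBounces,
                                      float u, float survivalProb = 0.8f,
                                      int bounceThreshold = 3) {
    assert(survivalProb > 0.0f && survivalProb <= 1.0f);
    if (remainingBounces >= bounceThreshold) {
        return {true, throughput};  // roulette not applied yet
    }
    if (u >= survivalProb) {
        return {false, 0.0f};  // terminated: contributes nothing further
    }
    return {true, throughput / survivalProb};  // 1 / 0.8 = 1.25 boost
}
```

The expected throughput is unchanged: 0.8 × 1.25 + 0.2 × 0 = 1, which is why the scaling keeps the estimator unbiased while cutting the average path length.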

Bloopers

BSDF sampling error: incorrect cosine-weighted sampling in calculateRandomDirectionInHemisphere().
Glass material error
