diff --git a/README.md b/README.md index 110697c..e872e85 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,43 @@ -CUDA Path Tracer -================ +# University of Pennsylvania, CIS 565: GPU Programming and Architecture. +Project 3 CUDA: Path Tracer +==================== -**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3** +## User resources +- **Name:** David Grosman. +- **Tested on:** Microsoft Windows 7 Professional, i7-5600U @ 2.6GHz, 256GB, GeForce 840M (Personal laptop). -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +## Project description +This project's purpose was to gain experience writing graphics code that benefits as much as possible from CUDA. A path tracer is well suited to CUDA since each ray processed by the application can be handled in a separate thread. Furthermore, computations such as intersection testing and pixel coloring are very intensive and therefore benefit from running on the GPU. In addition, no CPU-to-GPU memory bandwidth is spent passing on the pixel buffer, as is usually the case with a CPU implementation. +In this project, I have implemented several key features: -### (TODO: Your README) +1. Diffuse shading. +2. Specular reflection and refraction. +3. Caching Initial rays. +4. Stream compaction on non-terminated paths during each iteration. +5. Sort Rays by material type they are intersecting with. +6. Depth of Field. +7. Better hemisphere sampling methods. -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. + +### Shading: Diffuse and Specular reflection, refraction. +The shading implemented for this project is fairly straightforward and is inspired by http://graphics.stanford.edu/courses/cs148-10-summer/docs/2006--degreve--reflection_refraction.pdf.
The reflectance factor, which determines how much the reflected vs. the refracted ray contributes to the final color, is computed using Schlick's approximation, which is pretty Schlick. The ray used to determine the diffuse color is sampled from a cosine-weighted hemisphere, where rays are sparser along the horizon and denser towards the normal. +![](img/RefrAndRefl.JPG) + +### Caching Initial rays. +Since all initial rays go from the camera's position through each pixel of the screen, it is easy to cache the first intersection of each ray: these intersections stay constant until the camera (or a scene object) moves again. Unfortunately, there isn't much performance gain from doing this. The cache uses a lot of memory, and it only helps with the first ray bounce, which is often a small portion of the time the path tracer spends finding intersections, since each ray must bounce at least 8 times to get nice global illumination effects. + +### Stream compaction on non-terminated paths. +As shown in the graph below, the performance increase from compacting non-terminated paths is non-negligible, especially when increasing the number of bounces per ray. This is the most important optimization done for this project. +![](img/PerfGivenRayBounces.JPG) + +### Sort Rays by material type they are intersecting with. +This optimization actually decreased the performance of the application: the time spent sorting the rays is non-negligible, especially when there are so many rays and no space- or time-coherent scheme is adopted to sort them faster. This optimization should be tested with wavefront path tracing, where rays are grouped by material without a sorting pass. + +### Depth of Field. +Depth of field is an easy technique to implement when ray tracing since the only change is how rays are generated.
In fact, their origin should be located on the lens and their direction should point towards a point at the distance specified by the focal distance. +![](img/DOF.JPG) + +### Better hemisphere sampling methods. +I implemented jittered, Halton and Sobol sample generators to improve the cosine-weighted hemisphere sampling used when computing the diffuse component. Quasi-random numbers, as generated by the Halton and Sobol schemes, are pretty powerful since they produce sequences of numbers specifically designed to fill an interval (typically [0…1] in n-dimensional space) in a relatively uniform way (compared to purely random or pseudo-random numbers). In fact, each new number (or vector) in the sequence is spaced as far as possible from the previous ones. Their goal is to generate randomized but relatively uniformly spaced patterns, similar to jittering, without requiring knowledge ahead of time of the total number of samples. +![](img/RefrOnRefl.JPG) diff --git a/img/DOF.JPG b/img/DOF.JPG new file mode 100644 index 0000000..02f3b90 Binary files /dev/null and b/img/DOF.JPG differ diff --git a/img/PerfGivenRayBounces.JPG b/img/PerfGivenRayBounces.JPG new file mode 100644 index 0000000..b590fab Binary files /dev/null and b/img/PerfGivenRayBounces.JPG differ diff --git a/img/RefrAndRefl.JPG b/img/RefrAndRefl.JPG new file mode 100644 index 0000000..9b7b220 Binary files /dev/null and b/img/RefrAndRefl.JPG differ diff --git a/img/RefrOnRefl.JPG b/img/RefrOnRefl.JPG new file mode 100644 index 0000000..0e79e67 Binary files /dev/null and b/img/RefrOnRefl.JPG differ diff --git a/scenes/cornell.txt b/scenes/cornell.txt index 83ff820..41c08a7 100644 --- a/scenes/cornell.txt +++ b/scenes/cornell.txt @@ -43,9 +43,9 @@ MATERIAL 4 RGB .98 .98 .98 SPECEX 0 SPECRGB .98 .98 .98 -REFL 1 -REFR 0 -REFRIOR 0 +REFL 0 +REFR 1 +REFRIOR 1.33 EMITTANCE 0 // Camera @@ -58,6 +58,8 @@ FILE cornell EYE 0.0 5 10.5 LOOKAT 0 5 0 UP 0 1 0 +LENS_RADIUS 0.0 +FOCAL_LENGTH 0.0 // Ceiling light @@
-114,4 +116,4 @@ sphere material 4 TRANS -1 4 -1 ROTAT 0 0 0 -SCALE 3 3 3 +SCALE 3 3 3 \ No newline at end of file diff --git a/scenes/cornellDOF.txt b/scenes/cornellDOF.txt new file mode 100644 index 0000000..28c30c3 --- /dev/null +++ b/scenes/cornellDOF.txt @@ -0,0 +1,139 @@ +// Emissive material (light) +MATERIAL 0 +RGB 1 1 1 +SPECEX 0 +SPECRGB 0 0 0 +REFL 0 +REFR 0 +REFRIOR 0 +EMITTANCE 5 + +// Diffuse white +MATERIAL 1 +RGB .98 .98 .98 +SPECEX 0 +SPECRGB 0 0 0 +REFL 0 +REFR 0 +REFRIOR 0 +EMITTANCE 0 + +// Diffuse red +MATERIAL 2 +RGB .85 .35 .35 +SPECEX 0 +SPECRGB 0 0 0 +REFL 0 +REFR 0 +REFRIOR 0 +EMITTANCE 0 + +// Diffuse green +MATERIAL 3 +RGB .35 .85 .35 +SPECEX 0 +SPECRGB 0 0 0 +REFL 0 +REFR 0 +REFRIOR 0 +EMITTANCE 0 + +// Specular white +MATERIAL 4 +RGB .98 .98 .98 +SPECEX 0 +SPECRGB .98 .98 .98 +REFL 0 +REFR 1 +REFRIOR 1.15 +EMITTANCE 0 + +// Specular white +MATERIAL 5 +RGB .98 .98 .98 +SPECEX 0 +SPECRGB .98 .98 .98 +REFL 1 +REFR 0 +REFRIOR 0 +EMITTANCE 0 + + +// Camera +CAMERA +RES 800 800 +FOVY 45 +ITERATIONS 5000 +DEPTH 16 +FILE cornell +EYE 0.0 5 10.5 +LOOKAT 0 5 0 +UP 0 1 0 +LENS_RADIUS 0.5 +FOCAL_LENGTH 5.5 + + +// Ceiling light +OBJECT 0 +cube +material 0 +TRANS 0 10 0 +ROTAT 0 0 0 +SCALE 3 .3 3 + +// Floor +OBJECT 1 +cube +material 1 +TRANS 0 0 0 +ROTAT 0 0 0 +SCALE 10 .01 10 + +// Ceiling +OBJECT 2 +cube +material 1 +TRANS 0 10 0 +ROTAT 0 0 90 +SCALE .01 10 10 + +// Back wall +OBJECT 3 +cube +material 1 +TRANS 0 5 -5 +ROTAT 0 90 0 +SCALE .01 10 10 + +// Left wall +OBJECT 4 +cube +material 2 +TRANS -5 5 0 +ROTAT 0 0 0 +SCALE .01 10 10 + +// Right wall +OBJECT 5 +cube +material 3 +TRANS 5 5 0 +ROTAT 0 0 0 +SCALE .01 10 10 + +// Sphere +OBJECT 6 +sphere +material 5 +TRANS -1 4 -1 +ROTAT 0 0 0 +SCALE 3 3 3 + + +// Sphere +OBJECT 7 +sphere +material 4 +TRANS -1 4 4 +ROTAT 0 0 0 +SCALE 3 3 3 \ No newline at end of file diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index a1cb3fb..cef0957 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt 
@@ -15,6 +15,8 @@ set(SOURCE_FILES "preview.cpp" "utilities.cpp" "utilities.h" + "timer.cu" + "timer.h" ) cuda_add_library(src diff --git a/src/interactions.h b/src/interactions.h index 5ce3628..3beb7d5 100644 --- a/src/interactions.h +++ b/src/interactions.h @@ -1,79 +1,170 @@ #pragma once +#include "timer.h" #include "intersections.h" // CHECKITOUT /** - * Computes a cosine-weighted random direction in a hemisphere. - * Used for diffuse lighting. - */ +* Computes a cosine-weighted random direction in a hemisphere. +* Used for diffuse lighting. +*/ __host__ __device__ -glm::vec3 calculateRandomDirectionInHemisphere( - glm::vec3 normal, thrust::default_random_engine &rng) { - thrust::uniform_real_distribution u01(0, 1); - - float up = sqrt(u01(rng)); // cos(theta) - float over = sqrt(1 - up * up); // sin(theta) - float around = u01(rng) * TWO_PI; - - // Find a direction that is not the normal based off of whether or not the - // normal's components are all equal to sqrt(1/3) or whether or not at - // least one component is less than sqrt(1/3). Learned this trick from - // Peter Kutz. 
- - glm::vec3 directionNotNormal; - if (abs(normal.x) < SQRT_OF_ONE_THIRD) { - directionNotNormal = glm::vec3(1, 0, 0); - } else if (abs(normal.y) < SQRT_OF_ONE_THIRD) { - directionNotNormal = glm::vec3(0, 1, 0); - } else { - directionNotNormal = glm::vec3(0, 0, 1); - } - - // Use not-normal direction to generate two perpendicular directions - glm::vec3 perpendicularDirection1 = - glm::normalize(glm::cross(normal, directionNotNormal)); - glm::vec3 perpendicularDirection2 = - glm::normalize(glm::cross(normal, perpendicularDirection1)); - - return up * normal - + cos(around) * over * perpendicularDirection1 - + sin(around) * over * perpendicularDirection2; +glm::vec3 computeDiffuseDirection( +glm::vec3& normal, PRNG& rng) { + + float up = sqrt( rng.getNextVal01() ); // cos(theta) + float over = sqrt(1 - up * up); // sin(theta) + float around = rng.getNextVal01() * TWO_PI; + + // Find a direction that is not the normal based off of whether or not the + // normal's components are all equal to sqrt(1/3) or whether or not at + // least one component is less than sqrt(1/3). Learned this trick from + // Peter Kutz. 
+ + glm::vec3 directionNotNormal; + if (abs(normal.x) < SQRT_OF_ONE_THIRD) { + directionNotNormal = glm::vec3(1, 0, 0); + } + else if (abs(normal.y) < SQRT_OF_ONE_THIRD) { + directionNotNormal = glm::vec3(0, 1, 0); + } + else { + directionNotNormal = glm::vec3(0, 0, 1); + } + + // Use not-normal direction to generate two perpendicular directions + glm::vec3 perpendicularDirection1 = + glm::normalize(glm::cross(normal, directionNotNormal)); + glm::vec3 perpendicularDirection2 = + glm::normalize(glm::cross(normal, perpendicularDirection1)); + + return up * normal + + cos(around) * over * perpendicularDirection1 + + sin(around) * over * perpendicularDirection2; +} + +__host__ __device__ +glm::vec3 computeDiffuseDirectionToo( +glm::vec3& normal, PRNG& rng) +{ + const float s = rng.getNextVal01(); + const float t = rng.getNextVal01(); + float u = TWO_PI * s; + float v = sqrt(1 - t); // sin(theta) + + return glm::vec3(v * cos(u), sqrt(t), v * sin(u)); +} + +__host__ __device__ +glm::vec3 computeReflectiveDirection( +const glm::vec3& normal, const glm::vec3& incident) +{ + const float cosI = glm::dot(normal, incident); + return incident - 2.0f * cosI * normal; +} + +__host__ __device__ +glm::vec3 computeRefractiveDirection( +const glm::vec3& normal, const glm::vec3& incident, +const float ni, const float nt) +{ + // Uses Snell's Law: + const float n = ni / nt; + const float cosI = -glm::dot(normal, incident); + float sinT2 = n * n * (1.0f - cosI * cosI); + if (sinT2 > 1.0) { return computeReflectiveDirection(normal, incident); } //Hack + const float cosT = sqrt(1.0 - sinT2); + return n * incident + (n * cosI - cosT) * normal; +} + +__host__ __device__ +float computeReflectance( +const glm::vec3& normal, const glm::vec3& incident, +const float ni, const float nt) +{ + // According to Schlick's model, the specular reflection + // coefficient R can be approximated by + const float cosI = -glm::dot(normal, incident); + const float sqrtR0 = (ni - nt) / (ni + nt); + const float 
R0 = sqrtR0 * sqrtR0; + return R0 + (1 - R0) * powf(1 - cosI, 5); } /** - * Scatter a ray with some probabilities according to the material properties. - * For example, a diffuse surface scatters in a cosine-weighted hemisphere. - * A perfect specular surface scatters in the reflected ray direction. - * In order to apply multiple effects to one surface, probabilistically choose - * between them. - * - * The visual effect you want is to straight-up add the diffuse and specular - * components. You can do this in a few ways. This logic also applies to - * combining other types of materias (such as refractive). - * - * - Always take an even (50/50) split between a each effect (a diffuse bounce - * and a specular bounce), but divide the resulting color of either branch - * by its probability (0.5), to counteract the chance (0.5) of the branch - * being taken. - * - This way is inefficient, but serves as a good starting point - it - * converges slowly, especially for pure-diffuse or pure-specular. - * - Pick the split based on the intensity of each material color, and divide - * branch result by that branch's probability (whatever probability you use). - * - * This method applies its changes to the Ray parameter `ray` in place. - * It also modifies the color `color` of the ray in place. - * - * You may need to change the parameter list for your purposes! - */ +* Scatter a ray with some probabilities according to the material properties. +* For example, a diffuse surface scatters in a cosine-weighted hemisphere. +* A perfect specular surface scatters in the reflected ray direction. +* In order to apply multiple effects to one surface, probabilistically choose +* between them. +* +* The visual effect you want is to straight-up add the diffuse and specular +* components. You can do this in a few ways. This logic also applies to +* combining other types of materias (such as refractive). 
+* +* - Always take an even (50/50) split between a each effect (a diffuse bounce +* and a specular bounce), but divide the resulting color of either branch +* by its probability (0.5), to counteract the chance (0.5) of the branch +* being taken. +* - This way is inefficient, but serves as a good starting point - it +* converges slowly, especially for pure-diffuse or pure-specular. +* - Pick the split based on the intensity of each material color, and divide +* branch result by that branch's probability (whatever probability you use). +* +* This method applies its changes to the Ray parameter `ray` in place. +* It also modifies the color `color` of the ray in place. +* +* You may need to change the parameter list for your purposes! +*/ __host__ __device__ void scatterRay( - PathSegment & pathSegment, - glm::vec3 intersect, - glm::vec3 normal, - const Material &m, - thrust::default_random_engine &rng) { - // TODO: implement this. - // A basic implementation of pure-diffuse shading will just call the - // calculateRandomDirectionInHemisphere defined above. -} +PathSegment & pathSegment, +glm::vec3 intersect, +glm::vec3 normal, +const Material &m, +PRNG& rng) +{ + const float MY_EPSILON = 1e-3f; + + Ray& outRay = pathSegment.ray; + glm::vec3& outColor = pathSegment.color; + + const bool isInwardsRay = (glm::dot(normal, outRay.direction) < 0.0f); + const float pathDir = isInwardsRay ? 
1 : -1; + + outRay.origin = intersect + pathDir * normal * MY_EPSILON; + if (!m.hasReflective && !m.hasRefractive) // Diffuse + { + outRay.direction = computeDiffuseDirection(normal, rng); + outColor *= glm::abs(glm::dot(outRay.direction, normal)) * m.color; + } + else + { + float ni = 1.0f, nt = 1.0f; + { + if (isInwardsRay) + { + ni = 1.0f; + nt = m.indexOfRefraction; + } + else + { + ni = m.indexOfRefraction; + nt = 1.0f; + } + } + + float reflectance = computeReflectance(normal, outRay.direction, ni, nt); + if (m.hasReflective) + { + outRay.direction = glm::reflect(pathSegment.ray.direction, normal); + outColor *= reflectance* m.color; + + } + else if (m.hasRefractive) + { + outRay.direction = computeRefractiveDirection(normal, pathSegment.ray.direction, ni, nt); + outColor *= (1.0f - reflectance) * m.color; + outRay.origin = intersect - pathDir * normal * MY_EPSILON; + } + } +} \ No newline at end of file diff --git a/src/intersections.h b/src/intersections.h index 6f23872..7c2ccc1 100644 --- a/src/intersections.h +++ b/src/intersections.h @@ -6,19 +6,6 @@ #include "sceneStructs.h" #include "utilities.h" -/** - * Handy-dandy hash function that provides seeds for random number generation. - */ -__host__ __device__ inline unsigned int utilhash(unsigned int a) { - a = (a + 0x7ed55d16) + (a << 12); - a = (a ^ 0xc761c23c) ^ (a >> 19); - a = (a + 0x165667b1) + (a << 5); - a = (a + 0xd3a2646c) ^ (a << 9); - a = (a + 0xfd7046c5) + (a << 3); - a = (a ^ 0xb55a4f09) ^ (a >> 16); - return a; -} - // CHECKITOUT /** * Compute a point at parameter value `t` on ray `r`. 
diff --git a/src/main.cpp b/src/main.cpp index fe8e85e..2d29399 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -1,7 +1,10 @@ #include "main.h" #include "preview.h" +#include "timer.h" #include +#include "utilities.h" + static std::string startTimeString; // For camera controls @@ -20,17 +23,25 @@ glm::vec3 cameraPosition; glm::vec3 ogLookAt; // for recentering the camera Scene *scene; +SceneOptions sceneOptions; RenderState *renderState; int iteration; int width; int height; + + + +//------------------------------- +//------------------------------- + //------------------------------- //-------------MAIN-------------- //------------------------------- -int main(int argc, char** argv) { +int main(int argc, char** argv) +{ startTimeString = currentTimeString(); if (argc < 2) { @@ -68,10 +79,12 @@ int main(int argc, char** argv) { // Initialize CUDA and GL components init(); + Timer::initializeTimer(); // GLFW main loop mainLoop(); - + + PRNGenerator::shutdownSystem(); return 0; } @@ -122,9 +135,13 @@ void runCuda() { // Map OpenGL buffer object for writing from CUDA on a single GPU // No data is moved (Win & Linux). 
When mapped to CUDA, OpenGL should not use this buffer - if (iteration == 0) { + if (iteration == 0) + { pathtraceFree(); pathtraceInit(scene); + + const int NUM_RANDOM_SIZE = 1 << 20; + PRNGenerator::initializeSystem(PRNGenerator::Thrust, NUM_RANDOM_SIZE); } if (iteration < renderState->iterations) { @@ -132,9 +149,10 @@ void runCuda() { iteration++; cudaGLMapBufferObject((void**)&pbo_dptr, pbo); + Timer::resetTimer(true); // execute the kernel - int frame = 0; - pathtrace(pbo_dptr, frame, iteration); + pathtrace(pbo_dptr, sceneOptions, iteration); + Timer::printTimer(NULL, 1.0f); // unmap buffer object cudaGLUnmapBufferObject(pbo); @@ -156,6 +174,12 @@ void keyCallback(GLFWwindow* window, int key, int scancode, int action, int mods case GLFW_KEY_S: saveImage(); break; + case GLFW_KEY_M: + sceneOptions.sortPathsByMaterial = !sceneOptions.sortPathsByMaterial; + break; + case GLFW_KEY_C: + sceneOptions.iterToCacheFirstBounces = iteration + 1; + break; case GLFW_KEY_SPACE: camchanged = true; renderState = &scene->state; diff --git a/src/pathtrace - Copy.cu b/src/pathtrace - Copy.cu new file mode 100644 index 0000000..9cff7ee --- /dev/null +++ b/src/pathtrace - Copy.cu @@ -0,0 +1,420 @@ +#include +#include +#include +#include +#include +#include + +#include "sceneStructs.h" +#include "scene.h" +#include "glm/glm.hpp" +#include "glm/gtx/norm.hpp" +#include "utilities.h" +#include "pathtrace.h" +#include "intersections.h" +#include "interactions.h" + +#define ERRORCHECK 1 + +#define FILENAME (strrchr(__FILE__, '/') ? 
strrchr(__FILE__, '/') + 1 : __FILE__) +#define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__) +void checkCUDAErrorFn(const char *msg, const char *file, int line) { +#if ERRORCHECK + cudaDeviceSynchronize(); + cudaError_t err = cudaGetLastError(); + if (cudaSuccess == err) { + return; + } + + fprintf(stderr, "CUDA error"); + if (file) { + fprintf(stderr, " (%s:%d)", file, line); + } + fprintf(stderr, ": %s: %s\n", msg, cudaGetErrorString(err)); +# ifdef _WIN32 + getchar(); +# endif + exit(EXIT_FAILURE); +#endif +} + +__host__ __device__ +thrust::default_random_engine makeSeededRandomEngine(int iter, int index, int depth) { + int h = utilhash((1 << 31) | (depth << 22) | iter) ^ utilhash(index); + return thrust::default_random_engine(h); +} + +//Kernel that writes the image to the OpenGL PBO directly. +__global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution, + int iter, glm::vec3* image) { + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < resolution.x && y < resolution.y) { + int index = x + (y * resolution.x); + glm::vec3 pix = image[index]; + + glm::ivec3 color; + color.x = glm::clamp((int) (pix.x / iter * 255.0), 0, 255); + color.y = glm::clamp((int) (pix.y / iter * 255.0), 0, 255); + color.z = glm::clamp((int) (pix.z / iter * 255.0), 0, 255); + + // Each thread writes one pixel location in the texture (textel) + pbo[index].w = 0; + pbo[index].x = color.x; + pbo[index].y = color.y; + pbo[index].z = color.z; + } +} + +static Scene * hst_scene = NULL; +static glm::vec3 * dev_image = NULL; +static Geom * dev_geoms = NULL; +static Material * dev_materials = NULL; +static PathSegment * dev_paths = NULL; +static ShadeableIntersection * dev_intersections = NULL; +// TODO: static variables for device memory, any extra info you need, etc +// ... 
+ +void pathtraceInit(Scene *scene) { + hst_scene = scene; + const Camera &cam = hst_scene->state.camera; + const int pixelcount = cam.resolution.x * cam.resolution.y; + + cudaMalloc(&dev_image, pixelcount * sizeof(glm::vec3)); + cudaMemset(dev_image, 0, pixelcount * sizeof(glm::vec3)); + + cudaMalloc(&dev_paths, pixelcount * sizeof(PathSegment)); + + cudaMalloc(&dev_geoms, scene->geoms.size() * sizeof(Geom)); + cudaMemcpy(dev_geoms, scene->geoms.data(), scene->geoms.size() * sizeof(Geom), cudaMemcpyHostToDevice); + + cudaMalloc(&dev_materials, scene->materials.size() * sizeof(Material)); + cudaMemcpy(dev_materials, scene->materials.data(), scene->materials.size() * sizeof(Material), cudaMemcpyHostToDevice); + + cudaMalloc(&dev_intersections, pixelcount * sizeof(ShadeableIntersection)); + cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection)); + + // TODO: initialize any extra device memeory you need + + checkCUDAError("pathtraceInit"); +} + +void pathtraceFree() { + cudaFree(dev_image); // no-op if dev_image is null + cudaFree(dev_paths); + cudaFree(dev_geoms); + cudaFree(dev_materials); + cudaFree(dev_intersections); + // TODO: clean up any extra device memory you created + + checkCUDAError("pathtraceFree"); +} + +/** +* Generate PathSegments with rays from the camera through the screen into the +* scene, which is the first bounce of rays. 
+* +* Antialiasing - add rays for sub-pixel sampling +* motion blur - jitter rays "in time" +* lens effect - jitter ray origin positions based on a lens +*/ +__global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, PathSegment* pathSegments) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < cam.resolution.x && y < cam.resolution.y) { + int index = x + (y * cam.resolution.x); + PathSegment & segment = pathSegments[index]; + + segment.ray.origin = cam.position; + segment.color = glm::vec3(1.0f, 1.0f, 1.0f); + + // TODO: implement antialiasing by jittering the ray + segment.ray.direction = glm::normalize(cam.view + - cam.right * cam.pixelLength.x * ((float)x - (float)cam.resolution.x * 0.5f) + - cam.up * cam.pixelLength.y * ((float)y - (float)cam.resolution.y * 0.5f) + ); + + segment.pixelIndex = index; + segment.remainingBounces = traceDepth; + } +} + +// TODO: +// computeIntersections handles generating ray intersections ONLY. +// Generating new rays is handled in your shader(s). +// Feel free to modify the code below. 
+__global__ void computeIntersections( + int depth + , int num_paths + , PathSegment * pathSegments + , Geom * geoms + , int geoms_size + , ShadeableIntersection * intersections + ) +{ + int path_index = blockIdx.x * blockDim.x + threadIdx.x; + + if (path_index < num_paths) + { + PathSegment pathSegment = pathSegments[path_index]; + if (pathSegment.remainingBounces <= 0) + { + return; + } + float t; + glm::vec3 intersect_point; + glm::vec3 normal; + float t_min = FLT_MAX; + int hit_geom_index = -1; + bool outside = true; + + glm::vec3 tmp_intersect; + glm::vec3 tmp_normal; + + // naive parse through global geoms + + for (int i = 0; i < geoms_size; i++) + { + Geom & geom = geoms[i]; + + if (geom.type == CUBE) + { + t = boxIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside); + } + else if (geom.type == SPHERE) + { + t = sphereIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside); + } + // TODO: add more intersection tests here... triangle? metaball? CSG? + + // Compute the minimum t from the intersection tests to determine what + // scene geometry object was hit first. + if (t > 0.0f && t_min > t) + { + t_min = t; + hit_geom_index = i; + intersect_point = tmp_intersect; + normal = tmp_normal; + } + } + + if (hit_geom_index == -1) + { + intersections[path_index].t = -1.0f; + } + else + { + //The ray hits something + intersections[path_index].t = t_min; + intersections[path_index].materialId = geoms[hit_geom_index].materialid; + intersections[path_index].intersectionPoint = intersect_point; + intersections[path_index].surfaceNormal = normal; + } + } +} + +// LOOK: "fake" shader demonstrating what you might do with the info in +// a ShadeableIntersection, as well as how to use thrust's random number +// generator. Observe that since the thrust random number generator basically +// adds "noise" to the iteration, the image should start off noisy and get +// cleaner as more iterations are computed. 
+// +// TODO: +// --- Shading Stage --- +// Shade path segments based on intersections and generate new rays by +// evaluating the BSDF. +// Start off with just a big kernel that handles all the different +// materials you have in the scenefile. +__global__ void shadeScene ( + int iter + , int num_paths + , ShadeableIntersection * shadeableIntersections + , PathSegment * pathSegments + , Material * materials + ) +{ + int idx = blockIdx.x * blockDim.x + threadIdx.x; + if (idx < num_paths) + { + const ShadeableIntersection& intersection = shadeableIntersections[idx]; + PathSegment& pathSegment = pathSegments[idx]; + + if (intersection.t > 0.0f) // if the intersection exists... + { + // Set up the RNG + // LOOK: this is how you use thrust's RNG! Please look at + // makeSeededRandomEngine as well. + thrust::default_random_engine rng = makeSeededRandomEngine(iter, idx, 0); + thrust::uniform_real_distribution u01(0, 1); + + Material material = materials[intersection.materialId]; + glm::vec3 materialColor = material.color; + + // If the material indicates that the object was a light, "light" the ray + if (material.emittance > 0.0f) + { + pathSegment.color *= (materialColor * material.emittance); + pathSegment.remainingBounces = 0; // Terminate path. + } + else + { + glm::vec3 eyeV (0.f, 10.0f, 0.f); + eyeV -= intersection.intersectionPoint; + scatterRay( + pathSegments[idx], + intersection.intersectionPoint, intersection.surfaceNormal, + material, 0,0,0); + --pathSegment.remainingBounces; + } + } + else + { + // If there was no intersection, color the ray black. + // Lots of renderers use 4 channel color, RGBA, where A = alpha, often + // used for opacity, in which case they can indicate "no opacity". + // This can be useful for post-processing and image compositing. 
+ pathSegment.color = glm::vec3(0.0f); + pathSegments[idx].remainingBounces = 0; + } + } +} + +// Add the current iteration's output to the overall image +__global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterationPaths) +{ + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + + if (index < nPaths) + { + PathSegment iterationPath = iterationPaths[index]; + image[iterationPath.pixelIndex] += iterationPath.color; + } +} + +struct is_path_terminated +{ + __host__ __device__ + bool operator()(const PathSegment& path) + { + return path.remainingBounces > 0; + } +}; + +/** + * Wrapper for the __global__ call that sets up the kernel calls and does a ton + * of memory management + */ +void pathtrace(uchar4 *pbo, int frame, int iter) { + const int traceDepth = hst_scene->state.traceDepth; + const Camera &cam = hst_scene->state.camera; + const int pixelcount = cam.resolution.x * cam.resolution.y; + + // 2D block for generating ray from camera + const dim3 blockSize2d(8, 8); + const dim3 blocksPerGrid2d( + (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, + (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); + + // 1D block for path tracing + const int blockSize1d = 128; + + /////////////////////////////////////////////////////////////////////////// + + // Recap: + // * Initialize array of path rays (using rays that come out of the camera) + // * You can pass the Camera object to that kernel. + // * Each path ray must carry at minimum a (ray, color) pair, + // * where color starts as the multiplicative identity, white = (1, 1, 1). + // * This has already been done for you. + // * For each depth: + // * Compute an intersection in the scene for each path ray. + // A very naive version of this has been implemented for you, but feel + // free to add more primitives and/or a better algorithm. + // Currently, intersection distance is recorded as a parametric distance, + // t, or a "distance along the ray." t = -1.0 indicates no intersection. 
+ // * Color is attenuated (multiplied) by reflections off of any object + // * TODO: Stream compact away all of the terminated paths. + // You may use either your implementation or `thrust::remove_if` or its + // cousins. + // * Note that you can't really use a 2D kernel launch any more - switch + // to 1D. + // * TODO: Shade the rays that intersected something or didn't bottom out. + // That is, color the ray by performing a color computation according + // to the shader, then generate a new ray to continue the ray path. + // We recommend just updating the ray's PathSegment in place. + // Note that this step may come before or after stream compaction, + // since some shaders you write may also cause a path to terminate. + // * Finally, add this iteration's results to the image. This has been done + // for you. + + // TODO: perform one iteration of path tracing + + generateRayFromCamera <<>>(cam, iter, traceDepth, dev_paths); + checkCUDAError("generate camera ray"); + + int depth = 0; + PathSegment* dev_path_end = dev_paths + pixelcount; + int num_paths = dev_path_end - dev_paths; + + // --- PathSegment Tracing Stage --- + // Shoot ray into scene, bounce between objects, push shading chunks + + bool iterationComplete = false; + while (!iterationComplete) + { + + // clean shading chunks + cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection)); + + // tracing + dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d; + computeIntersections << > > ( + depth + , num_paths + , dev_paths + , dev_geoms + , hst_scene->geoms.size() + , dev_intersections + ); + checkCUDAError("trace one bounce"); + cudaDeviceSynchronize(); + depth++; + + // TODO: compare between directly shading the path segments and shading + // path segments that have been reshuffled to be contiguous in memory. 
+ + shadeScene << > > ( + iter, + num_paths, + dev_intersections, + dev_paths, + dev_materials + ); + + PathSegment* end = thrust::partition(thrust::device, dev_paths, dev_paths + num_paths, is_path_terminated()); + num_paths = std::distance(dev_paths, end); + if (num_paths == 0) { + iterationComplete = true; + num_paths = std::distance(dev_paths, dev_path_end); + depth = 0; + } + } + + // Assemble this iteration and apply it to the image + dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d; + finalGather<<>>(num_paths, dev_image, dev_paths); + + /////////////////////////////////////////////////////////////////////////// + + // Send results to OpenGL buffer for rendering + sendImageToPBO<<>>(pbo, cam.resolution, iter, dev_image); + + // Retrieve image from GPU + cudaMemcpy(hst_scene->state.image.data(), dev_image, + pixelcount * sizeof(glm::vec3), cudaMemcpyDeviceToHost); + + checkCUDAError("pathtrace"); +} diff --git a/src/pathtrace.cu b/src/pathtrace.cu index c1ec122..cf840cb 100644 --- a/src/pathtrace.cu +++ b/src/pathtrace.cu @@ -2,7 +2,11 @@ #include #include #include -#include + +#include +#include +#include +#include #include #include "sceneStructs.h" @@ -13,6 +17,8 @@ #include "pathtrace.h" #include "intersections.h" #include "interactions.h" +#include "timer.h" +#include "utilities.h" #define ERRORCHECK 1 @@ -38,12 +44,6 @@ void checkCUDAErrorFn(const char *msg, const char *file, int line) { #endif } -__host__ __device__ -thrust::default_random_engine makeSeededRandomEngine(int iter, int index, int depth) { - int h = utilhash((1 << 31) | (depth << 22) | iter) ^ utilhash(index); - return thrust::default_random_engine(h); -} - //Kernel that writes the image to the OpenGL PBO directly. 
__global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution, int iter, glm::vec3* image) { @@ -73,6 +73,7 @@ static Geom * dev_geoms = NULL; static Material * dev_materials = NULL; static PathSegment * dev_paths = NULL; static ShadeableIntersection * dev_intersections = NULL; +static ShadeableIntersection * dev_firstIntersections = NULL; // TODO: static variables for device memory, any extra info you need, etc // ... @@ -95,14 +96,20 @@ void pathtraceInit(Scene *scene) { cudaMalloc(&dev_intersections, pixelcount * sizeof(ShadeableIntersection)); cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection)); + cudaMalloc(&dev_firstIntersections, pixelcount * sizeof(ShadeableIntersection)); + cudaMemset(dev_firstIntersections, 0, pixelcount * sizeof(ShadeableIntersection)); + // TODO: initialize any extra device memeory you need checkCUDAError("pathtraceInit"); } -void pathtraceFree() { +void pathtraceFree() +{ + checkCUDAError("pathtraceFree"); cudaFree(dev_image); // no-op if dev_image is null cudaFree(dev_paths); + checkCUDAError("pathtraceFree"); cudaFree(dev_geoms); cudaFree(dev_materials); cudaFree(dev_intersections); @@ -119,23 +126,42 @@ void pathtraceFree() { * motion blur - jitter rays "in time" * lens effect - jitter ray origin positions based on a lens */ -__global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, PathSegment* pathSegments) +__global__ void generateRayFromCamera( + Camera cam, int iter, int traceDepth, PathSegment* pathSegments, + PRNG prng) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; - if (x < cam.resolution.x && y < cam.resolution.y) { + if (x < cam.resolution.x && y < cam.resolution.y) + { int index = x + (y * cam.resolution.x); PathSegment & segment = pathSegments[index]; + segment.color = glm::vec3(1.0f, 1.0f, 1.0f); segment.ray.origin = cam.position; - segment.color = glm::vec3(1.0f, 1.0f, 1.0f); - - // TODO: implement antialiasing by 
jittering the ray segment.ray.direction = glm::normalize(cam.view - - cam.right * cam.pixelLength.x * ((float)x - (float)cam.resolution.x * 0.5f) - - cam.up * cam.pixelLength.y * ((float)y - (float)cam.resolution.y * 0.5f) - ); + - cam.right * cam.pixelLength.x * ((float)x - (float)cam.resolution.x * 0.5f) + - cam.up * cam.pixelLength.y * ((float)y - (float)cam.resolution.y * 0.5f) + ); + + if (cam.lensRadius > 0) // Use depth of field + { + // Set up the RNG + prng.setSeed(iter, x * y, 0); + + const float lensRdmRadius = cam.lensRadius * prng.getNextVal01(); + const float lensRdmTheta = 2 * PI * prng.getNextVal01(); + + const float focalT = cam.focalLength * cam.view.z / segment.ray.direction.z; + const glm::vec3 focusPoint = segment.ray.origin + focalT * segment.ray.direction; + + segment.ray.origin = cam.position + + cam.right * lensRdmRadius * cos(lensRdmTheta) + - cam.up * lensRdmRadius * sin(lensRdmTheta); + segment.ray.direction = glm::normalize(focusPoint - segment.ray.origin); + } + segment.pixelIndex = index; segment.remainingBounces = traceDepth; @@ -200,13 +226,17 @@ __global__ void computeIntersections( if (hit_geom_index == -1) { + pathSegment.intersectionIndex = -1; intersections[path_index].t = -1.0f; } else { + pathSegment.intersectionIndex = path_index; + //The ray hits something intersections[path_index].t = t_min; intersections[path_index].materialId = geoms[hit_geom_index].materialid; + intersections[path_index].intersectionPoint = intersect_point; intersections[path_index].surfaceNormal = normal; } } @@ -218,49 +248,60 @@ __global__ void computeIntersections( // adds "noise" to the iteration, the image should start off noisy and get // cleaner as more iterations are computed. // -// Note that this shader does NOT do a BSDF evaluation! -// Your shaders should handle that - this can allow techniques such as -// bump mapping. 
-__global__ void shadeFakeMaterial ( +// TODO: +// --- Shading Stage --- +// Shade path segments based on intersections and generate new rays by +// evaluating the BSDF. +// Start off with just a big kernel that handles all the different +// materials you have in the scenefile. +__global__ void shadeScene ( int iter , int num_paths , ShadeableIntersection * shadeableIntersections , PathSegment * pathSegments , Material * materials + , PRNG prng ) { int idx = blockIdx.x * blockDim.x + threadIdx.x; if (idx < num_paths) { - ShadeableIntersection intersection = shadeableIntersections[idx]; - if (intersection.t > 0.0f) { // if the intersection exists... - // Set up the RNG - // LOOK: this is how you use thrust's RNG! Please look at - // makeSeededRandomEngine as well. - thrust::default_random_engine rng = makeSeededRandomEngine(iter, idx, 0); - thrust::uniform_real_distribution u01(0, 1); + // Set up the RNG + // LOOK: this is how you use thrust's RNG! Please look at + // makeSeededRandomEngine as well. + prng.setSeed(iter, idx, num_paths); + const ShadeableIntersection& intersection = shadeableIntersections[idx]; + PathSegment& pathSegment = pathSegments[idx]; + + if (intersection.t > 0.0f) // if the intersection exists... + { Material material = materials[intersection.materialId]; glm::vec3 materialColor = material.color; // If the material indicates that the object was a light, "light" the ray - if (material.emittance > 0.0f) { - pathSegments[idx].color *= (materialColor * material.emittance); + if (material.emittance > 0.0f) + { + pathSegment.color *= (materialColor * material.emittance); + pathSegment.remainingBounces = 0; // Terminate path. } - // Otherwise, do some pseudo-lighting computation. This is actually more - // like what you would expect from shading in a rasterizer like OpenGL. - // TODO: replace this! 
you should be able to start with basically a one-liner - else { - float lightTerm = glm::dot(intersection.surfaceNormal, glm::vec3(0.0f, 1.0f, 0.0f)); - pathSegments[idx].color *= (materialColor * lightTerm) * 0.3f + ((1.0f - intersection.t * 0.02f) * materialColor) * 0.7f; - pathSegments[idx].color *= u01(rng); // apply some noise because why not + else + { + scatterRay( + pathSegments[idx], + intersection.intersectionPoint, intersection.surfaceNormal, + material, prng); + --pathSegment.remainingBounces; } - // If there was no intersection, color the ray black. - // Lots of renderers use 4 channel color, RGBA, where A = alpha, often - // used for opacity, in which case they can indicate "no opacity". - // This can be useful for post-processing and image compositing. - } else { - pathSegments[idx].color = glm::vec3(0.0f); + } + else + { + // If there was no intersection, color the ray black. + // Lots of renderers use 4 channel color, RGBA, where A = alpha, often + // used for opacity, in which case they can indicate "no opacity". + // This can be useful for post-processing and image compositing. 
+ pathSegment.color = glm::vec3(0.0f); + pathSegments[idx].remainingBounces = 0; } } } @@ -277,11 +318,46 @@ __global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterati } } +struct IsPathValid +{ + __host__ __device__ + bool operator()(const PathSegment& path) + { + return path.remainingBounces > 0; + } +}; + +struct SortByMaterialId +{ +public: + + SortByMaterialId(ShadeableIntersection* intersections) : dev_intersections(intersections) + {} + + __host__ __device__ + bool operator()(const PathSegment& pathLhs, const PathSegment& pathRhs) + { + return operator()(dev_intersections[pathLhs.intersectionIndex], dev_intersections[pathRhs.intersectionIndex]); + } + + __host__ __device__ + bool operator()(const ShadeableIntersection& intersectLhs, const ShadeableIntersection& intersectRhs) + { + return intersectLhs.materialId < intersectRhs.materialId; + } + +public: + + ShadeableIntersection* dev_intersections; +}; + + /** * Wrapper for the __global__ call that sets up the kernel calls and does a ton * of memory management */ -void pathtrace(uchar4 *pbo, int frame, int iter) { +void pathtrace(uchar4 *pbo, const SceneOptions& sceneOptions, int iter) +{ const int traceDepth = hst_scene->state.traceDepth; const Camera &cam = hst_scene->state.camera; const int pixelcount = cam.resolution.x * cam.resolution.y; @@ -325,57 +401,85 @@ void pathtrace(uchar4 *pbo, int frame, int iter) { // for you. 
     // TODO: perform one iteration of path tracing
-
-    generateRayFromCamera <<<blocksPerGrid2d, blockSize2d>>>(cam, iter, traceDepth, dev_paths);
+    checkCUDAError("generate camera ray");
+    PRNG prng(PRNGenerator::getType(), PRNGenerator::getNumVals(), PRNGenerator::getVals());
+    checkCUDAError("generate camera ray");
+    generateRayFromCamera <<<blocksPerGrid2d, blockSize2d>>>(cam, iter, traceDepth, dev_paths, prng);
     checkCUDAError("generate camera ray");

     int depth = 0;
     PathSegment* dev_path_end = dev_paths + pixelcount;
     int num_paths = dev_path_end - dev_paths;
+    Timer::playTimer();
     // --- PathSegment Tracing Stage ---
     // Shoot ray into scene, bounce between objects, push shading chunks
+    bool iterationComplete = false;
+    while (!iterationComplete)
+    {

-    bool iterationComplete = false;
-    while (!iterationComplete) {
-
-        // clean shading chunks
-        cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
-
-        // tracing
-        dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
-        computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
-            depth
-            , num_paths
-            , dev_paths
-            , dev_geoms
-            , hst_scene->geoms.size()
-            , dev_intersections
-            );
-        checkCUDAError("trace one bounce");
-        cudaDeviceSynchronize();
-        depth++;
-
-
-        // TODO:
-        // --- Shading Stage ---
-        // Shade path segments based on intersections and generate new rays by
-        // evaluating the BSDF.
-        // Start off with just a big kernel that handles all the different
-        // materials you have in the scenefile.
-        // TODO: compare between directly shading the path segments and shading
-        // path segments that have been reshuffled to be contiguous in memory.
-
-        shadeFakeMaterial<<<numblocksPathSegmentTracing, blockSize1d>>> (
-            iter,
-            num_paths,
-            dev_intersections,
-            dev_paths,
-            dev_materials
-        );
-        iterationComplete = true; // TODO: should be based off stream compaction results.
-    }
+
+        // clean shading chunks
+        cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
+        dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
+        if (iter <= sceneOptions.iterToCacheFirstBounces || depth != 0)
+        {
+            // tracing
+            computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
+                depth
+                , num_paths
+                , dev_paths
+                , dev_geoms
+                , hst_scene->geoms.size()
+                , dev_intersections
+                );
+            checkCUDAError("trace one bounce");
+            cudaDeviceSynchronize();
+
+            if (iter == sceneOptions.iterToCacheFirstBounces && depth == 0)
+            {
+                cudaMemcpy(dev_firstIntersections, dev_intersections, pixelcount * sizeof(ShadeableIntersection), cudaMemcpyDeviceToDevice);
+            }
+        }
+        else // iter > sceneOptions.iterToCacheFirstBounces && depth == 0
+        {
+            cudaMemcpy(dev_intersections, dev_firstIntersections, pixelcount * sizeof(ShadeableIntersection), cudaMemcpyDeviceToDevice);
+        }
+
+        // TODO: compare between directly shading the path segments and shading
+        // path segments that have been reshuffled to be contiguous in memory.
+        if (sceneOptions.sortPathsByMaterial)
+        {
+            thrust::device_vector<PathSegment> dv_in_paths(dev_paths, dev_paths + num_paths);
+            thrust::stable_sort(dv_in_paths.begin(), dv_in_paths.end(), SortByMaterialId(dev_intersections));
+
+            thrust::device_vector<ShadeableIntersection> dv_in_intersects(dev_intersections, dev_intersections + num_paths);
+            thrust::stable_sort(dv_in_intersects.begin(), dv_in_intersects.end(), SortByMaterialId(dev_intersections));
+        }
+
+        shadeScene <<<numblocksPathSegmentTracing, blockSize1d>>> (
+            iter,
+            num_paths,
+            dev_intersections,
+            dev_paths,
+            dev_materials,
+            prng
+            );
+
+        PathSegment* end = thrust::partition(thrust::device, dev_paths, dev_paths + num_paths, IsPathValid());
+        num_paths = std::distance(dev_paths, end);
+        if (num_paths == 0)
+        {
+            iterationComplete = true;
+            num_paths = std::distance(dev_paths, dev_path_end);
+        }
+        depth++;
+        if (depth >= traceDepth)
+        {
+            iterationComplete = true;
+        }
+    }
+    Timer::pauseTimer();
     // Assemble this iteration and apply it to the image
     dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
     finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths);
diff --git a/src/pathtrace.h b/src/pathtrace.h
index 1241227..7b4b0ce 100644
--- a/src/pathtrace.h
+++ b/src/pathtrace.h
@@ -5,4 +5,4 @@

 void pathtraceInit(Scene *scene);
 void pathtraceFree();
-void pathtrace(uchar4 *pbo, int frame, int iteration);
+void pathtrace(uchar4 *pbo, const SceneOptions& sceneOptions, int iter);
diff --git a/src/scene.cpp b/src/scene.cpp
index cbae043..70be87e 100644
--- a/src/scene.cpp
+++ b/src/scene.cpp
@@ -96,7 +96,7 @@ int Scene::loadCamera() {
     float fovy;

     //load static properties
-    for (int i = 0; i < 5; i++) {
+    for (int i = 0; i < 10; i++) {
         string line;
         utilityCore::safeGetline(fp_in, line);
         vector<string> tokens = utilityCore::tokenizeString(line);
@@ -111,21 +111,23 @@ int Scene::loadCamera() {
             state.traceDepth = atoi(tokens[1].c_str());
         } else if (strcmp(tokens[0].c_str(), "FILE") == 0) {
             state.imageName = tokens[1];
-        }
+        } else if (strcmp(tokens[0].c_str(), "EYE") == 0) {
+
camera.position = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str())); + } else if (strcmp(tokens[0].c_str(), "LOOKAT") == 0) { + camera.lookAt = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str())); + } else if (strcmp(tokens[0].c_str(), "UP") == 0) { + camera.up = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str())); + } else if (strcmp(tokens[0].c_str(), "FOCAL_LENGTH") == 0) { + camera.focalLength = atof(tokens[1].c_str()); + } else if (strcmp(tokens[0].c_str(), "LENS_RADIUS") == 0) { + camera.lensRadius = atof(tokens[1].c_str()); + } } string line; utilityCore::safeGetline(fp_in, line); while (!line.empty() && fp_in.good()) { vector<string> tokens = utilityCore::tokenizeString(line); - if (strcmp(tokens[0].c_str(), "EYE") == 0) { - camera.position = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str())); - } else if (strcmp(tokens[0].c_str(), "LOOKAT") == 0) { - camera.lookAt = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str())); - } else if (strcmp(tokens[0].c_str(), "UP") == 0) { - camera.up = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str())); - } - utilityCore::safeGetline(fp_in, line); } diff --git a/src/sceneStructs.h b/src/sceneStructs.h index b38b820..6780c90 100644 --- a/src/sceneStructs.h +++ b/src/sceneStructs.h @@ -49,9 +49,12 @@ struct Camera { glm::vec3 right; glm::vec2 fov; glm::vec2 pixelLength; + float focalLength; + float lensRadius; }; -struct RenderState { +struct RenderState +{ Camera camera; unsigned int iterations; int traceDepth; @@ -59,18 +62,30 @@ struct RenderState { std::string imageName; }; -struct PathSegment { +struct SceneOptions +{ + SceneOptions() : sortPathsByMaterial(true), iterToCacheFirstBounces(1) + {} + bool sortPathsByMaterial; + int iterToCacheFirstBounces; +}; + +struct PathSegment +{ Ray ray; glm::vec3 color; int pixelIndex; int
remainingBounces; + int intersectionIndex; }; // Use with a corresponding PathSegment to do: // 1) color contribution computation // 2) BSDF evaluation: generate a new ray -struct ShadeableIntersection { - float t; +struct ShadeableIntersection +{ + glm::vec3 intersectionPoint; glm::vec3 surfaceNormal; int materialId; + float t; }; diff --git a/src/timer.cu b/src/timer.cu new file mode 100644 index 0000000..9c3ef44 --- /dev/null +++ b/src/timer.cu @@ -0,0 +1,462 @@ + +#include +#include + +#include "timer.h" + +#define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__) +#define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__) +static void checkCUDAErrorFn(const char *msg, const char *file, int line) { +#if ERRORCHECK + cudaDeviceSynchronize(); + cudaError_t err = cudaGetLastError(); + if (cudaSuccess == err) { + return; + } + + fprintf(stderr, "CUDA error"); + if (file) { + fprintf(stderr, " (%s:%d)", file, line); + } + fprintf(stderr, ": %s: %s\n", msg, cudaGetErrorString(err)); +# ifdef _WIN32 + getchar(); +# endif + exit(EXIT_FAILURE); +#endif +} + +MyTimer* Timer::m_myTimer = NULL; + +class MyTimer +{ +public: + using Clock = std::chrono::high_resolution_clock; + using TimePoint = std::chrono::time_point; + +public: + MyTimer() + { + m_refCount = 0; + m_useGPU = true; + m_elapsedTimeInms = 0.0f; + + cudaEventCreate(&m_start); + cudaEventCreate(&m_stop); + + m_startTime = Clock::now(); + m_stopTime = Clock::now(); + } + + ~MyTimer() + { + cudaEventDestroy(m_start); + cudaEventDestroy(m_stop); + } + +public: + + void resetTimer(bool useGPU = true) + { + m_useGPU = useGPU; + m_elapsedTimeInms = 0.0f; + } + + void playTimer() + { + if (m_refCount++ == 0) + { + if (m_useGPU) + { + cudaEventRecord(m_start); + } + else + { + m_startTime = Clock::now(); + } + } + } + + bool pauseTimer() + { + bool bPaused = false; + if (--m_refCount == 0) + { + float newElapsedTime = 0.0f; + if (m_useGPU) + { + cudaEventRecord(m_stop); + 
cudaEventSynchronize(m_stop); + cudaEventElapsedTime(&newElapsedTime, m_start, m_stop); + } + else + { + m_stopTime = Clock::now(); + newElapsedTime = std::chrono::duration_cast(m_stopTime - m_startTime).count(); + } + m_elapsedTimeInms += newElapsedTime; + bPaused = true; + } + return bPaused; + } + + float printTimer(const char* timerHeader, float timerFactor) + { + float elapsedTime = timerFactor * m_elapsedTimeInms; + printf("%s - Elapsed Time:%f ms.\n", timerHeader, elapsedTime); + return elapsedTime; + } + +private: + size_t m_refCount; + bool m_useGPU; + float m_elapsedTimeInms; +private: + cudaEvent_t m_start; + cudaEvent_t m_stop; + +private: + TimePoint m_startTime; + TimePoint m_stopTime; +}; + +Timer::Timer() +{ +} + +Timer::~Timer() +{ +} + +void Timer::initializeTimer() +{ + if (m_myTimer == NULL) + m_myTimer = new MyTimer; +} + +void Timer::shutdownTimer() +{ + if (m_myTimer != NULL) + delete m_myTimer; +} + +void Timer::resetTimer(bool useGPU) +{ + m_myTimer->resetTimer(useGPU); +} + +void Timer::playTimer() +{ + m_myTimer->playTimer(); +} + +void Timer::pauseTimer() +{ + m_myTimer->pauseTimer(); +} + +void Timer::printTimer(const char* timerHeader, float timerFactor) +{ + m_myTimer->printTimer(timerHeader, timerFactor); +} + + +//------------------------------- +//-------------RNG TESTS--------- +//------------------------------- + +void xOrBinValues(const char* x, const char* y, char* z) +{ + int bitXIdx = 0; int bitYIdx = 0; int bitZIdx = 0; + while (x[bitXIdx] != NULL || y[bitYIdx] != NULL) + { + int xBit = 0; + if (x[bitXIdx] != NULL) + { + xBit = (x[bitXIdx] == '1') ? 1 : 0; + bitXIdx++; + } + + int yBit = 0; + if (y[bitYIdx] != NULL) + { + yBit = (y[bitYIdx] == '1') ? 1 : 0; + bitYIdx++; + } + z[bitZIdx++] = ((xBit ^ yBit) == 0) ? 
'0' : '1'; + } + z[bitZIdx] = NULL; +} + +int indexOfRightMostZeroBit(int bitMask) +{ + int bitIdx = 1; + while (bitMask) + { + if ((bitMask & 0x1) == 0) break; + bitMask >>= 1; + ++bitIdx; + } + return bitIdx; +} + +int computeViInBinary(int mi, int i, char* outVi) +{ + outVi[i] = NULL; + for (int iter = i - 1; iter >= 0; --iter, mi >>= 1) + { + char nextBin = (mi & 0x1 == 1) ? '1' : '0'; + outVi[iter] = nextBin; + } + return i; +} + +float convertBinary01ToFloat(const char* bin) +{ + float val = 0; + int idx = 0; + while (bin[idx] != NULL) + { + int bit = (bin[idx] == '0') ? 0 : 1; + val += bit * powf(2, -(idx + 1)); + idx++; + } + return val; +} + +void preComputeMiTable(const int numMi, int* miBuffer) +{ + // From ALGORITHM 659 Implementing Sobol’s Quasirandom Sequence Generator. + // Each mi is odd and mi < (1< u01(0, 1); + + const int gridIdx = iter % GridSize; + const float i01Val = u01(rng); + + ++iter; + return gridIdx / (1.0f * (GridSize - 1)) + i01Val; +} +///////////////////////////////////////////////////////////////////////////// +#include +#include + +//PRNGenerator::eType PRNGenerator::m_type = PRNGenerator::None; +//static size_t m_numVals = 0; +//static float* m_vals = NULL; + +PRNGenerator::eType PRNGenerator::m_type = PRNGenerator::None; +size_t PRNGenerator::m_numVals = 0; +float* PRNGenerator::m_vals = NULL; + +void PRNGenerator::initializeSystem(eType prngType, int numVals) +{ + if (prngType == PRNGenerator::None || m_vals != NULL) + { + fprintf(stderr, "ERROR"); + return; + } + checkCUDAError("generate camera ray"); + m_type = prngType; + m_numVals = numVals; + + cudaMalloc(&m_vals, numVals * sizeof(float)); + float* vals = new float[numVals]; memset(vals, 0, numVals* sizeof(float)); + + checkCUDAError("generate camera ray"); + switch (m_type) + { + case PRNGenerator::Thrust: + break; + case PRNGenerator::Stratified: + for (int i = 0; i < numVals; ++i) + { + vals[i] = generateNextStratifiedRNG(); + } + break; + case PRNGenerator::Halton: + 
for (int i = 0; i < numVals; ++i) + { + vals[i] = generateNextHaltonRNG(); + } + break; + case PRNGenerator::Sodel: + for (int i = 0; i < numVals; ++i) + { + checkCUDAError("generate camera ray"); + vals[i] = generateNextSodelRNG(); + } + break; + default: + break; + } + + cudaMemcpy(m_vals, vals, numVals * sizeof(float), cudaMemcpyKind::cudaMemcpyHostToDevice); + delete [] vals; + checkCUDAError("generate camera ray"); +} + +void PRNGenerator::shutdownSystem() +{ + cudaFree(m_vals); +} + +PRNG::PRNG(PRNGenerator::eType prngType, int numVals, float* vals) +: m_type(prngType), + m_numVals(numVals), m_vals(vals) +{ + +} + +__host__ __device__ void PRNG::setSeed(int iter, int index, int depth) +{ + const int h = utilhash((1 << 31) | (depth << 22) | iter) ^ utilhash(index); + switch (m_type) + { + case PRNGenerator::Thrust: + m_rng = thrust::default_random_engine(h); + break; + case PRNGenerator::Stratified: + case PRNGenerator::Halton: + case PRNGenerator::Sodel: + { + const size_t absH = abs(h); + m_nextRandomIdx = absH % m_numVals; + break; + } + default: + break; + } +} + +__host__ __device__ PRNG::~PRNG() +{ +} + +__host__ __device__ float PRNG::getNextVal01() +{ + float nextVal = 0.0f; + switch (m_type) + { + case PRNGenerator::Thrust: + nextVal = m_u01(m_rng); + break; + case PRNGenerator::Stratified: + case PRNGenerator::Halton: + case PRNGenerator::Sodel: + { + nextVal = m_vals[m_nextRandomIdx++]; + if (m_nextRandomIdx >= m_numVals) + { + m_nextRandomIdx = 0; + } + } + break; + default: + break; + } + return nextVal; +} diff --git a/src/timer.h b/src/timer.h new file mode 100644 index 0000000..013d4b9 --- /dev/null +++ b/src/timer.h @@ -0,0 +1,81 @@ +#pragma once + +class MyTimer; + +class Timer +{ +public: + Timer(); + ~Timer(); + +public: + static void initializeTimer(); + static void shutdownTimer(); + +public: + static void resetTimer(bool useGPU = true); + static void playTimer(); + static void pauseTimer(); + static void printTimer(const char* 
timerHeader, float timerFactor); + +private: + static MyTimer* m_myTimer; +}; + +/** +* Handy-dandy hash function that provides seeds for random number generation. +*/ +__host__ __device__ inline unsigned int utilhash(unsigned int a) { + a = (a + 0x7ed55d16) + (a << 12); + a = (a ^ 0xc761c23c) ^ (a >> 19); + a = (a + 0x165667b1) + (a << 5); + a = (a + 0xd3a2646c) ^ (a << 9); + a = (a + 0xfd7046c5) + (a << 3); + a = (a ^ 0xb55a4f09) ^ (a >> 16); + return a; +} + +#include +#include + +struct PRNGenerator +{ + enum eType + { + None, + Thrust, + Stratified, + Halton, + Sodel, + }; + + static void initializeSystem(eType prngType, int numVals); + static void shutdownSystem(void); + + static eType getType() { return m_type; }; + static size_t getNumVals() { return m_numVals; }; + static float* getVals() { return m_vals; }; + +private: + + static eType m_type; + static size_t m_numVals; + static float* m_vals; +}; + +struct PRNG +{ + __host__ __device__ PRNG(PRNGenerator::eType prngType, int numVals, float* vals); + __host__ __device__ ~PRNG(); + + __host__ __device__ void setSeed(int iter, int index, int depth); + __host__ __device__ float getNextVal01(); + + PRNGenerator::eType m_type; + size_t m_numVals; + float* m_vals; + + size_t m_nextRandomIdx; + thrust::default_random_engine m_rng; + thrust::uniform_real_distribution m_u01; +}; diff --git a/src/utilities.cpp b/src/utilities.cpp index 9c06c68..8dea8f0 100644 --- a/src/utilities.cpp +++ b/src/utilities.cpp @@ -109,4 +109,4 @@ std::istream& utilityCore::safeGetline(std::istream& is, std::string& t) { t += (char)c; } } -} +} \ No newline at end of file diff --git a/src/utilities.h b/src/utilities.h index abb4f27..67a0959 100644 --- a/src/utilities.h +++ b/src/utilities.h @@ -23,4 +23,4 @@ namespace utilityCore { extern glm::mat4 buildTransformationMatrix(glm::vec3 translation, glm::vec3 rotation, glm::vec3 scale); extern std::string convertIntToString(int number); extern std::istream& safeGetline(std::istream& is, 
std::string& t); //Thanks to http://stackoverflow.com/a/6089413 -} +} \ No newline at end of file
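
Reviewer note on the bounce loop in `pathtrace`: the loop shades the live paths, partitions them so that paths with `remainingBounces > 0` come first, and stops when none are left or `depth` reaches `traceDepth`. The host-only sketch below mirrors that control flow so it can be checked without a GPU. It is an illustration under assumptions, not the code from this diff: `MiniPath`, `IsLive`, `compactLivePaths` and `runBounceLoop` are hypothetical names, `std::partition` stands in for `thrust::partition`, and the "shade" step only consumes one bounce.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down stand-in for PathSegment in sceneStructs.h.
struct MiniPath {
    int pixelIndex;
    int remainingBounces;
};

// Same test as IsPathValid in the diff: a path is live while it has bounces left.
struct IsLive {
    bool operator()(const MiniPath& p) const { return p.remainingBounces > 0; }
};

// Moves live paths to the front of the first numPaths entries and returns
// how many are live. Terminated paths stay in the array, behind the live ones,
// so a later gather pass can still read their pixelIndex and color.
std::size_t compactLivePaths(std::vector<MiniPath>& paths, std::size_t numPaths) {
    auto first = paths.begin();
    auto last = first + static_cast<std::ptrdiff_t>(numPaths);
    auto liveEnd = std::partition(first, last, IsLive());
    return static_cast<std::size_t>(liveEnd - first);
}

// Toy bounce loop: "shading" consumes one bounce per live path, then the live
// range is compacted. Returns the number of bounce iterations executed.
int runBounceLoop(std::vector<MiniPath>& paths, int traceDepth) {
    std::size_t numPaths = paths.size();
    int depth = 0;
    while (numPaths > 0 && depth < traceDepth) {
        for (std::size_t i = 0; i < numPaths; ++i) {
            --paths[i].remainingBounces;  // stand-in for the shadeScene kernel
        }
        numPaths = compactLivePaths(paths, numPaths);
        ++depth;
    }
    return depth;
}
```

Because partitioning only reorders the array, the full set of paths is still available once the loop ends, which is why the diff resets `num_paths` to the full range before `finalGather` when every path has terminated.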