diff --git a/README.md b/README.md
index 110697c..a1db9c7 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,123 @@ CUDA Path Tracer
 
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3**
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Xiaomao Ding
+* Tested on: Windows 8.1, i7-4700MQ @ 2.40GHz 8.00GB, GT 750M 2047MB (Personal Computer)
 
-### (TODO: Your README)
+<img src="https://raw.githubusercontent.com/xnieamo/Project3-CUDA-Path-Tracer/master/img/cornell.2016-10-11_05-34-19z.5000samp.png" width="425"/> <img src="https://raw.githubusercontent.com/xnieamo/Project3-CUDA-Path-Tracer/master/img/cornell.2016-10-11_06-15-00z.5000samp.png" width="425"/> 
 
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+# Introduction
+The code in this repository implements a CUDA-based Monte Carlo path tracer allowing us to quickly render globally illuminated images. The path tracer sends out many rays into the scene to "sample" the colors of the objects in the scene. Upon hitting an object, another ray is generated at that location. This allows the path tracer to accumulate color of light bouncing off of nearby surfaces as well. Because each new ray generated is sampled from a probability distribution described by the object's material, it takes many iterations of the algorithm before a uniform "non-noisy" image emerges.
 
+## Features
+The following features have been implemented in this project:
+* Direct illumination with Multiple Importance Sampling
+* Depth of field via camera jittering
+* Stochastic Sample Anti-aliasing
+* Realistic reflective and refractive materials via Fresnel dielectrics and conductors
+* Path termination via stream compaction
+* First bounce caching
+
+## Code guide
+Features are enabled/disable via defines at the top of `interactions.h` and `pathtrace.cu`. MIS and Fresnel effects are toggled at the top of `interactions.h`. All other features are located at the top of `pathtrace.cu`.
+
+#### Controls
+
+* Esc to save an image and exit.
+* S to save an image. Watch the console for the output filename.
+* Space to re-center the camera at the original scene lookAt point
+* left mouse button to rotate the camera
+* right mouse button on the vertical axis to zoom in/out
+* middle mouse button to move the LOOKAT point in the scene's X/Z plane
+
+# Performance Analysis
+## Direct illumination
+The idea of direct illumination is to directly sample the light at each bounce by shooting and evaluating an additional ray. This in theory allows us to converge to a stable image much quicker than waiting for the rays to randomly hit light.  In this project, the base global illumination renderer multiplicatively stacks the color of each surface hit. However, in the direct illumination renderer, I use a more realistic implementation of the light transport equation, found in the Physically Based Rendering Textbook [PBRT]. Thus, the images generated by the two algorithms will look different. For this section, it is more important to just compare how "grainy" the images are. Below are three sets of images, rendered with 10, 100, and 500 iterations, using the base renderer as well as the direct illumination renderer.
+
+Base Renderer              |  Direct Illumination
+:-------------------------:|:-------------------------:
+![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/basic.2016-10-11_20-39-59z.10samp.png?raw=true)  |  ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-38-05z.10samp.png?raw=true)
+![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/basic.2016-10-11_20-50-11z.100samp.png?raw=true)  |  ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-36-46z.100samp.png?raw=true)
+![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/basic.2016-10-11_21-26-35z.500samp.png?raw=true)  |  ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-32-42z.500samp.png?raw=true)
+
+It is clear that the direct illumination renderer gives a much nicer image at the same number of iterations. However, the increased renderering quality comes with a performance tradeoff. With the direct renderer, we shoot a ray to the light at each bounce, requiring an additional intersection test. The above image in particular was also rendered with multiple importance sampling, which shoots another ray from the intersection. This totals to 2 additional rays for each bounce. The plot below shows how much slower the direct illumination renderer is. All values in this plot as well as additional plots later on are averaged over 10 iterations of the renderer.
+
+<p align="center">
+  <img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/BasicRenderer-DL-Time.png?raw=true">
+</p>
+
+Shockingly, the time taken for the actual path tracing is now over 10 times greater than the basic renderer! Overall, this the direct illumination renderer takes about 3 times longer per iteration, so we could have run the basic renderer for that many more iterations giving the same time. Below, I ran the basic renderer for 1500 iterations, compared to a 500 iteration image from the direct illumination renderer. Even with 1000 extra iterations, the direct illumination renderer looks a bit better.
+
+Base Renderer 1500 iterations    |  Direct Illumination 500 iterations
+:-------------------------:|:-------------------------:
+![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/basic.2016-10-11_21-52-13z.1500samp.png?raw=true)  |  ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-32-42z.500samp.png?raw=true)
+
+## Path termination via stream compaction
+The direct illumination's runtime is fairly undesireable. Luckily, several of the other features in this project allow us drastically reduce the runtime. The first feature is path termination. The idea is to remove paths that have terminated (hit a light source or finished all their bounces) each time before we execute the path tracing kernel. In theory, this reduces the number of threads that need to be executed as the rays are evaluated. We use`thrust::partition` to perform stream compaction on the rays that have terminated. Here is a plot of the number of rays remaining, as well as the executition time of a single kernel call against the bounce number.
+<p align="center">
+<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/SCRayCount.png?raw=true"/> 
+<br><br>
+<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/SCRuntime.png?raw=true"/> 
+</p>
+
+<p align="center">
+  <img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/WithStreamCompaction.png?raw=true">
+</p>
+The plots show that the number of rays that need to be executed as the algorithm progress drops tremendously! While the first two stream compaction and path trace call take longer than without, all bounces from 3 and on take much less time than if we were to run the algorithm without stream compaction.  By reducing the number of rays as we continue, we save a lot of time. In fact, we nearly half our runtime (consider that the number of intersections calculated initially is also reduced).
+
+## First bounce caching
+Because we are sampling for many iterations, we can also cache the rays first cast from the camera. This is because the camera will always be in the same position when we render. This saves us time by removing the need to cast the initial set of rays on each iteration.  This naturally saves time as a function of the number of iterations we are running. Below is a plot of the time saved by caching the first rays for 10 iterations of the algorithm.
+
+<p align="center">
+  <img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/FirstBounceCache.png?raw=true">
+</p>
+
+We save roughly 3 milliseconds per iteration when we cache the first bounces. For a 5000 iteration rendering, this amounts to roughly 15000 milliseconds save or about 43 (15000/350) additional iterations.  This seems like a fairly insignificant boost in performance, especially because caching the first ray cast will prevent us from performing anti-aliasing.
+
+## Anti-aliasing
+If we only cast our initial ray from a single location, aliasing can occur. This is when there may be multiple objects in a single pixel but only one gets sampled. The result is that lines and boundaries in the image may appear jagged.  To address this issue, we jitter our intial ray cast slightly within the pixel. This stochastic jittering let's us sample every object that may be in a pixel since our initial ray is no longer fixed. As just mentioned in the previous section, the we can no longer cache our first bounce.  One possible solution would be to cache a large amount of initial casts. However, if we do so, we always have a higher risk of aliasing! Personally, I think the first bounce caching doesn't provide enough of a benefit to make pre-allocating a large amount of space to store so many more rays worthwhile. Below are two images, one anti-aliased, one not. Notice how the edges of the box and the border of the spheres are jagged in the non-aliased version.
+
+No Anti-alias              | With Anti-alias
+:-------------------------:|:-------------------------:
+![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/noaa.2016-10-11_23-58-49z.500samp.png?raw=true)  |  ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-32-42z.500samp.png?raw=true)
+
+Because anti-aliasing is something that occurs at the start of each iteration, it is essentially free! (minus 2 random number generations). Unless we were caching our initial bounces, there are no performance tradeoffs for this feature.
+
+
+## Depth of field
+Depth of field refers to the blurring affect on objects not within the same plane of focus. We can mimic the effect of depth of field by slightly jittering our inital ray cast through an imaginary lens. The image below shows the result. Notice how the objects become less blurry the further away the are from the camera. This occurs if our focus is far away.
+
+<p align="center">
+  <img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/depth.2016-10-11_22-51-27z.5000samp.png?raw=true">
+</p>
+
+
+Similar to anti-aliasing, there are no performance costs for this feature because it is a ray pre-processing step.
+
+## Reflection/Refraction
+We make use of two Fresnel equations to implement reflective and refractive materials. Note that the Fresnel implementations are only available in the direct illumination renderer. The basic renderer just assumes perfectly specular surfaces and uses Snell's law to calculate refracted rays. For reflections, we assume the material to be a Fresnel conductor. For refractions, we assume a Fresnel dielecctric.  The formula for the two materials are as follows:
+<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/equations/Cond.PNG?raw=true" width="425"/> <img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/equations/Diel.PNG?raw=true" width="350"/> 
+
+We use an approximation of the two equations [PBRT 8.2-8.3] in our implementations. Additionally, the Fresnel equations are relatively cheap. The runtime assuming a perfect specular material is only 3.5 milliseconds faster per iteration. This means that the Fresnel materials are essentially free!
+
+# Other small things
+## Malley's Method
+For diffuse materials, the brdf is Lambertian. This means that rays sampled off the bounce come from a uniform distribution over a hemisphere. Cosine sampling is a way to get a uniform distribution over a hemishpere. However, this requires several cosine/sine calls which are very costly on the GPU, since they depend on a limited number of special computation units. Malley's method is a way to approximate a cosine distribution. The great thing about this approximation is that it requires NO cosine or sines. Instead, we only need two random number generations. This means that using this method avoids the bottleneck on the GPU! Below is a plot showing how Malley's method can improve performance. One thing to note is that the time for intersections has decreased slightly as well. This is possible because the approximation is less accurate and sends more stray rays, but I am not certain what the root cause is.
+
+<p align="center">
+  <img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/MalleysApprox.png?raw=true">
+</p>
+
+## Odd performance boost without MIS
+For some reason, taking out MIS from the code cuts the path tracing kernel runtime nearly in half! I have no idea why this happens. This is surprising because commenting out that section of the MIS code does not remove half the calculations (at least not in code). It is true though that 1 of the intersections is no longer used when MIS is disabled, so perhaps the compiler has also removed the intersection test call?
+
+<p align="center">
+  With MIS.
+  <img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/profile/MISProfile.PNG?raw=true">
+  <br><br>
+  Without MIS.
+  <img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/profile/noMISProfile.PNG?raw=true">
+</p>
+
+# Reference
+[PBRT] Physically Based Rendering, Second Edition: From Theory To Implementation. Pharr, Matt and Humphreys, Greg. 2010.
diff --git a/img/MIS-Comparisons/MIS.2016-10-11_20-32-42z.500samp.png b/img/MIS-Comparisons/MIS.2016-10-11_20-32-42z.500samp.png
new file mode 100644
index 0000000..d8565f1
Binary files /dev/null and b/img/MIS-Comparisons/MIS.2016-10-11_20-32-42z.500samp.png differ
diff --git a/img/MIS-Comparisons/MIS.2016-10-11_20-36-46z.100samp.png b/img/MIS-Comparisons/MIS.2016-10-11_20-36-46z.100samp.png
new file mode 100644
index 0000000..5af6ddf
Binary files /dev/null and b/img/MIS-Comparisons/MIS.2016-10-11_20-36-46z.100samp.png differ
diff --git a/img/MIS-Comparisons/MIS.2016-10-11_20-38-05z.10samp.png b/img/MIS-Comparisons/MIS.2016-10-11_20-38-05z.10samp.png
new file mode 100644
index 0000000..d074250
Binary files /dev/null and b/img/MIS-Comparisons/MIS.2016-10-11_20-38-05z.10samp.png differ
diff --git a/img/MIS-Comparisons/basic.2016-10-11_20-39-59z.10samp.png b/img/MIS-Comparisons/basic.2016-10-11_20-39-59z.10samp.png
new file mode 100644
index 0000000..da8455e
Binary files /dev/null and b/img/MIS-Comparisons/basic.2016-10-11_20-39-59z.10samp.png differ
diff --git a/img/MIS-Comparisons/basic.2016-10-11_20-50-11z.100samp.png b/img/MIS-Comparisons/basic.2016-10-11_20-50-11z.100samp.png
new file mode 100644
index 0000000..2d708fa
Binary files /dev/null and b/img/MIS-Comparisons/basic.2016-10-11_20-50-11z.100samp.png differ
diff --git a/img/MIS-Comparisons/basic.2016-10-11_21-26-35z.500samp.png b/img/MIS-Comparisons/basic.2016-10-11_21-26-35z.500samp.png
new file mode 100644
index 0000000..c951a42
Binary files /dev/null and b/img/MIS-Comparisons/basic.2016-10-11_21-26-35z.500samp.png differ
diff --git a/img/MIS-Comparisons/basic.2016-10-11_21-52-13z.1500samp.png b/img/MIS-Comparisons/basic.2016-10-11_21-52-13z.1500samp.png
new file mode 100644
index 0000000..7a45e42
Binary files /dev/null and b/img/MIS-Comparisons/basic.2016-10-11_21-52-13z.1500samp.png differ
diff --git a/img/MIS-Comparisons/no_MIS.2016-10-11_20-27-52z.100samp.png b/img/MIS-Comparisons/no_MIS.2016-10-11_20-27-52z.100samp.png
new file mode 100644
index 0000000..3bb17c7
Binary files /dev/null and b/img/MIS-Comparisons/no_MIS.2016-10-11_20-27-52z.100samp.png differ
diff --git a/img/MIS-Comparisons/no_MIS.2016-10-11_20-28-41z.500samp.png b/img/MIS-Comparisons/no_MIS.2016-10-11_20-28-41z.500samp.png
new file mode 100644
index 0000000..d141258
Binary files /dev/null and b/img/MIS-Comparisons/no_MIS.2016-10-11_20-28-41z.500samp.png differ
diff --git a/img/MIS-Comparisons/no_MIS.2016-10-11_21-24-32z.10samp.png b/img/MIS-Comparisons/no_MIS.2016-10-11_21-24-32z.10samp.png
new file mode 100644
index 0000000..0e7a3cd
Binary files /dev/null and b/img/MIS-Comparisons/no_MIS.2016-10-11_21-24-32z.10samp.png differ
diff --git a/img/cornell.2016-10-11_05-34-19z.5000samp.png b/img/cornell.2016-10-11_05-34-19z.5000samp.png
new file mode 100644
index 0000000..5502a6d
Binary files /dev/null and b/img/cornell.2016-10-11_05-34-19z.5000samp.png differ
diff --git a/img/cornell.2016-10-11_06-15-00z.5000samp.png b/img/cornell.2016-10-11_06-15-00z.5000samp.png
new file mode 100644
index 0000000..df7e898
Binary files /dev/null and b/img/cornell.2016-10-11_06-15-00z.5000samp.png differ
diff --git a/img/depth.2016-10-11_22-51-27z.5000samp.png b/img/depth.2016-10-11_22-51-27z.5000samp.png
new file mode 100644
index 0000000..07ec50a
Binary files /dev/null and b/img/depth.2016-10-11_22-51-27z.5000samp.png differ
diff --git a/img/equations/Cond.PNG b/img/equations/Cond.PNG
new file mode 100644
index 0000000..9e2f938
Binary files /dev/null and b/img/equations/Cond.PNG differ
diff --git a/img/equations/Diel.PNG b/img/equations/Diel.PNG
new file mode 100644
index 0000000..83a412e
Binary files /dev/null and b/img/equations/Diel.PNG differ
diff --git a/img/noaa.2016-10-11_23-58-49z.500samp.png b/img/noaa.2016-10-11_23-58-49z.500samp.png
new file mode 100644
index 0000000..6cd9c85
Binary files /dev/null and b/img/noaa.2016-10-11_23-58-49z.500samp.png differ
diff --git a/img/plots/BasicRenderer-DL-Time.png b/img/plots/BasicRenderer-DL-Time.png
new file mode 100644
index 0000000..e2b46b2
Binary files /dev/null and b/img/plots/BasicRenderer-DL-Time.png differ
diff --git a/img/plots/FirstBounceCache.png b/img/plots/FirstBounceCache.png
new file mode 100644
index 0000000..5f5ab81
Binary files /dev/null and b/img/plots/FirstBounceCache.png differ
diff --git a/img/plots/MalleysApprox.png b/img/plots/MalleysApprox.png
new file mode 100644
index 0000000..82d026c
Binary files /dev/null and b/img/plots/MalleysApprox.png differ
diff --git a/img/plots/SCRayCount.png b/img/plots/SCRayCount.png
new file mode 100644
index 0000000..468bf69
Binary files /dev/null and b/img/plots/SCRayCount.png differ
diff --git a/img/plots/SCRuntime.png b/img/plots/SCRuntime.png
new file mode 100644
index 0000000..a39ebba
Binary files /dev/null and b/img/plots/SCRuntime.png differ
diff --git a/img/plots/WithStreamCompaction.png b/img/plots/WithStreamCompaction.png
new file mode 100644
index 0000000..6d9be15
Binary files /dev/null and b/img/plots/WithStreamCompaction.png differ
diff --git a/img/profile/MISProfile.PNG b/img/profile/MISProfile.PNG
new file mode 100644
index 0000000..f42454b
Binary files /dev/null and b/img/profile/MISProfile.PNG differ
diff --git a/img/profile/noMISProfile.PNG b/img/profile/noMISProfile.PNG
new file mode 100644
index 0000000..2a634bd
Binary files /dev/null and b/img/profile/noMISProfile.PNG differ
diff --git a/scenes/cornell.txt b/scenes/cornell.txt
index 83ff820..ad21e46 100644
--- a/scenes/cornell.txt
+++ b/scenes/cornell.txt
@@ -12,7 +12,7 @@ EMITTANCE   5
 MATERIAL 1
 RGB         .98 .98 .98
 SPECEX      0
-SPECRGB     0 0 0
+SPECRGB     1 1 1
 REFL        0
 REFR        0
 REFRIOR     0
@@ -43,30 +43,43 @@ MATERIAL 4
 RGB         .98 .98 .98
 SPECEX      0
 SPECRGB     .98 .98 .98
-REFL        1
+REFL        0
+REFR        1
+REFRIOR     1.5
+EMITTANCE   0
+ABSORPTION  3
+
+// Specular white
+MATERIAL 5
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     0 1 1
+REFL        0
 REFR        0
-REFRIOR     0
+REFRIOR     1.25
 EMITTANCE   0
+ABSORPTION  4
 
 // Camera
 CAMERA
 RES         800 800
 FOVY        45
-ITERATIONS  5000
+ITERATIONS  2
 DEPTH       8
-FILE        cornell
+FILE        naa
 EYE         0.0 5 10.5
 LOOKAT      0 5 0
 UP          0 1 0
-
+LENSRADIUS  0.5
+FOCALDIST   15
 
 // Ceiling light
 OBJECT 0
-cube
+sphere
 material 0
 TRANS       0 10 0
 ROTAT       0 0 0
-SCALE       3 .3 3
+SCALE       3 1 3
 
 // Floor
 OBJECT 1
@@ -112,6 +125,22 @@ SCALE       .01 10 10
 OBJECT 6
 sphere
 material 4
-TRANS       -1 4 -1
-ROTAT       0 0 0
+TRANS       -1 5 -1
+ROTAT       0 45 0
+SCALE       3 3 3
+
+// Sphere
+OBJECT 7
+sphere
+material 4
+TRANS       3 3 3
+ROTAT       0 45 0
 SCALE       3 3 3
+
+// Sphere
+OBJECT 8
+cube
+material 5
+TRANS       -2 2 3
+ROTAT       0 45 0
+SCALE       2 2 2
diff --git a/src/interactions.h b/src/interactions.h
index 5ce3628..b06ba30 100644
--- a/src/interactions.h
+++ b/src/interactions.h
@@ -1,6 +1,11 @@
 #pragma once
 
 #include "intersections.h"
+#include <math.h>
+
+#define COSINESAMPLE 1
+#define MIS 0
+#define FRESNEL 1
 
 // CHECKITOUT
 /**
@@ -9,36 +14,248 @@
  */
 __host__ __device__
 glm::vec3 calculateRandomDirectionInHemisphere(
-        glm::vec3 normal, thrust::default_random_engine &rng) {
-    thrust::uniform_real_distribution<float> u01(0, 1);
-
-    float up = sqrt(u01(rng)); // cos(theta)
-    float over = sqrt(1 - up * up); // sin(theta)
-    float around = u01(rng) * TWO_PI;
-
-    // Find a direction that is not the normal based off of whether or not the
-    // normal's components are all equal to sqrt(1/3) or whether or not at
-    // least one component is less than sqrt(1/3). Learned this trick from
-    // Peter Kutz.
-
-    glm::vec3 directionNotNormal;
-    if (abs(normal.x) < SQRT_OF_ONE_THIRD) {
-        directionNotNormal = glm::vec3(1, 0, 0);
-    } else if (abs(normal.y) < SQRT_OF_ONE_THIRD) {
-        directionNotNormal = glm::vec3(0, 1, 0);
-    } else {
-        directionNotNormal = glm::vec3(0, 0, 1);
-    }
-
-    // Use not-normal direction to generate two perpendicular directions
-    glm::vec3 perpendicularDirection1 =
-        glm::normalize(glm::cross(normal, directionNotNormal));
-    glm::vec3 perpendicularDirection2 =
-        glm::normalize(glm::cross(normal, perpendicularDirection1));
-
-    return up * normal
-        + cos(around) * over * perpendicularDirection1
-        + sin(around) * over * perpendicularDirection2;
+glm::vec3 normal, thrust::default_random_engine &rng) {
+	thrust::uniform_real_distribution<float> u01(0, 1);
+
+	float up = sqrt(u01(rng)); // cos(theta)
+	float over = sqrt(1 - up * up); // sin(theta)
+	float around = u01(rng) * TWO_PI;
+
+	// Find a direction that is not the normal based off of whether or not the
+	// normal's components are all equal to sqrt(1/3) or whether or not at
+	// least one component is less than sqrt(1/3). Learned this trick from
+	// Peter Kutz.
+
+	glm::vec3 directionNotNormal;
+	if (abs(normal.x) < SQRT_OF_ONE_THIRD) {
+		directionNotNormal = glm::vec3(1, 0, 0);
+	}
+	else if (abs(normal.y) < SQRT_OF_ONE_THIRD) {
+		directionNotNormal = glm::vec3(0, 1, 0);
+	}
+	else {
+		directionNotNormal = glm::vec3(0, 0, 1);
+	}
+
+	// Use not-normal direction to generate two perpendicular directions
+	glm::vec3 perpendicularDirection1 =
+		glm::normalize(glm::cross(normal, directionNotNormal));
+	glm::vec3 perpendicularDirection2 =
+		glm::normalize(glm::cross(normal, perpendicularDirection1));
+
+	return up * normal
+		+ cos(around) * over * perpendicularDirection1
+		+ sin(around) * over * perpendicularDirection2;
+}
+
+__host__ __device__
+void concentricSampleDisc(float &u, float v){
+	float r, theta;
+
+	float sx = 2 * u - 1;
+	float sy = 2 * v - 1;
+
+	if (sx == 0.0 && sy == 0.0)
+	{
+		return;
+	}
+	if (sx >= -sy)
+	{
+		if (sx > sy)
+		{
+			// Handle first region of disk
+			r = sx;
+			if (sy > 0.0) theta = sy / r;
+			else          theta = 8.0f + sy / r;
+		}
+		else
+		{
+			// Handle second region of disk
+			r = sy;
+			theta = 2.0f - sx / r;
+		}
+	}
+	else
+	{
+		if (sx <= sy)
+		{
+			// Handle third region of disk
+			r = -sx;
+			theta = 4.0f - sy / r;
+		}
+		else
+		{
+			// Handle fourth region of disk
+			r = -sy;
+			theta = 6.0f + sx / r;
+		}
+	}
+	theta *= PI / 4.f;
+	u = r * glm::cos(theta);
+	v = r * glm::sin(theta);
+}
+
+__host__ __device__
+glm::vec3 cosineSemiHemisphere(glm::vec3 normal, thrust::default_random_engine &rng) {
+	thrust::uniform_real_distribution<float> u01(0, 1);
+	float u = u01(rng);
+	float v = u01(rng);
+
+	concentricSampleDisc(u, v);
+	float z = glm::sqrt(glm::max(0.f, 1.f - u*u - v*v));
+
+	// Find a direction that is not the normal based off of whether or not the
+	// normal's components are all equal to sqrt(1/3) or whether or not at
+	// least one component is less than sqrt(1/3). Learned this trick from
+	// Peter Kutz.
+
+	glm::vec3 directionNotNormal;
+	if (abs(normal.x) < SQRT_OF_ONE_THIRD) {
+		directionNotNormal = glm::vec3(1, 0, 0);
+	}
+	else if (abs(normal.y) < SQRT_OF_ONE_THIRD) {
+		directionNotNormal = glm::vec3(0, 1, 0);
+	}
+	else {
+		directionNotNormal = glm::vec3(0, 0, 1);
+	}
+
+	// Use not-normal direction to generate two perpendicular directions
+	glm::vec3 perpendicularDirection1 =
+		glm::normalize(glm::cross(normal, directionNotNormal));
+	glm::vec3 perpendicularDirection2 =
+		glm::normalize(glm::cross(normal, perpendicularDirection1));
+
+	return u * perpendicularDirection1 + v * perpendicularDirection2 + z * normal;
+}
+
+__host__ __device__
+glm::vec3 lambertBxdf(glm::vec3 in, glm::vec3 out, const Material &m){
+	return m.color / PI;
+}
+
+__host__ __device__
+float lambertPDF(glm::vec3 in, glm::vec3 out, glm::vec3 normal){
+	return glm::dot(glm::normalize(normal), in) / PI;
+}
+
+__host__ __device__
+glm::vec3 perfectSpecularBxdf(glm::vec3 in, glm::vec3 normal, const Material &m){
+	return m.specular.color / glm::abs(glm::dot(in, normal));
+}
+
+__host__ __device__
+float perfectSpecularPDF(glm::vec3 in, glm::vec3 out, glm::vec3 normal){
+	return 0.f;
+}
+
+__host__ __device__
+glm::vec3 fresnelDielectricBxdf(glm::vec3 in, glm::vec3 out, glm::vec3 normal, const Material &m) {
+	float etat, etai;
+	glm::dot(in, normal) > 0.f ? etat = 1.f, etai = m.indexOfRefraction : etai = 1.f, etat = m.indexOfRefraction;
+
+	float cost = glm::abs(glm::dot(in, normal));
+	float cosi = glm::abs(glm::dot(out, normal));
+
+	float Rparl = ((etat * cosi) - (etai * cost)) /
+		((etat * cosi) + (etai * cost));
+	float Rperp = ((etai * cosi) - (etat * cost)) /
+		((etai * cosi) + (etat * cost));
+
+	float fresnel = (Rparl*Rparl + Rperp*Rperp) / 2.f;
+	return (etat * etat) / (etai * etai) * (1 - fresnel) * m.color / glm::abs(glm::dot(in, normal));
+}
+
+__host__ __device__
+glm::vec3 fresnelConductorBxdf(glm::vec3 in, glm::vec3 normal, const Material &m){
+	float cosi = glm::abs(glm::dot(in, normal));
+	float tmp = (m.indexOfRefraction * m.indexOfRefraction + m.absorption * m.absorption) * cosi * cosi;
+	float Rparl2 = (tmp - (2.f * m.indexOfRefraction * cosi) + 1) / (tmp + (2.f * m.indexOfRefraction * cosi) + 1);
+
+	tmp = m.indexOfRefraction * m.indexOfRefraction + m.absorption * m.absorption;
+	float Rperp2 = (tmp - (2.f * m.indexOfRefraction * cosi) + cosi*cosi) / (tmp + (2.f * m.indexOfRefraction * cosi) + cosi*cosi);
+	return (Rparl2 + Rperp2) / 2.f * m.color / glm::abs(glm::dot(in, normal));
+}
+
+__host__ __device__
+void sampleBxdf(
+glm::vec3 &bxdf
+, float &pdf
+, glm::vec3 in
+, glm::vec3 out
+, glm::vec3 normal
+, const Material &m
+, thrust::default_random_engine &rng
+, int bxdfContrib) {
+	thrust::uniform_real_distribution<float> u01(0, 1);
+	float matType = m.hasReflective + m.hasRefractive;
+	if (matType < 0.01f) matType = 1.f;
+
+	float r = u01(rng);
+	if (r < m.hasReflective / matType) {
+#if FRESNEL
+		bxdf = bxdfContrib > 0 ? fresnelConductorBxdf(in, normal, m) : glm::vec3(0.f);
+#else
+		bxdf = bxdfContrib > 0 ? perfectSpecularBxdf(in, normal, m) : glm::vec3(0.f);
+#endif
+		pdf = bxdfContrib > 0 ? 1.f : perfectSpecularPDF(in, out, normal);
+	}
+	else if (r < (m.hasRefractive + m.hasReflective) / matType) {
+#if FRESNEL
+		bxdf = bxdfContrib > 0 ? fresnelDielectricBxdf(in, out, normal, m) : glm::vec3(0.f);
+#else
+		bxdf = bxdfContrib > 0 ? perfectSpecularBxdf(in, normal, m) : glm::vec3(0.f);
+#endif
+		pdf = bxdfContrib > 0 ? 1.f : 0.f;
+	}
+	else {
+		bxdf = lambertBxdf(in, out, m);
+		pdf = lambertPDF(in, out, normal);
+	}
+
+}
+
+__host__ __device__
+float lightPDF(glm::vec3 in, glm::vec3 normal, glm::vec3 intersect, glm::vec3 origin, float lightArea) {
+	float cosTheta = glm::dot(-in, glm::normalize(normal));
+	return glm::length(intersect - origin) / (cosTheta * lightArea);
+}
+
+__host__ __device__
+glm::vec3 lightEnergy(glm::vec3 in, glm::vec3 normal, const Material & lm) {
+	return glm::dot(-in, normal) > 0 ? lm.color * lm.emittance : glm::vec3(0.f);
+}
+
+__host__ __device__
+float powerHeuristic(float pdf1, float pdf2){
+	return (pdf1 * pdf2) / (pdf1 * pdf1 + pdf2 * pdf2);
+}
+
+__host__ __device__
+void sampleBrdf(
+PathSegment & pathSeg,
+glm::vec3 intersect,
+glm::vec3 normal,
+const Material &m,
+thrust::default_random_engine &rng
+){
+	if (m.hasReflective > 0.f) {
+		pathSeg.ray.direction = glm::reflect(pathSeg.ray.direction, normal);
+	}
+	else if (m.hasRefractive > 0.f) {
+		float IOR = pathSeg.inside ? m.indexOfRefraction : 1.f / m.indexOfRefraction;
+		pathSeg.inside = !pathSeg.inside;
+		pathSeg.ray.direction = glm::refract(pathSeg.ray.direction, normal, IOR);
+	}
+	else{
+#if COSINESAMPLE
+		pathSeg.ray.direction = glm::normalize(cosineSemiHemisphere(normal, rng));
+#else
+		pathSeg.ray.direction = glm::normalize(calculateRandomDirectionInHemisphere(normal, rng));
+#endif
+	}
+
+	pathSeg.ray.origin = intersect + (0.01f * pathSeg.ray.direction);
 }
 
 /**
@@ -47,11 +264,11 @@ glm::vec3 calculateRandomDirectionInHemisphere(
  * A perfect specular surface scatters in the reflected ray direction.
  * In order to apply multiple effects to one surface, probabilistically choose
  * between them.
- * 
+ *
  * The visual effect you want is to straight-up add the diffuse and specular
  * components. You can do this in a few ways. This logic also applies to
  * combining other types of materias (such as refractive).
- * 
+ *
  * - Always take an even (50/50) split between a each effect (a diffuse bounce
  *   and a specular bounce), but divide the resulting color of either branch
  *   by its probability (0.5), to counteract the chance (0.5) of the branch
@@ -68,12 +285,189 @@ glm::vec3 calculateRandomDirectionInHemisphere(
  */
 __host__ __device__
 void scatterRay(
-		PathSegment & pathSegment,
-        glm::vec3 intersect,
-        glm::vec3 normal,
-        const Material &m,
-        thrust::default_random_engine &rng) {
-    // TODO: implement this.
-    // A basic implementation of pure-diffuse shading will just call the
-    // calculateRandomDirectionInHemisphere defined above.
+PathSegment & pathSegment,
+glm::vec3 intersect,
+glm::vec3 normal,
+const Material &m,
+thrust::default_random_engine &rng) {
+	// TODO: implement this.
+	// A basic implementation of pure-diffuse shading will just call the
+	// calculateRandomDirectionInHemisphere defined above.
+	if (m.hasReflective > 0.f) {
+		pathSegment.ray.direction = glm::reflect(pathSegment.ray.direction, normal);
+	}
+	else if (m.hasRefractive > 0.f) {
+		float IOR = pathSegment.inside ? m.indexOfRefraction : 1.f / m.indexOfRefraction;
+		pathSegment.inside = !pathSegment.inside;
+		pathSegment.ray.direction = glm::refract(pathSegment.ray.direction, normal, IOR);
+	}
+	else{
+		pathSegment.ray.direction = glm::normalize(calculateRandomDirectionInHemisphere(normal, rng));
+	}
+	pathSegment.color *= m.color;
+	pathSegment.ray.origin = intersect + (0.01f * pathSegment.ray.direction);
+	pathSegment.remainingBounces--;
+}
+
+__host__ __device__
+void scatterWithDirectLighting(
+PathSegment & pathSegment,
+PathSegment & shadowFeeler,
+PathSegment & brdfSample,
+ShadeableIntersection & objIntersect,
+ShadeableIntersection & lightIntersect,
+ShadeableIntersection & brdfIntersect,
+const Material &m,
+const Material &lm,
+const Material &bm,
+int num_lights,
+thrust::default_random_engine &rng){
+
+	// Temp color
+	glm::vec3 col;
+
+	// Direct lighting part
+	glm::vec3 wi = glm::normalize(shadowFeeler.ray.direction);
+	glm::vec3 lightContribution;
+	glm::vec3 brdfContribution;
+	float lightPdf = -0.001f;
+	float brdfPdf = -0.001f;
+
+#if MIS
+	// We actually hit a light with our shadow feeler! Here we calculate the lighting sample.
+	if (lightIntersect.t > 0.f && lm.emittance > 0.f) {
+
+		lightContribution = lightEnergy(wi, lightIntersect.surfaceNormal, lm);
+		lightPdf = lightPDF(wi, lightIntersect.surfaceNormal, lightIntersect.intersect,
+			shadowFeeler.ray.origin, lightIntersect.surfaceArea);
+
+		sampleBxdf(brdfContribution, brdfPdf, wi, pathSegment.ray.direction
+			, objIntersect.surfaceNormal, m, rng, 0.f);
+
+		if (lightPdf > 0.f && glm::length2(lightContribution) > 0.f && brdfPdf > 0.f && glm::length2(brdfContribution) > 0.f) {
+			float dot_pdf = glm::abs(glm::dot(wi, glm::normalize(objIntersect.surfaceNormal))) / lightPdf;
+			float w = powerHeuristic(lightPdf, brdfPdf);
+			col += w * brdfContribution * lightContribution * dot_pdf * (float)num_lights;
+		}
+	}
+
+	// Reset variables
+	wi = glm::vec3(0.f);
+	brdfContribution = glm::vec3(0.f);
+	lightContribution = glm::vec3(0.f);
+	lightPdf = 0.f;
+	brdfPdf = 0.f;
+#endif
+
+	// Compute brdf contribution
+	// We need to do this to find out what the throughput should be decreased by
+	wi = glm::normalize(brdfSample.ray.direction);
+	sampleBxdf(brdfContribution, brdfPdf, wi, pathSegment.ray.direction
+		, objIntersect.surfaceNormal, m, rng, 1.f);
+
+	if (brdfIntersect.t > 0.f && bm.emittance > 0.f) {
+		// Brdf sample hit a light
+		lightContribution = lightEnergy(wi, brdfIntersect.surfaceNormal, bm);
+		lightPdf = lightPDF(wi, brdfIntersect.surfaceNormal, brdfIntersect.intersect, brdfSample.ray.origin, brdfIntersect.surfaceArea);
+
+		if (brdfPdf > 0.f && glm::length2(brdfContribution) > 0.f && lightPdf > 0.f && glm::length2(lightContribution) > 0.f) {
+			float dot_pdf = glm::max(0.f, glm::abs(glm::dot(wi, glm::normalize(objIntersect.surfaceNormal)))) / brdfPdf;
+			float w = powerHeuristic(brdfPdf, lightPdf);
+#if !MIS
+			w = 1.f;
+#endif
+			col += w * brdfContribution * lightContribution * dot_pdf;
+		}
+	}
+
+	// Add new colors
+	pathSegment.color += pathSegment.throughput * col;
+
+	// Early exit
+	if (glm::length2(brdfContribution) <= 0.f || brdfPdf <= 0.f) {
+		pathSegment.remainingBounces = 0;
+	}
+
+	// Russian Roulette!
+	if (pathSegment.remainingBounces == 0) {
+		thrust::uniform_real_distribution<float> u01(0, 1);
+		float maxTP = glm::max(pathSegment.throughput[0], glm::max(pathSegment.throughput[1], pathSegment.throughput[2]));
+		if (u01(rng) > glm::min(0.5f, maxTP))
+			pathSegment.remainingBounces = 0;
+	}
+
+	// Update throughput
+	brdfContribution *= glm::abs(glm::dot(wi, objIntersect.surfaceNormal)) / (brdfPdf * 2.f);
+	pathSegment.throughput *= brdfContribution;
+
+	// Set up new path segments
+	sampleBrdf(pathSegment,
+		objIntersect.intersect,
+		objIntersect.surfaceNormal,
+		m,
+		rng);
+	pathSegment.remainingBounces--;
 }
+
+__host__ __device__
+glm::vec3 sampleCube(thrust::default_random_engine &rng) {
+	thrust::uniform_real_distribution<float> u01(0, 1);
+	int face = (int)(5.95f * u01(rng));
+
+	float p1 = u01(rng) - 0.5f;
+	float p2 = u01(rng) - 0.5f;
+	glm::vec3 P(0.f);
+
+	// 0->1 is + and 1->2 is - on f1, 0->1 is x, 1->2 is y, 2->3 is z on f2
+	if (face == 0){
+		// +X
+		P[0] = 0.5f;
+		P[1] = p1;
+		P[2] = p2;
+	}
+	else if (face == 1){
+		// -X
+		P[0] = -0.5f;
+		P[1] = p1;
+		P[2] = p2;
+	}
+	else if (face == 2){
+		// +Y
+		P[0] = p1;
+		P[1] = 0.5f;
+		P[2] = p2;
+	}
+	else if (face == 3){
+		// -Y
+		P[0] = p1;
+		P[1] = -0.5f;
+		P[2] = p2;
+	}
+	else if (face == 4){
+		// +Z
+		P[0] = p1;
+		P[1] = p2;
+		P[2] = 0.5f;
+	}
+	else {
+		// -Z
+		P[0] = p1;
+		P[1] = p2;
+		P[2] = -0.5f;
+	}
+
+	return P;
+}
+
+__host__ __device__
+glm::vec3 sampleSphere(thrust::default_random_engine &rng) {
+	thrust::uniform_real_distribution<float> u01(0, 1);
+
+	float z = 1.f - 2.f * u01(rng);
+	float r = glm::sqrt(glm::max(0.f, 1.f - z*z));
+	float phi = 2.f * PI * u01(rng);
+	float x = r * glm::cos(phi);
+	float y = r * glm::sin(phi);
+
+	return glm::vec3(x / 2, y / 2, z / 2);
+}
\ No newline at end of file
diff --git a/src/pathtrace.cu b/src/pathtrace.cu
index c1ec122..fb1f68f 100644
--- a/src/pathtrace.cu
+++ b/src/pathtrace.cu
@@ -4,6 +4,8 @@
 #include <thrust/execution_policy.h>
 #include <thrust/random.h>
 #include <thrust/remove.h>
+#include <thrust/device_vector.h>
+#include <thrust/sort.h>
 
 #include "sceneStructs.h"
 #include "scene.h"
@@ -15,56 +17,64 @@
 #include "interactions.h"
 
 #define ERRORCHECK 1
+#define DIRECTLIGHTING 1
+#define CACHEFIRSTRAY 1
+#define ANTIALIAS 1
+#define STREAMCOMPACT 1
+#define MATERIALSORT 0 // Doesn't work for some reason, breaks on sort_by_key
+#define TIME 0
+#define LENSJITTER 0
+#define BLOCKSIZE 128
 
 #define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
 #define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)
 void checkCUDAErrorFn(const char *msg, const char *file, int line) {
 #if ERRORCHECK
-    cudaDeviceSynchronize();
-    cudaError_t err = cudaGetLastError();
-    if (cudaSuccess == err) {
-        return;
-    }
-
-    fprintf(stderr, "CUDA error");
-    if (file) {
-        fprintf(stderr, " (%s:%d)", file, line);
-    }
-    fprintf(stderr, ": %s: %s\n", msg, cudaGetErrorString(err));
+	cudaDeviceSynchronize();
+	cudaError_t err = cudaGetLastError();
+	if (cudaSuccess == err) {
+		return;
+	}
+
+	fprintf(stderr, "CUDA error");
+	if (file) {
+		fprintf(stderr, " (%s:%d)", file, line);
+	}
+	fprintf(stderr, ": %s: %s\n", msg, cudaGetErrorString(err));
 #  ifdef _WIN32
-    getchar();
+	getchar();
 #  endif
-    exit(EXIT_FAILURE);
+	exit(EXIT_FAILURE);
 #endif
 }
 
 __host__ __device__
 thrust::default_random_engine makeSeededRandomEngine(int iter, int index, int depth) {
-    int h = utilhash((1 << 31) | (depth << 22) | iter) ^ utilhash(index);
-    return thrust::default_random_engine(h);
+	int h = utilhash((1 << 31) | (depth << 22) | iter) ^ utilhash(index);
+	return thrust::default_random_engine(h);
 }
 
 //Kernel that writes the image to the OpenGL PBO directly.
 __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution,
-        int iter, glm::vec3* image) {
-    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
-    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
-
-    if (x < resolution.x && y < resolution.y) {
-        int index = x + (y * resolution.x);
-        glm::vec3 pix = image[index];
-
-        glm::ivec3 color;
-        color.x = glm::clamp((int) (pix.x / iter * 255.0), 0, 255);
-        color.y = glm::clamp((int) (pix.y / iter * 255.0), 0, 255);
-        color.z = glm::clamp((int) (pix.z / iter * 255.0), 0, 255);
-
-        // Each thread writes one pixel location in the texture (textel)
-        pbo[index].w = 0;
-        pbo[index].x = color.x;
-        pbo[index].y = color.y;
-        pbo[index].z = color.z;
-    }
+	int iter, glm::vec3* image) {
+	int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+	int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+
+	if (x < resolution.x && y < resolution.y) {
+		int index = x + (y * resolution.x);
+		glm::vec3 pix = image[index];
+
+		glm::ivec3 color;
+		color.x = glm::clamp((int)(pix.x / iter * 255.0), 0, 255);
+		color.y = glm::clamp((int)(pix.y / iter * 255.0), 0, 255);
+		color.z = glm::clamp((int)(pix.z / iter * 255.0), 0, 255);
+
+		// Each thread writes one pixel location in the texture (textel)
+		pbo[index].w = 0;
+		pbo[index].x = color.x;
+		pbo[index].y = color.y;
+		pbo[index].z = color.z;
+	}
 }
 
 static Scene * hst_scene = NULL;
@@ -73,42 +83,68 @@ static Geom * dev_geoms = NULL;
 static Material * dev_materials = NULL;
 static PathSegment * dev_paths = NULL;
 static ShadeableIntersection * dev_intersections = NULL;
-// TODO: static variables for device memory, any extra info you need, etc
-// ...
+
+// For caching first bounce
+static PathSegment * dev_first_cache = NULL;
+static bool firstTime = true;
+
+// For material sorting
+static int * dev_materialIds = NULL;
+
+// Direct lighting variable
+static int numLights = 0;
 
 void pathtraceInit(Scene *scene) {
-    hst_scene = scene;
-    const Camera &cam = hst_scene->state.camera;
-    const int pixelcount = cam.resolution.x * cam.resolution.y;
+	hst_scene = scene;
+	const Camera &cam = hst_scene->state.camera;
+	const int pixelcount = cam.resolution.x * cam.resolution.y;
 
-    cudaMalloc(&dev_image, pixelcount * sizeof(glm::vec3));
-    cudaMemset(dev_image, 0, pixelcount * sizeof(glm::vec3));
+	cudaMalloc(&dev_image, pixelcount * sizeof(glm::vec3));
+	cudaMemset(dev_image, 0, pixelcount * sizeof(glm::vec3));
 
-  	cudaMalloc(&dev_paths, pixelcount * sizeof(PathSegment));
+	cudaMalloc(&dev_paths, pixelcount * sizeof(PathSegment));
+	cudaMalloc(&dev_first_cache, pixelcount * sizeof(PathSegment));
+	//cudaMalloc(&dev_shadowFeelers, pixelcount * sizeof(PathSegment));
 
-  	cudaMalloc(&dev_geoms, scene->geoms.size() * sizeof(Geom));
-  	cudaMemcpy(dev_geoms, scene->geoms.data(), scene->geoms.size() * sizeof(Geom), cudaMemcpyHostToDevice);
+	cudaMalloc(&dev_geoms, scene->geoms.size() * sizeof(Geom));
+	cudaMemcpy(dev_geoms, scene->geoms.data(), scene->geoms.size() * sizeof(Geom), cudaMemcpyHostToDevice);
 
-  	cudaMalloc(&dev_materials, scene->materials.size() * sizeof(Material));
-  	cudaMemcpy(dev_materials, scene->materials.data(), scene->materials.size() * sizeof(Material), cudaMemcpyHostToDevice);
+	cudaMalloc(&dev_materials, scene->materials.size() * sizeof(Material));
+	cudaMemcpy(dev_materials, scene->materials.data(), scene->materials.size() * sizeof(Material), cudaMemcpyHostToDevice);
 
-  	cudaMalloc(&dev_intersections, pixelcount * sizeof(ShadeableIntersection));
-  	cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
+	cudaMalloc(&dev_intersections, pixelcount * sizeof(ShadeableIntersection));
+	cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
 
-    // TODO: initialize any extra device memeory you need
+	// TODO: initialize any extra device memeory you need
+	cudaMalloc(&dev_materialIds, pixelcount * sizeof(int));
+	cudaMemset(dev_materialIds, 0, pixelcount * sizeof(int));
 
-    checkCUDAError("pathtraceInit");
+	checkCUDAError("pathtraceInit");
 }
 
 void pathtraceFree() {
-    cudaFree(dev_image);  // no-op if dev_image is null
-  	cudaFree(dev_paths);
-  	cudaFree(dev_geoms);
-  	cudaFree(dev_materials);
-  	cudaFree(dev_intersections);
-    // TODO: clean up any extra device memory you created
-
-    checkCUDAError("pathtraceFree");
+	cudaFree(dev_image);  // no-op if dev_image is null
+	cudaFree(dev_paths);
+	cudaFree(dev_geoms);
+	cudaFree(dev_materials);
+	cudaFree(dev_intersections);
+	// TODO: clean up any extra device memory you created
+	cudaFree(dev_first_cache);
+	cudaFree(dev_materialIds);
+	checkCUDAError("pathtraceFree");
+}
+
+__device__ __host__
+void lensJitter(PathSegment & path, Camera & cam, float u, float v){
+	concentricSampleDisc(u, v);
+
+	u *= cam.lensRadius;
+	v *= cam.lensRadius;
+
+	float ft = cam.focalDistance / glm::abs(path.ray.direction[2]);
+	glm::vec3 pFocus = path.ray.origin + ft * path.ray.direction;
+	path.ray.origin += glm::vec3(u, v, 0.f);
+	path.ray.direction = glm::normalize(pFocus - path.ray.origin);
 }
 
 /**
@@ -129,19 +165,96 @@ __global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, Path
 		PathSegment & segment = pathSegments[index];
 
 		segment.ray.origin = cam.position;
-    segment.color = glm::vec3(1.0f, 1.0f, 1.0f);
+#if DIRECTLIGHTING
+		segment.color = glm::vec3(0.0f, 0.0f, 0.0f);
+#else
+		segment.color = glm::vec3(1.0f, 1.0f, 1.0f);
+#endif
+		segment.throughput = glm::vec3(1.f);
+		segment.inside = false;
+	
+		thrust::default_random_engine rng = makeSeededRandomEngine(iter, index, 0);
+		thrust::uniform_real_distribution<float> u01(0, 1);
 
-		// TODO: implement antialiasing by jittering the ray
+#if ANTIALIAS
+		segment.ray.direction = glm::normalize(cam.view
+			- cam.right * cam.pixelLength.x * ((float)x +  u01(rng) - (float)cam.resolution.x * 0.5f)
+			- cam.up * cam.pixelLength.y * ((float)y + u01(rng) - (float)cam.resolution.y * 0.5f)
+			);
+#else
 		segment.ray.direction = glm::normalize(cam.view
 			- cam.right * cam.pixelLength.x * ((float)x - (float)cam.resolution.x * 0.5f)
 			- cam.up * cam.pixelLength.y * ((float)y - (float)cam.resolution.y * 0.5f)
 			);
+#endif
+
+#if LENSJITTER
+		// Apply lens jitter for depth of field
+		lensJitter(segment, cam, u01(rng), u01(rng));
+#endif
 
 		segment.pixelIndex = index;
 		segment.remainingBounces = traceDepth;
 	}
 }
 
+// Copied from body of computeIntersections to make it a bit more flexible
+__device__ __host__ void computeSingleIntersection(
+	PathSegment & pathSegment
+	, Geom * geoms
+	, int geoms_size
+	, ShadeableIntersection & intersection) {
+
+	float t;
+	glm::vec3 intersect_point;
+	glm::vec3 normal;
+	float t_min = FLT_MAX;
+	int hit_geom_index = -1;
+	bool outside = true;
+
+	glm::vec3 tmp_intersect;
+	glm::vec3 tmp_normal;
+
+	for (int i = 0; i < geoms_size; i++)
+	{
+		Geom & geom = geoms[i];
+
+		if (geom.type == CUBE)
+		{
+			t = boxIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside);
+		}
+		else if (geom.type == SPHERE)
+		{
+			t = sphereIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside);
+		}
+		// TODO: add more intersection tests here... triangle? metaball? CSG?
+
+		// Compute the minimum t from the intersection tests to determine what
+		// scene geometry object was hit first.
+		if (t > 0.0f && t_min > t)
+		{
+			t_min = t;
+			hit_geom_index = i;
+			intersect_point = tmp_intersect;
+			normal = tmp_normal;
+		}
+	}
+
+	if (hit_geom_index == -1)
+	{
+		intersection.t = -1.0f;
+	}
+	else
+	{
+		//The ray hits something
+		intersection.t = t_min;
+		intersection.materialId = geoms[hit_geom_index].materialid;
+		intersection.surfaceNormal = normal;
+		intersection.intersect = intersect_point;
+		intersection.surfaceArea = geoms[hit_geom_index].surfaceArea;
+	}
+}
+
 // TODO:
 // computeIntersections handles generating ray intersections ONLY.
 // Generating new rays is handled in your shader(s).
@@ -161,108 +274,140 @@ __global__ void computeIntersections(
 	{
 		PathSegment pathSegment = pathSegments[path_index];
 
-		float t;
-		glm::vec3 intersect_point;
-		glm::vec3 normal;
-		float t_min = FLT_MAX;
-		int hit_geom_index = -1;
-		bool outside = true;
-
-		glm::vec3 tmp_intersect;
-		glm::vec3 tmp_normal;
-
-		// naive parse through global geoms
+		computeSingleIntersection(pathSegment, geoms, geoms_size, intersections[path_index]);
+	}
+}
 
-		for (int i = 0; i < geoms_size; i++)
-		{
-			Geom & geom = geoms[i];
+// Use actual shader
+__global__ void shadeMaterial(
+	int iter
+	, int depth
+	, int num_paths
+	, int num_lights
+	, int num_geoms
+	, Geom * geoms
+	, ShadeableIntersection * shadeableIntersections
+	, PathSegment * pathSegments
+	, Material * materials
+	)
+{
+	int idx = blockIdx.x * blockDim.x + threadIdx.x;
+	if (idx < num_paths)
+	{
 
-			if (geom.type == CUBE)
-			{
-				t = boxIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside);
-			}
-			else if (geom.type == SPHERE)
-			{
-				t = sphereIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside);
-			}
-			// TODO: add more intersection tests here... triangle? metaball? CSG?
-
-			// Compute the minimum t from the intersection tests to determine what
-			// scene geometry object was hit first.
-			if (t > 0.0f && t_min > t)
-			{
-				t_min = t;
-				hit_geom_index = i;
-				intersect_point = tmp_intersect;
-				normal = tmp_normal;
+		ShadeableIntersection intersection = shadeableIntersections[idx];
+		if (pathSegments[idx].remainingBounces > 0 && intersection.t > 0.0f) { // if the intersection exists...
+			// Set up the RNG
+			// LOOK: this is how you use thrust's RNG! Please look at
+			// makeSeededRandomEngine as well.
+			thrust::default_random_engine rng = makeSeededRandomEngine(iter, idx, 0);
+			thrust::uniform_real_distribution<float> u01(0, 1);
+
+			Material material = materials[intersection.materialId];
+
+			// If the material indicates that the object was a light, "light" the ray
+			if (material.emittance > 0.f) {
+#if DIRECTLIGHTING
+				if (depth == 0) {
+					pathSegments[idx].color = pathSegments[idx].throughput * (material.color * material.emittance);
+				}
+#else
+				pathSegments[idx].color *= (material.color * material.emittance);
+#endif
+				pathSegments[idx].remainingBounces = 0;
 			}
-		}
+			// Otherwise, do some pseudo-lighting computation. This is actually more
+			// like what you would expect from shading in a rasterizer like OpenGL.
+			else {
+
+#if DIRECTLIGHTING	
+				thrust::uniform_real_distribution<float> u0L(0, num_lights);
+
+				// Pick a random light
+				int chosenLight = (int)u0L(rng);
+				Geom theLight = geoms[chosenLight];
+
+				// Do some independent operations. I think almost everything
+				// depends on this light, so just allocate some variables.
+				glm::vec4 lp = glm::vec4(0.f, 0.f, 0.f, 1.f);
+				PathSegment shadowFeeler;
+				ShadeableIntersection shadowIntersect;
+				ShadeableIntersection brdfIntersect;
+
+				// Load pathsegment here so we save some time
+				PathSegment brdfSample = pathSegments[idx];
+
+				// Generate a ray pointing towards that light
+				if (theLight.type == CUBE) {
+					lp = glm::vec4(sampleCube(rng), 1.f);
+				}
+				else if (theLight.type == SPHERE) {
+					lp = glm::vec4(sampleSphere(rng), 1.f);
+				}
+
+				glm::vec3 lightPos = glm::vec3(theLight.transform * lp);
+				shadowFeeler.ray.direction = glm::normalize(lightPos - intersection.intersect);
+				shadowFeeler.ray.origin = intersection.intersect + 0.01f * shadowFeeler.ray.direction;
+
+				// This is our light intersection
+				computeSingleIntersection(shadowFeeler, geoms, num_geoms, shadowIntersect);
+			
+				// Now we need to sample the brdf once and see if that hits the light
+				sampleBrdf(brdfSample,
+					intersection.intersect,
+					intersection.surfaceNormal,
+					material,
+					rng);
+				computeSingleIntersection(brdfSample, geoms, num_geoms, brdfIntersect);
+
+				// Do direct lighting
+				scatterWithDirectLighting(
+					pathSegments[idx],
+					shadowFeeler,
+					brdfSample,
+					intersection,
+					shadowIntersect,
+					brdfIntersect,
+					material,
+					materials[shadowIntersect.materialId],
+					materials[brdfIntersect.materialId],
+					num_lights,
+					rng);
+
+				//pathSegments[idx].color = glm::clamp(pathSegments[idx].color, 0.f, 1.f);
+				//pathSegments[idx].color = glm::vec3(shadowIntersect.t, 0.f, 0.f)/10.f;
+				//pathSegments[idx].remainingBounces = 0;
+
+#else
+				scatterRay(pathSegments[idx], 
+					intersection.intersect, 
+					intersection.surfaceNormal, 
+					material, 
+					rng);
+#endif
 
-		if (hit_geom_index == -1)
-		{
-			intersections[path_index].t = -1.0f;
+			}
+			// If there was no intersection, color the ray black.
+			// Lots of renderers use 4 channel color, RGBA, where A = alpha, often
+			// used for opacity, in which case they can indicate "no opacity".
+			// This can be useful for post-processing and image compositing.
 		}
-		else
-		{
-			//The ray hits something
-			intersections[path_index].t = t_min;
-			intersections[path_index].materialId = geoms[hit_geom_index].materialid;
-			intersections[path_index].surfaceNormal = normal;
+		else {
+			pathSegments[idx].color = glm::vec3(0.0f);
+			pathSegments[idx].remainingBounces = 0;
 		}
 	}
 }
 
-// LOOK: "fake" shader demonstrating what you might do with the info in
-// a ShadeableIntersection, as well as how to use thrust's random number
-// generator. Observe that since the thrust random number generator basically
-// adds "noise" to the iteration, the image should start off noisy and get
-// cleaner as more iterations are computed.
-//
-// Note that this shader does NOT do a BSDF evaluation!
-// Your shaders should handle that - this can allow techniques such as
-// bump mapping.
-__global__ void shadeFakeMaterial (
-  int iter
-  , int num_paths
-	, ShadeableIntersection * shadeableIntersections
-	, PathSegment * pathSegments
-	, Material * materials
-	)
-{
-  int idx = blockIdx.x * blockDim.x + threadIdx.x;
-  if (idx < num_paths)
-  {
-    ShadeableIntersection intersection = shadeableIntersections[idx];
-    if (intersection.t > 0.0f) { // if the intersection exists...
-      // Set up the RNG
-      // LOOK: this is how you use thrust's RNG! Please look at
-      // makeSeededRandomEngine as well.
-      thrust::default_random_engine rng = makeSeededRandomEngine(iter, idx, 0);
-      thrust::uniform_real_distribution<float> u01(0, 1);
-
-      Material material = materials[intersection.materialId];
-      glm::vec3 materialColor = material.color;
-
-      // If the material indicates that the object was a light, "light" the ray
-      if (material.emittance > 0.0f) {
-        pathSegments[idx].color *= (materialColor * material.emittance);
-      }
-      // Otherwise, do some pseudo-lighting computation. This is actually more
-      // like what you would expect from shading in a rasterizer like OpenGL.
-      // TODO: replace this! you should be able to start with basically a one-liner
-      else {
-        float lightTerm = glm::dot(intersection.surfaceNormal, glm::vec3(0.0f, 1.0f, 0.0f));
-        pathSegments[idx].color *= (materialColor * lightTerm) * 0.3f + ((1.0f - intersection.t * 0.02f) * materialColor) * 0.7f;
-        pathSegments[idx].color *= u01(rng); // apply some noise because why not
-      }
-    // If there was no intersection, color the ray black.
-    // Lots of renderers use 4 channel color, RGBA, where A = alpha, often
-    // used for opacity, in which case they can indicate "no opacity".
-    // This can be useful for post-processing and image compositing.
-    } else {
-      pathSegments[idx].color = glm::vec3(0.0f);
-    }
-  }
+// We go through the geoms and label the ones that are lights. This allows us to stream compact
+// to find the light sources when we want to do direct lighting.
+__global__ void findLights(int nGeom, Geom * geoms, Material * materials) {
+	int idx = blockIdx.x * blockDim.x + threadIdx.x;
+	if (idx < nGeom) {
+		if (materials[geoms[idx].materialid].emittance > 0.0001f) {
+			geoms[idx].isLight = 1;
+		}
+	}
 }
 
 // Add the current iteration's output to the overall image
@@ -277,117 +422,333 @@ __global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterati
 	}
 }
 
+__global__ void findMaterialIds(int nPaths, int * ids, ShadeableIntersection * isx) {
+	int index = (blockIdx.x * blockDim.x) + threadIdx.x;
+
+	if (index < nPaths)
+	{
+		ids[index] = isx[index].materialId;
+	}
+}
+
+// Used for stream compaction in thrust::partition
+struct deadPaths
+{
+	__host__ __device__
+		bool operator()(const PathSegment& pathSeg) {
+		return pathSeg.remainingBounces > 0;
+	}
+};
+
+// Use this to stream compact the geoms to find the lights. This allows us to do direct lighting.
+struct lights
+{
+	__host__ __device__
+		bool operator()(const Geom& geom) {
+		return geom.isLight > 0;
+	}
+};
+
 /**
  * Wrapper for the __global__ call that sets up the kernel calls and does a ton
  * of memory management
  */
 void pathtrace(uchar4 *pbo, int frame, int iter) {
-    const int traceDepth = hst_scene->state.traceDepth;
-    const Camera &cam = hst_scene->state.camera;
-    const int pixelcount = cam.resolution.x * cam.resolution.y;
+	const int traceDepth = hst_scene->state.traceDepth;
+	const Camera &cam = hst_scene->state.camera;
+	const int pixelcount = cam.resolution.x * cam.resolution.y;
 
 	// 2D block for generating ray from camera
-    const dim3 blockSize2d(8, 8);
-    const dim3 blocksPerGrid2d(
-            (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
-            (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
+	const dim3 blockSize2d(8, 8);
+	const dim3 blocksPerGrid2d(
+		(cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
+		(cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
 
 	// 1D block for path tracing
-	const int blockSize1d = 128;
-
-    ///////////////////////////////////////////////////////////////////////////
-
-    // Recap:
-    // * Initialize array of path rays (using rays that come out of the camera)
-    //   * You can pass the Camera object to that kernel.
-    //   * Each path ray must carry at minimum a (ray, color) pair,
-    //   * where color starts as the multiplicative identity, white = (1, 1, 1).
-    //   * This has already been done for you.
-    // * For each depth:
-    //   * Compute an intersection in the scene for each path ray.
-    //     A very naive version of this has been implemented for you, but feel
-    //     free to add more primitives and/or a better algorithm.
-    //     Currently, intersection distance is recorded as a parametric distance,
-    //     t, or a "distance along the ray." t = -1.0 indicates no intersection.
-    //     * Color is attenuated (multiplied) by reflections off of any object
-    //   * TODO: Stream compact away all of the terminated paths.
-    //     You may use either your implementation or `thrust::remove_if` or its
-    //     cousins.
-    //     * Note that you can't really use a 2D kernel launch any more - switch
-    //       to 1D.
-    //   * TODO: Shade the rays that intersected something or didn't bottom out.
-    //     That is, color the ray by performing a color computation according
-    //     to the shader, then generate a new ray to continue the ray path.
-    //     We recommend just updating the ray's PathSegment in place.
-    //     Note that this step may come before or after stream compaction,
-    //     since some shaders you write may also cause a path to terminate.
-    // * Finally, add this iteration's results to the image. This has been done
-    //   for you.
-
-    // TODO: perform one iteration of path tracing
-
-	generateRayFromCamera <<<blocksPerGrid2d, blockSize2d >>>(cam, iter, traceDepth, dev_paths);
+	const int blockSize1d = BLOCKSIZE;
+
+	///////////////////////////////////////////////////////////////////////////
+
+	// Recap:
+	// * Initialize array of path rays (using rays that come out of the camera)
+	//   * You can pass the Camera object to that kernel.
+	//   * Each path ray must carry at minimum a (ray, color) pair,
+	//   * where color starts as the multiplicative identity, white = (1, 1, 1).
+	//   * This has already been done for you.
+	// * For each depth:
+	//   * Compute an intersection in the scene for each path ray.
+	//     A very naive version of this has been implemented for you, but feel
+	//     free to add more primitives and/or a better algorithm.
+	//     Currently, intersection distance is recorded as a parametric distance,
+	//     t, or a "distance along the ray." t = -1.0 indicates no intersection.
+	//     * Color is attenuated (multiplied) by reflections off of any object
+	//   * TODO: Stream compact away all of the terminated paths.
+	//     You may use either your implementation or `thrust::remove_if` or its
+	//     cousins.
+	//     * Note that you can't really use a 2D kernel launch any more - switch
+	//       to 1D.
+	//   * TODO: Shade the rays that intersected something or didn't bottom out.
+	//     That is, color the ray by performing a color computation according
+	//     to the shader, then generate a new ray to continue the ray path.
+	//     We recommend just updating the ray's PathSegment in place.
+	//     Note that this step may come before or after stream compaction,
+	//     since some shaders you write may also cause a path to terminate.
+	// * Finally, add this iteration's results to the image. This has been done
+	//   for you.
+
+#if TIME
+	float total = 0.f;
+	float milliseconds = 0.f;
+	float intersect_time = 0.f;
+	float pathtract_time = 0.f; 
+	float stream_time = 0.f;
+	float material_time = 0.f;
+	cudaEvent_t start, end;
+	printf("Iteration: %d\n", iter);
+#endif
+
+#if TIME
+	cudaEventCreate(&start);
+	cudaEventCreate(&end);
+	cudaEventRecord(start);
+#endif
+	// Generate rays depending on macro settings
+#if CACHEFIRSTRAY && !ANTIALIAS
+	if (firstTime) {
+		generateRayFromCamera << <blocksPerGrid2d, blockSize2d >> >(cam, iter, traceDepth, dev_first_cache);
+		firstTime = false;
+	}
+	cudaMemcpy(dev_paths, dev_first_cache, pixelcount * sizeof(PathSegment), cudaMemcpyDeviceToDevice);
+#else
+	generateRayFromCamera << <blocksPerGrid2d, blockSize2d >> >(cam, iter, traceDepth, dev_paths);
+#endif
+	
+#if TIME
+	cudaEventRecord(end);
+	cudaEventSynchronize(end);
+	cudaEventElapsedTime(&milliseconds, start, end);
+	total += milliseconds;
+	printf("Generate rays: %4.4f \n", milliseconds);
+#endif
+
 	checkCUDAError("generate camera ray");
 
 	int depth = 0;
 	PathSegment* dev_path_end = dev_paths + pixelcount;
 	int num_paths = dev_path_end - dev_paths;
+#if TIME
+	printf("Starting number of paths: %d\n", num_paths);
+#endif
+
+	// --- Find lights for direct lighting ---
+	// 0. Realized this only needs to done once if you make numLights a global var!
+	// 1. Label the geoms using the findLights kernel.
+	// 2. Stream compact to put all the lights in front. Then we can track which ones are lights.
+#if DIRECTLIGHTING
+
+#if TIME
+	cudaEventCreate(&start);
+	cudaEventCreate(&end);
+	cudaEventRecord(start);
+#endif
+
+	if (iter == 1) {
+		dim3 numblocksFindLights = (hst_scene->geoms.size() + blockSize1d - 1) / blockSize1d;
+		findLights << <numblocksFindLights, blockSize1d >> >(hst_scene->geoms.size(), dev_geoms, dev_materials);
+		Geom* lightEnd = thrust::partition(
+			thrust::device,
+			dev_geoms,
+			dev_geoms + hst_scene->geoms.size(),
+			lights());
+		numLights = lightEnd - dev_geoms;
+	}
+	checkCUDAError("Counting number of lights for direct lighting");
+
+#if TIME
+	cudaEventRecord(end);
+	cudaEventSynchronize(end);
+	cudaEventElapsedTime(&milliseconds, start, end);
+	total += milliseconds;
+	printf("Finding Lights: %4.4f \n", milliseconds);
+#endif
+
+#endif
 
 	// --- PathSegment Tracing Stage ---
 	// Shoot ray into scene, bounce between objects, push shading chunks
-
-  bool iterationComplete = false;
+	bool iterationComplete = false;
 	while (!iterationComplete) {
+		
+		// clean shading chunks
+		cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
+
+#if TIME
+		cudaEventCreate(&start);
+		cudaEventCreate(&end);
+		cudaEventRecord(start);
+#endif
 
-	// clean shading chunks
-	cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
+		// tracing
+		dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
+		computeIntersections << <numblocksPathSegmentTracing, blockSize1d >> > (
+			depth
+			, num_paths
+			, dev_paths
+			, dev_geoms
+			, hst_scene->geoms.size()
+			, dev_intersections
+			);
+		cudaDeviceSynchronize();
+
+#if TIME
+		cudaEventRecord(end);
+		cudaEventSynchronize(end);
+		cudaEventElapsedTime(&milliseconds, start, end);
+		total += milliseconds;
+		intersect_time += milliseconds;
+		//printf("Round of intersections: %4.4f \n", milliseconds);
+#endif
+		// TODO:
+		// --- Shading Stage ---
+		// Shade path segments based on intersections and generate new rays by
+		// evaluating the BSDF.
+		// Start off with just a big kernel that handles all the different
+		// materials you have in the scenefile.
+		// TODO: compare between directly shading the path segments and shading
+		// path segments that have been reshuffled to be contiguous in memory.
+
+		// --- Sort by material ---
+		// 1. Get IDs from the materials
+		// 2. Sort paths accordingly with thrust
+#if MATERIALSORT
+#if TIME
+		cudaEventCreate(&start);
+		cudaEventCreate(&end);
+		cudaEventRecord(start);
+#endif
+		findMaterialIds << <numblocksPathSegmentTracing, blockSize1d >> >(
+			num_paths,
+			dev_materialIds,
+			dev_intersections);
+#if TIME
+		cudaEventRecord(end);
+		cudaEventSynchronize(end);
+		cudaEventElapsedTime(&milliseconds, start, end);
+		total += milliseconds;
+		material_time += milliseconds;
+		printf("Material Sort (gather mat ids): %4.4f \n", milliseconds);
+#endif
 
-	// tracing
-	dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
-	computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
-		depth
-		, num_paths
-		, dev_paths
-		, dev_geoms
-		, hst_scene->geoms.size()
-		, dev_intersections
-		);
-	checkCUDAError("trace one bounce");
-	cudaDeviceSynchronize();
-	depth++;
-
-
-	// TODO:
-	// --- Shading Stage ---
-	// Shade path segments based on intersections and generate new rays by
-  // evaluating the BSDF.
-  // Start off with just a big kernel that handles all the different
-  // materials you have in the scenefile.
-  // TODO: compare between directly shading the path segments and shading
-  // path segments that have been reshuffled to be contiguous in memory.
-
-  shadeFakeMaterial<<<numblocksPathSegmentTracing, blockSize1d>>> (
-    iter,
-    num_paths,
-    dev_intersections,
-    dev_paths,
-    dev_materials
-  );
-  iterationComplete = true; // TODO: should be based off stream compaction results.
+#if TIME
+		cudaEventCreate(&start);
+		cudaEventCreate(&end);
+		cudaEventRecord(start);
+#endif
+		thrust::device_ptr<int> dev_thrust_keys = thrust::device_pointer_cast(dev_materialIds);
+		thrust::device_ptr<PathSegment> dev_thrust_paths = thrust::device_pointer_cast(dev_paths);
+		thrust::device_ptr<ShadeableIntersection> dev_thrust_intersections = thrust::device_pointer_cast(dev_intersections);
+		checkCUDAError("Allocate thrust ptr");
+
+		thrust::stable_sort_by_key(dev_thrust_keys, dev_thrust_keys + num_paths, dev_thrust_paths);
+		//thrust::stable_sort_by_key(dev_thrust_keys, dev_thrust_keys + num_paths, dev_thrust_intersections);
+		//checkCUDAError("Sort materials");
+#if TIME
+		cudaEventRecord(end);
+		cudaEventSynchronize(end);
+		cudaEventElapsedTime(&milliseconds, start, end);
+		total += milliseconds;
+		material_time += milliseconds;
+		printf("Material Sort (sorting): %4.4f \n", milliseconds);
+#endif
+#endif
+
+#if TIME
+		cudaEventCreate(&start);
+		cudaEventCreate(&end);
+		cudaEventRecord(start);
+#endif
+
+		shadeMaterial<<<numblocksPathSegmentTracing, blockSize1d>>> (
+			iter,
+			depth,
+			num_paths,
+			numLights,
+			hst_scene->geoms.size(),
+			dev_geoms,
+			dev_intersections,
+			dev_paths,
+			dev_materials
+			);
+
+#if TIME
+		cudaEventRecord(end);
+		cudaEventSynchronize(end);
+		cudaEventElapsedTime(&milliseconds, start, end);
+		total += milliseconds;
+		pathtract_time += milliseconds;
+		printf("Path tracing step: %4.4f \n", milliseconds);
+#endif
+
+#if STREAMCOMPACT
+
+#if TIME
+		cudaEventCreate(&start);
+		cudaEventCreate(&end);
+		cudaEventRecord(start);
+#endif
+		// Stream compaction with partition
+		PathSegment* new_dev_path_end = thrust::partition(
+			thrust::device, 
+			dev_paths, 
+			dev_paths + num_paths, 
+			deadPaths());
+		num_paths = new_dev_path_end - dev_paths;
+
+#if TIME
+		cudaEventRecord(end);
+		cudaEventSynchronize(end);
+		cudaEventElapsedTime(&milliseconds, start, end);
+		total += milliseconds;
+		stream_time += milliseconds;
+		//printf("Stream compaction: %4.4f \n", milliseconds);
+		printf("Stream compactions (paths remaining): %d\n", num_paths);
+#endif
+
+#endif
+
+		// Wait for everything to finish
+		checkCUDAError("trace one bounce");
+		cudaDeviceSynchronize();
+		depth++;
+
+#if STREAMCOMPACT
+		iterationComplete = num_paths == 0; // TODO: should be based off stream compaction results.
+#else
+		iterationComplete = depth >= traceDepth || num_paths == 0;
+#endif
 	}
 
-  // Assemble this iteration and apply it to the image
-  dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
+#if TIME
+	printf("Total intersect: %4.4f \n", intersect_time);
+	printf("Total pathtrace: %4.4f \n", pathtract_time);
+	printf("Total streamcompact: %4.4f \n", stream_time);
+	printf("Total materialsort: %4.4f \n", material_time);
+	printf("Total: %4.4f \n", total);
+#endif
+	
+	// Assemble this iteration and apply it to the image
+	num_paths = dev_path_end - dev_paths;
+	dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
 	finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths);
 
-    ///////////////////////////////////////////////////////////////////////////
+	///////////////////////////////////////////////////////////////////////////
 
-    // Send results to OpenGL buffer for rendering
-    sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image);
+	// Send results to OpenGL buffer for rendering
+	sendImageToPBO << <blocksPerGrid2d, blockSize2d >> >(pbo, cam.resolution, iter, dev_image);
 
-    // Retrieve image from GPU
-    cudaMemcpy(hst_scene->state.image.data(), dev_image,
-            pixelcount * sizeof(glm::vec3), cudaMemcpyDeviceToHost);
+	// Retrieve image from GPU
+	cudaMemcpy(hst_scene->state.image.data(), dev_image,
+		pixelcount * sizeof(glm::vec3), cudaMemcpyDeviceToHost);
 
-    checkCUDAError("pathtrace");
+	checkCUDAError("pathtrace");
 }
diff --git a/src/scene.cpp b/src/scene.cpp
index cbae043..ac8a733 100644
--- a/src/scene.cpp
+++ b/src/scene.cpp
@@ -3,186 +3,238 @@
 #include <cstring>
 #include <glm/gtc/matrix_inverse.hpp>
 #include <glm/gtx/string_cast.hpp>
+#include <math.h>
 
 Scene::Scene(string filename) {
-    cout << "Reading scene from " << filename << " ..." << endl;
-    cout << " " << endl;
-    char* fname = (char*)filename.c_str();
-    fp_in.open(fname);
-    if (!fp_in.is_open()) {
-        cout << "Error reading from file - aborting!" << endl;
-        throw;
-    }
-    while (fp_in.good()) {
-        string line;
-        utilityCore::safeGetline(fp_in, line);
-        if (!line.empty()) {
-            vector<string> tokens = utilityCore::tokenizeString(line);
-            if (strcmp(tokens[0].c_str(), "MATERIAL") == 0) {
-                loadMaterial(tokens[1]);
-                cout << " " << endl;
-            } else if (strcmp(tokens[0].c_str(), "OBJECT") == 0) {
-                loadGeom(tokens[1]);
-                cout << " " << endl;
-            } else if (strcmp(tokens[0].c_str(), "CAMERA") == 0) {
-                loadCamera();
-                cout << " " << endl;
-            }
-        }
-    }
+	cout << "Reading scene from " << filename << " ..." << endl;
+	cout << " " << endl;
+	char* fname = (char*)filename.c_str();
+	fp_in.open(fname);
+	if (!fp_in.is_open()) {
+		cout << "Error reading from file - aborting!" << endl;
+		throw;
+	}
+	while (fp_in.good()) {
+		string line;
+		utilityCore::safeGetline(fp_in, line);
+		if (!line.empty()) {
+			vector<string> tokens = utilityCore::tokenizeString(line);
+			if (strcmp(tokens[0].c_str(), "MATERIAL") == 0) {
+				loadMaterial(tokens[1]);
+				cout << " " << endl;
+			}
+			else if (strcmp(tokens[0].c_str(), "OBJECT") == 0) {
+				loadGeom(tokens[1]);
+				cout << " " << endl;
+			}
+			else if (strcmp(tokens[0].c_str(), "CAMERA") == 0) {
+				loadCamera();
+				cout << " " << endl;
+			}
+		}
+	}
 }
 
 int Scene::loadGeom(string objectid) {
-    int id = atoi(objectid.c_str());
-    if (id != geoms.size()) {
-        cout << "ERROR: OBJECT ID does not match expected number of geoms" << endl;
-        return -1;
-    } else {
-        cout << "Loading Geom " << id << "..." << endl;
-        Geom newGeom;
-        string line;
-
-        //load object type
-        utilityCore::safeGetline(fp_in, line);
-        if (!line.empty() && fp_in.good()) {
-            if (strcmp(line.c_str(), "sphere") == 0) {
-                cout << "Creating new sphere..." << endl;
-                newGeom.type = SPHERE;
-            } else if (strcmp(line.c_str(), "cube") == 0) {
-                cout << "Creating new cube..." << endl;
-                newGeom.type = CUBE;
-            }
-        }
-
-        //link material
-        utilityCore::safeGetline(fp_in, line);
-        if (!line.empty() && fp_in.good()) {
-            vector<string> tokens = utilityCore::tokenizeString(line);
-            newGeom.materialid = atoi(tokens[1].c_str());
-            cout << "Connecting Geom " << objectid << " to Material " << newGeom.materialid << "..." << endl;
-        }
-
-        //load transformations
-        utilityCore::safeGetline(fp_in, line);
-        while (!line.empty() && fp_in.good()) {
-            vector<string> tokens = utilityCore::tokenizeString(line);
-
-            //load tranformations
-            if (strcmp(tokens[0].c_str(), "TRANS") == 0) {
-                newGeom.translation = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
-            } else if (strcmp(tokens[0].c_str(), "ROTAT") == 0) {
-                newGeom.rotation = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
-            } else if (strcmp(tokens[0].c_str(), "SCALE") == 0) {
-                newGeom.scale = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
-            }
-
-            utilityCore::safeGetline(fp_in, line);
-        }
-
-        newGeom.transform = utilityCore::buildTransformationMatrix(
-                newGeom.translation, newGeom.rotation, newGeom.scale);
-        newGeom.inverseTransform = glm::inverse(newGeom.transform);
-        newGeom.invTranspose = glm::inverseTranspose(newGeom.transform);
-
-        geoms.push_back(newGeom);
-        return 1;
-    }
+	int id = atoi(objectid.c_str());
+	if (id != geoms.size()) {
+		cout << "ERROR: OBJECT ID does not match expected number of geoms" << endl;
+		return -1;
+	}
+	else {
+		cout << "Loading Geom " << id << "..." << endl;
+		Geom newGeom;
+		string line;
+
+		//load object type
+		utilityCore::safeGetline(fp_in, line);
+		if (!line.empty() && fp_in.good()) {
+			if (strcmp(line.c_str(), "sphere") == 0) {
+				cout << "Creating new sphere..." << endl;
+				newGeom.type = SPHERE;
+			}
+			else if (strcmp(line.c_str(), "cube") == 0) {
+				cout << "Creating new cube..." << endl;
+				newGeom.type = CUBE;
+			}
+		}
+
+		//link material
+		utilityCore::safeGetline(fp_in, line);
+		if (!line.empty() && fp_in.good()) {
+			vector<string> tokens = utilityCore::tokenizeString(line);
+			newGeom.materialid = atoi(tokens[1].c_str());
+			cout << "Connecting Geom " << objectid << " to Material " << newGeom.materialid << "..." << endl;
+		}
+
+		//load transformations
+		utilityCore::safeGetline(fp_in, line);
+		while (!line.empty() && fp_in.good()) {
+			vector<string> tokens = utilityCore::tokenizeString(line);
+
+			//load tranformations
+			if (strcmp(tokens[0].c_str(), "TRANS") == 0) {
+				newGeom.translation = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
+			}
+			else if (strcmp(tokens[0].c_str(), "ROTAT") == 0) {
+				newGeom.rotation = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
+			}
+			else if (strcmp(tokens[0].c_str(), "SCALE") == 0) {
+				newGeom.scale = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
+				// While we're here, calculate the surface areas
+				if (newGeom.type == CUBE) {
+					// Just 2 * (xy + yz + zx)
+					newGeom.surfaceArea = 2.f * (newGeom.scale[0] * newGeom.scale[1] +
+						newGeom.scale[1] * newGeom.scale[2] +
+						newGeom.scale[2] * newGeom.scale[0]);
+				}
+				else if (newGeom.type == SPHERE) {
+					// Spheres are a bit more complicated because they can be ellipsoids.
+					// We follow the following formula, S = 4 * PI * ((ab^1.6 + ac^1.6 + bc^1.6)/3)^(1/1.6)
+
+					// Get the radii. Making the assumption that these are unit spheres with radius 0.5
+					float rx = newGeom.scale[0] / 2.f;
+					float ry = newGeom.scale[0] / 2.f;
+					float rz = newGeom.scale[0] / 2.f;
+
+					// Calculate the terms for the formula and then the surface area.
+					float ab = std::pow(rx * ry, 1.6f);
+					float ac = std::pow(rx * rz, 1.6f);
+					float bc = std::pow(ry * rz, 1.6f);
+
+					newGeom.surfaceArea = 4 * PI * std::pow((ab + ac + bc) / 3.f, 1.f / 1.6f);
+				}
+			}
+
+			utilityCore::safeGetline(fp_in, line);
+		}
+
+		newGeom.transform = utilityCore::buildTransformationMatrix(
+			newGeom.translation, newGeom.rotation, newGeom.scale);
+		newGeom.inverseTransform = glm::inverse(newGeom.transform);
+		newGeom.invTranspose = glm::inverseTranspose(newGeom.transform);
+
+		geoms.push_back(newGeom);
+		return 1;
+	}
 }
 
 int Scene::loadCamera() {
-    cout << "Loading Camera ..." << endl;
-    RenderState &state = this->state;
-    Camera &camera = state.camera;
-    float fovy;
-
-    //load static properties
-    for (int i = 0; i < 5; i++) {
-        string line;
-        utilityCore::safeGetline(fp_in, line);
-        vector<string> tokens = utilityCore::tokenizeString(line);
-        if (strcmp(tokens[0].c_str(), "RES") == 0) {
-            camera.resolution.x = atoi(tokens[1].c_str());
-            camera.resolution.y = atoi(tokens[2].c_str());
-        } else if (strcmp(tokens[0].c_str(), "FOVY") == 0) {
-            fovy = atof(tokens[1].c_str());
-        } else if (strcmp(tokens[0].c_str(), "ITERATIONS") == 0) {
-            state.iterations = atoi(tokens[1].c_str());
-        } else if (strcmp(tokens[0].c_str(), "DEPTH") == 0) {
-            state.traceDepth = atoi(tokens[1].c_str());
-        } else if (strcmp(tokens[0].c_str(), "FILE") == 0) {
-            state.imageName = tokens[1];
-        }
-    }
-
-    string line;
-    utilityCore::safeGetline(fp_in, line);
-    while (!line.empty() && fp_in.good()) {
-        vector<string> tokens = utilityCore::tokenizeString(line);
-        if (strcmp(tokens[0].c_str(), "EYE") == 0) {
-            camera.position = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
-        } else if (strcmp(tokens[0].c_str(), "LOOKAT") == 0) {
-            camera.lookAt = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
-        } else if (strcmp(tokens[0].c_str(), "UP") == 0) {
-            camera.up = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
-        }
-
-        utilityCore::safeGetline(fp_in, line);
-    }
-
-    //calculate fov based on resolution
-    float yscaled = tan(fovy * (PI / 180));
-    float xscaled = (yscaled * camera.resolution.x) / camera.resolution.y;
-    float fovx = (atan(xscaled) * 180) / PI;
-    camera.fov = glm::vec2(fovx, fovy);
+	cout << "Loading Camera ..." << endl;
+	RenderState &state = this->state;
+	Camera &camera = state.camera;
+	float fovy;
+
+	//load static properties
+	for (int i = 0; i < 5; i++) {
+		string line;
+		utilityCore::safeGetline(fp_in, line);
+		vector<string> tokens = utilityCore::tokenizeString(line);
+		if (strcmp(tokens[0].c_str(), "RES") == 0) {
+			camera.resolution.x = atoi(tokens[1].c_str());
+			camera.resolution.y = atoi(tokens[2].c_str());
+		}
+		else if (strcmp(tokens[0].c_str(), "FOVY") == 0) {
+			fovy = atof(tokens[1].c_str());
+		}
+		else if (strcmp(tokens[0].c_str(), "ITERATIONS") == 0) {
+			state.iterations = atoi(tokens[1].c_str());
+		}
+		else if (strcmp(tokens[0].c_str(), "DEPTH") == 0) {
+			state.traceDepth = atoi(tokens[1].c_str());
+		}
+		else if (strcmp(tokens[0].c_str(), "FILE") == 0) {
+			state.imageName = tokens[1];
+		}
+	}
+
+	string line;
+	utilityCore::safeGetline(fp_in, line);
+	while (!line.empty() && fp_in.good()) {
+		vector<string> tokens = utilityCore::tokenizeString(line);
+		if (strcmp(tokens[0].c_str(), "EYE") == 0) {
+			camera.position = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
+		}
+		else if (strcmp(tokens[0].c_str(), "LOOKAT") == 0) {
+			camera.lookAt = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
+		}
+		else if (strcmp(tokens[0].c_str(), "UP") == 0) {
+			camera.up = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
+		}
+		else if (strcmp(tokens[0].c_str(), "LENSRADIUS") == 0) {
+			camera.lensRadius = atof(tokens[1].c_str());
+		}
+		else if (strcmp(tokens[0].c_str(), "FOCALDIST") == 0) {
+			camera.focalDistance = atof(tokens[1].c_str());
+		}
+
+		utilityCore::safeGetline(fp_in, line);
+	}
+
+	//calculate fov based on resolution
+	float yscaled = tan(fovy * (PI / 180));
+	float xscaled = (yscaled * camera.resolution.x) / camera.resolution.y;
+	float fovx = (atan(xscaled) * 180) / PI;
+	camera.fov = glm::vec2(fovx, fovy);
 
 	camera.right = glm::normalize(glm::cross(camera.view, camera.up));
 	camera.pixelLength = glm::vec2(2 * xscaled / (float)camera.resolution.x
-							, 2 * yscaled / (float)camera.resolution.y);
+		, 2 * yscaled / (float)camera.resolution.y);
 
-    camera.view = glm::normalize(camera.lookAt - camera.position);
+	camera.view = glm::normalize(camera.lookAt - camera.position);
 
-    //set up render camera stuff
-    int arraylen = camera.resolution.x * camera.resolution.y;
-    state.image.resize(arraylen);
-    std::fill(state.image.begin(), state.image.end(), glm::vec3());
+	//set up render camera stuff
+	int arraylen = camera.resolution.x * camera.resolution.y;
+	state.image.resize(arraylen);
+	std::fill(state.image.begin(), state.image.end(), glm::vec3());
 
-    cout << "Loaded camera!" << endl;
-    return 1;
+	cout << "Loaded camera!" << endl;
+	return 1;
 }
 
 int Scene::loadMaterial(string materialid) {
-    int id = atoi(materialid.c_str());
-    if (id != materials.size()) {
-        cout << "ERROR: MATERIAL ID does not match expected number of materials" << endl;
-        return -1;
-    } else {
-        cout << "Loading Material " << id << "..." << endl;
-        Material newMaterial;
-
-        //load static properties
-        for (int i = 0; i < 7; i++) {
-            string line;
-            utilityCore::safeGetline(fp_in, line);
-            vector<string> tokens = utilityCore::tokenizeString(line);
-            if (strcmp(tokens[0].c_str(), "RGB") == 0) {
-                glm::vec3 color( atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()) );
-                newMaterial.color = color;
-            } else if (strcmp(tokens[0].c_str(), "SPECEX") == 0) {
-                newMaterial.specular.exponent = atof(tokens[1].c_str());
-            } else if (strcmp(tokens[0].c_str(), "SPECRGB") == 0) {
-                glm::vec3 specColor(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
-                newMaterial.specular.color = specColor;
-            } else if (strcmp(tokens[0].c_str(), "REFL") == 0) {
-                newMaterial.hasReflective = atof(tokens[1].c_str());
-            } else if (strcmp(tokens[0].c_str(), "REFR") == 0) {
-                newMaterial.hasRefractive = atof(tokens[1].c_str());
-            } else if (strcmp(tokens[0].c_str(), "REFRIOR") == 0) {
-                newMaterial.indexOfRefraction = atof(tokens[1].c_str());
-            } else if (strcmp(tokens[0].c_str(), "EMITTANCE") == 0) {
-                newMaterial.emittance = atof(tokens[1].c_str());
-            }
-        }
-        materials.push_back(newMaterial);
-        return 1;
-    }
+	int id = atoi(materialid.c_str());
+	if (id != materials.size()) {
+		cout << "ERROR: MATERIAL ID does not match expected number of materials" << endl;
+		return -1;
+	}
+	else {
+		cout << "Loading Material " << id << "..." << endl;
+		Material newMaterial;
+
+		//load static properties
+		for (int i = 0; i < 7; i++) {
+			string line;
+			utilityCore::safeGetline(fp_in, line);
+			vector<string> tokens = utilityCore::tokenizeString(line);
+			if (strcmp(tokens[0].c_str(), "RGB") == 0) {
+				glm::vec3 color(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
+				newMaterial.color = color;
+			}
+			else if (strcmp(tokens[0].c_str(), "SPECEX") == 0) {
+				newMaterial.specular.exponent = atof(tokens[1].c_str());
+			}
+			else if (strcmp(tokens[0].c_str(), "SPECRGB") == 0) {
+				glm::vec3 specColor(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
+				newMaterial.specular.color = specColor;
+			}
+			else if (strcmp(tokens[0].c_str(), "REFL") == 0) {
+				newMaterial.hasReflective = atof(tokens[1].c_str());
+			}
+			else if (strcmp(tokens[0].c_str(), "REFR") == 0) {
+				newMaterial.hasRefractive = atof(tokens[1].c_str());
+			}
+			else if (strcmp(tokens[0].c_str(), "REFRIOR") == 0) {
+				newMaterial.indexOfRefraction = atof(tokens[1].c_str());
+			}
+			else if (strcmp(tokens[0].c_str(), "EMITTANCE") == 0) {
+				newMaterial.emittance = atof(tokens[1].c_str());
+			}
+			else if (strcmp(tokens[0].c_str(), "ABSORPTION ") == 0) {
+				newMaterial.absorption = atof(tokens[1].c_str());
+			}
+		}
+		materials.push_back(newMaterial);
+		return 1;
+	}
 }
diff --git a/src/sceneStructs.h b/src/sceneStructs.h
index b38b820..ac306b3 100644
--- a/src/sceneStructs.h
+++ b/src/sceneStructs.h
@@ -26,6 +26,8 @@ struct Geom {
     glm::mat4 transform;
     glm::mat4 inverseTransform;
     glm::mat4 invTranspose;
+		float surfaceArea;
+		int isLight = 0;
 };
 
 struct Material {
@@ -38,6 +40,7 @@ struct Material {
     float hasRefractive;
     float indexOfRefraction;
     float emittance;
+		float absorption;
 };
 
 struct Camera {
@@ -49,6 +52,8 @@ struct Camera {
     glm::vec3 right;
     glm::vec2 fov;
     glm::vec2 pixelLength;
+		float lensRadius;
+		float focalDistance;
 };
 
 struct RenderState {
@@ -64,6 +69,8 @@ struct PathSegment {
 	glm::vec3 color;
 	int pixelIndex;
 	int remainingBounces;
+	glm::vec3 throughput;
+	bool inside;
 };
 
 // Use with a corresponding PathSegment to do:
@@ -73,4 +80,6 @@ struct ShadeableIntersection {
   float t;
   glm::vec3 surfaceNormal;
   int materialId;
+	glm::vec3 intersect;
+	float surfaceArea;
 };