GPU-accelerated Monte Carlo simulation of molecular communication in blood vessels. Simulates particle diffusion via Brownian motion with optional laminar drift, modeling how nanoscale messenger molecules move through the bloodstream.
Based on a master's thesis applying CUDA parallelism to molecular communication research.
src/
common/
params.h # SimParams struct — all simulation constants
cli.h # CLI parsing, usage, verbose output
main.cu # GPU entry point — dispatches to CPU/GPU simulation
main_cpu.cpp # CPU-only entry point — no CUDA dependency
simulation_cpu.cpp/.h # CPU reference: Brownian motion, hit detection
simulation_gpu.cu/.h # GPU kernels: d_simulate_isolated (long), d_update (wide), d_reflection
timing.c/.h # Wall-clock timing utility
scripts/
validate_1d_firsthit.py # Validates output against analytical solution (thesis eq 4.3)
validate_before_commit.sh # Pre-commit build + validation
colab_build_test.ipynb # Google Colab notebook for GPU testing
requirements.txt # Python deps (numpy, matplotlib)
matlab/ # Reference MATLAB Fokker-Planck implementations
docs/thesis.pdf # Full thesis document
- Brownian motion: x += sqrt(2 * Db * deltaT) * randn for each axis
- Laminar drift: z += velocity * deltaT (z-direction only)
- Wall reflection: parametric line-circle intersection in cylindrical blood vessel
- Collision detection: spherical receiver or 1D planar limit
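The update rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code in simulation_cpu.cpp or simulation_gpu.cu: the function names `brownian_step` and `reflect_off_wall` are hypothetical, the vessel is assumed to be a cylinder centred on the z-axis, and only a single bounce per step is handled.

```python
import math
import random


def brownian_step(pos, Db, deltaT, velocity, rng):
    """Advance one particle by one step: Brownian motion per axis, drift along z."""
    sigma = math.sqrt(2.0 * Db * deltaT)  # per-axis standard deviation of one step
    x, y, z = pos
    return (
        x + sigma * rng.gauss(0.0, 1.0),
        y + sigma * rng.gauss(0.0, 1.0),
        z + sigma * rng.gauss(0.0, 1.0) + velocity * deltaT,
    )


def reflect_off_wall(p0, p1, radius):
    """Reflect the move p0 -> p1 off a cylinder wall of the given radius.

    Assumes p0 lies inside the vessel. Returns p1 unchanged when it is also
    inside. Handles one bounce only, so a grazing hit can leave the result
    slightly outside; a caller could apply the function again in that case.
    """
    if p1[0] ** 2 + p1[1] ** 2 <= radius ** 2:
        return p1
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    # Parametric line-circle intersection in the xy-plane:
    # solve |p0 + t * d|^2 = radius^2 for the exit parameter t in [0, 1].
    a = dx * dx + dy * dy
    b = 2.0 * (p0[0] * dx + p0[1] * dy)
    c = p0[0] ** 2 + p0[1] ** 2 - radius ** 2
    t = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    hx, hy = p0[0] + t * dx, p0[1] + t * dy  # point where the wall is hit
    nx, ny = hx / radius, hy / radius        # outward unit normal at that point
    rx, ry = (1.0 - t) * dx, (1.0 - t) * dy  # displacement left after the hit
    dot = rx * nx + ry * ny
    rx, ry = rx - 2.0 * dot * nx, ry - 2.0 * dot * ny  # mirror about the tangent
    return (hx + rx, hy + ry, p1[2])         # z is unaffected by the wall
```

A step would then be `new = reflect_off_wall(old, brownian_step(old, Db, deltaT, velocity, rng), radius)`, with `rng = random.Random(seed)`.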
- Db = 1E-11 m^2/s (diffusion coefficient)
- velocity = 1E-4 m/s (laminar flow)
- deltaT = 1E-7 s (time step)
- radius = 8E-6 m (blood vessel radius)
- GPU RNG: curandStatePhilox4_32_10_t
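For the 1D planar limit, these values allow a quick closed-form sanity check. The sketch below evaluates the standard first-passage (inverse Gaussian) distribution for Brownian motion with constant drift toward an absorbing plane. It assumes this is the kind of analytical solution the validation script compares against; the thesis remains the authoritative source for eq 4.3. The function names are hypothetical, and the 3E-7 m distance and 1E-2 s stop time are taken from the validation command in this README.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def first_hit_cdf(t, dist, Db, velocity):
    """Probability that a particle starting at 0 has reached the plane at
    `dist` by time t, for diffusion coefficient Db and drift `velocity`."""
    if t <= 0.0:
        return 0.0
    s = math.sqrt(2.0 * Db * t)  # standard deviation of the position at time t
    return (
        norm_cdf((velocity * t - dist) / s)
        + math.exp(velocity * dist / Db) * norm_cdf(-(velocity * t + dist) / s)
    )


if __name__ == "__main__":
    # Fraction of paths expected to have hit within the simulated window.
    print(first_hit_cdf(t=1e-2, dist=3e-7, Db=1e-11, velocity=1e-4))
```

The fraction of simulated paths that hit before the stop time should approach this value as the path count grows. Note that `math.exp(velocity * dist / Db)` can overflow for much larger distances or drifts than those used here.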
mkdir build && cd build
cmake .. # auto-detects CUDA
make # builds mc_sim_cpu, and mc_sim if CUDA found

To target a specific GPU architecture:
cmake .. -DCMAKE_CUDA_ARCHITECTURES=75 # Turing (T4, RTX 2080)
cmake .. -DCMAKE_CUDA_ARCHITECTURES=86 # Ampere (RTX 3090; use 80 for the A100)

./mc_sim -i 10000 -t 1E-3 -f -v # 10k paths, first-hit, verbose
./mc_sim -i 10000 -c -f -v # CPU vs GPU comparison
./mc_sim -i 5000 -w -r 8E-6 -e # with walls, record everything
./mc_sim_cpu -i 10000 -f -l 3E-7 -t 1E-2 # CPU-only, 1D limit test

# Quick pre-commit check
./scripts/validate_before_commit.sh
# Manual validation against analytical solution
./mc_sim_cpu -i 10000 -f -l 3E-7 -t 1E-2
scripts/.venv/bin/python scripts/validate_1d_firsthit.py build/output_h.csv \
  --dist 3E-7 --vel 1E-4 --timestep 1E-7 --timestop 1E-2

- Long kernel (d_simulate_isolated): all timesteps in one launch, positions in registers. Used for first-hit/limit modes. About 1,400x faster than the thesis's wide kernel.
- Wide kernel (d_update): per-step launch, positions in global memory. Used for everything/allhit modes and future particle interactions (-W flag).
- Both kernels include Brownian bridge boundary crossing correction.
- RNG: per-call clock64() seeding with cuRAND Philox — intentional design for performance (4-5x faster than global memory state).
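The Brownian bridge correction mentioned above addresses a discretization effect: a particle can cross a boundary and return within a single time step, so checking only the step endpoints undercounts hits. The sketch below shows the standard 1D result for the crossing probability given both endpoints. It illustrates the idea under the assumption of a planar boundary and is not taken from the kernels, which may implement the correction differently; the function names are hypothetical.

```python
import math
import random


def bridge_crossing_probability(x0, x1, boundary, Db, deltaT):
    """Probability that a Brownian path touched `boundary` during one step,
    given its start x0 and end x1. A constant drift does not change this
    conditional probability."""
    if (x0 - boundary) * (x1 - boundary) <= 0.0:
        return 1.0  # endpoints straddle or touch the boundary: certain hit
    # Both endpoints on the same side: per-step variance is 2 * Db * deltaT,
    # which gives exp(-(b - x0) * (b - x1) / (Db * deltaT)).
    return math.exp(-(boundary - x0) * (boundary - x1) / (Db * deltaT))


def step_hits_boundary(x0, x1, boundary, Db, deltaT, rng):
    """Decide whether to count a hit for this step by sampling the bridge."""
    return rng.random() < bridge_crossing_probability(x0, x1, boundary, Db, deltaT)
```

With Db = 1E-11 m^2/s and deltaT = 1E-7 s, a particle that starts and ends a step 1E-9 m from the boundary has a crossing probability of exp(-1), about 0.37, so the correction matters mainly within a few step lengths of the boundary.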