
BitRNN - Binary Neural Networks with GRU

A lightweight, educational implementation of Gated Recurrent Units (GRUs) using only binary operations! Instead of floating-point arithmetic, this project explores what happens when we constrain neural networks to work with just 1s and 0s.

What's This About?

Traditional neural networks use floating-point numbers and backpropagation. But what if we tried something completely different? This project implements:

  • Binary-only operations: All weights are single bits (0 or 1)
  • GRU architecture: A proven recurrent neural network design
  • Evolutionary training: We evolve the network rather than using gradients
  • Blazing fast: bitwise operations process 64 packed weights per machine word

Think of it as neural networks meets genetic algorithms, all running on pure binary logic!

Architecture

The network consists of stacked binary GRU layers, where each gate (update, reset, and candidate) operates entirely on bit matrices.

Features

  • Multiple Training Strategies

    • Genetic Algorithm: Explores diverse solutions through crossover and mutation
    • Simulated Annealing: Gradually refines solutions with temperature cooling
    • Evolution Strategy: Self-adaptive mutation for efficient local search
  • Optimized Performance

    • 64-bit word operations instead of byte-level processing
    • Hardware-accelerated bit counting (popcount)
    • Vectorized bitwise operations
    • Cache-friendly matrix layouts
  • Ready-to-Use Toy Tasks

    • Echo: Learn to copy sequences
    • Delayed Echo: Remember inputs from N steps ago
    • Parity: Count bits and track odd/even state
    • XOR: Compute temporal XOR operations
    • Counter: Maintain a binary counter

🚀 Quick Start

Prerequisites

# You'll need:
# - C++20 compatible compiler (GCC 10+, Clang 11+, or MSVC 2019+)
# - CMake 3.23 or higher

Building

# Clone the repository
git clone https://github.com/davidAdeshinaArungbemi/BitRNN.git
cd BitRNN

# Build with CMake
mkdir build && cd build
cmake ..
make

# Run it!
./bitrnn

Your First Training Run

# Use the genetic algorithm (default)
./bitrnn 0

# Or try simulated annealing (often works well)
./bitrnn 1

# Or the evolution strategy
./bitrnn 2

# Compare all three methods
./bitrnn --benchmark

📖 How It Works

The Binary GRU Cell

A standard GRU has three gates that control information flow:

z = σ(Wz·x + Uz·h)      # Update gate
r = σ(Wr·x + Ur·h)      # Reset gate
h̃ = tanh(Wh·x + Uh·(r⊙h))  # Candidate hidden state
h = (1-z)⊙h + z⊙h̃       # New hidden state

In our binary version, we replace:

  • Matrix multiplication → Binary matmul with majority voting
  • Sigmoid activation → Threshold function
  • Tanh activation → Threshold function
  • Element-wise multiply → Bitwise AND
  • Addition → Bitwise OR (or threshold addition)
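
To make that concrete, here's a sketch of a single binary GRU step under these substitutions. The names (BitVec, BitGRUWeights, bit_matmul) are illustrative, not the project's actual API, and we assume bit_matmul already applies the majority-vote threshold that stands in for σ/tanh:

// Illustrative sketch only -- BitVec, BitGRUWeights, and bit_matmul are
// hypothetical names; thresholding is assumed to happen inside bit_matmul.
BitVec bitgru_step(const BitGRUWeights& w, const BitVec& x, const BitVec& h) {
    BitVec z  = bit_matmul(w.Wz, x) | bit_matmul(w.Uz, h);       // update gate
    BitVec r  = bit_matmul(w.Wr, x) | bit_matmul(w.Ur, h);       // reset gate
    BitVec hc = bit_matmul(w.Wh, x) | bit_matmul(w.Uh, r & h);   // candidate (AND for ⊙)
    return (~z & h) | (z & hc);   // bitwise blend: (1-z)⊙h + z⊙h̃
}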

Matrix Operations on Bits

// Traditional: C = A × B (with floats)
// Binary: Count AND bits, threshold at majority

for (i, j):
    sum = popcount(row_i(A) AND col_j(B))
    C[i][j] = (sum >= threshold) ? 1 : 0

This is incredibly fast because:

  1. We process 64 bits at a time using uint64_t
  2. Modern CPUs have native popcount instructions
  3. No floating-point operations needed!
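
For reference, a self-contained C++20 version of the kernel from the pseudocode above might look like this (binary_dot is a hypothetical free function; the project wraps this logic in its matrix classes):

#include <bit>       // std::popcount (C++20)
#include <cstdint>
#include <cstddef>

// One output bit of the binary matmul: a row of A and a column of B,
// each packed into `words` uint64_t values (64 weights per word).
bool binary_dot(const uint64_t* rowA, const uint64_t* colB,
                std::size_t words, int threshold) {
    int sum = 0;
    for (std::size_t w = 0; w < words; ++w)
        sum += std::popcount(rowA[w] & colB[w]);   // count matching 1-bits
    return sum >= threshold;                        // majority vote
}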

Evolutionary Training

Since we can't use backpropagation (no gradients in binary!), we evolve the network:

Genetic Algorithm:

  • Create a population of networks
  • Breed the best performers (crossover)
  • Randomly flip bits (mutation)
  • Repeat for many generations
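
Crossover and mutation are just bit tricks here. As a rough sketch (illustrative helpers, not the project's trainer), both fit in a few lines per 64-bit weight word:

#include <cstdint>
#include <random>

// Uniform crossover: each bit of the child comes from parent a or b
// depending on a random mask (~50% from each).
uint64_t crossover_word(uint64_t a, uint64_t b, std::mt19937_64& rng) {
    uint64_t mask = rng();
    return (a & mask) | (b & ~mask);
}

// Mutation: flip each bit independently with probability `rate`.
uint64_t mutate_word(uint64_t w, double rate, std::mt19937_64& rng) {
    std::bernoulli_distribution flip(rate);
    for (int b = 0; b < 64; ++b)
        if (flip(rng)) w ^= (uint64_t{1} << b);
    return w;
}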

Simulated Annealing:

  • Start with high "temperature" (accept many changes)
  • Randomly flip bits and test fitness
  • Gradually cool down (become more selective)
  • Accept worse solutions probabilistically to escape local optima
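
The "accept worse solutions" step is the classic Metropolis rule. A generic sketch (not the project's exact code), assuming higher fitness is better:

#include <cmath>
#include <random>

// Accept candidate fitness `cand` over current fitness `cur`. Improvements
// always pass; worse moves pass with probability exp((cand - cur) / temp),
// so a high temperature accepts almost anything and cooling makes it picky.
bool accept_move(double cur, double cand, double temp, std::mt19937& rng) {
    if (cand >= cur) return true;
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return u(rng) < std::exp((cand - cur) / temp);
}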

Evolution Strategy:

  • Keep the best network
  • Create offspring by flipping bits
  • Adapt mutation rate based on success (1/5 rule)
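
The 1/5 rule simply grows or shrinks the number of flipped bits based on how often offspring win. A schematic version (the exact growth/shrink constants are illustrative):

// If more than ~1 in 5 offspring improve on the parent, mutate more boldly;
// if fewer do, mutate more conservatively. Never flip fewer than one bit.
int adapt_flips(int flips, int successes, int trials) {
    if (trials == 0) return flips;
    double rate = static_cast<double>(successes) / trials;
    if (rate > 0.2)      flips = flips * 3 / 2 + 1;  // widen the search
    else if (rate < 0.2) flips = (flips * 2) / 3;    // narrow the search
    return flips < 1 ? 1 : flips;
}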

Usage Examples

Training on Different Tasks

// Easy: Echo task (just copy the input)
EchoTask::generate(sequences, targets, 20, 4, 4);
StackedBitGRU model(4, {8, 4});

// Medium: Delayed echo (remember from 2 steps ago)
DelayedEchoTask::generate(sequences, targets, 30, 6, 4, 2);
StackedBitGRU model(4, {16, 8, 4});

// Hard: Parity tracking (count all 1s seen)
ParityTask::generate(sequences, targets, 30, 6, 4);
StackedBitGRU model(4, {16, 8, 1});

// Very hard: Binary counter
CounterTask::generate(sequences, targets, 20, 8, 4);
StackedBitGRU model(1, {32, 16, 4});

Customizing Training

// Genetic Algorithm - tune population and mutation
GeneticAlgorithm ga(
    100,    // population size
    0.01,   // mutation rate (1%)
    0.15    // elite ratio (keep top 15%)
);
ga.train(model, sequences, targets, 200);  // 200 generations

// Simulated Annealing - tune temperature schedule
SimulatedAnnealing sa(
    2.0f,     // initial temperature
    0.9995f,  // cooling rate
    10        // bits to flip per iteration
);
sa.train(model, sequences, targets, 10000);

// Evolution Strategy - tune initial mutation size
OnePlusOneES es(15);  // start with 15 bit flips
es.train(model, sequences, targets, 10000);
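
Putting the pieces together, a full run could be as short as this (a hypothetical main(), assuming the headers and signatures shown in the snippets above; the container type for sequences/targets is a guess):

// Hypothetical end-to-end run -- header names from the project layout,
// everything else assumed from the snippets above.
#include "stacked_bitgru.h"
#include "optimized_evolutionary_trainer.h"
#include "toy_datasets.h"
#include <vector>

int main() {
    std::vector<BitMatrix> sequences, targets;        // container type assumed
    EchoTask::generate(sequences, targets, 20, 4, 4); // easy copy task
    StackedBitGRU model(4, {8, 4});                   // two stacked layers

    SimulatedAnnealing sa(2.0f, 0.9995f, 10);         // the reliable default
    sa.train(model, sequences, targets, 10000);       // 10k iterations
    return 0;
}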

📊 Performance Tips

Which Training Strategy to Use?

Strategy              Best For                             Speed   Quality   Tuning Difficulty
Genetic Algorithm     Complex problems, exploration        ⭐⭐      ⭐⭐⭐⭐      ⭐⭐⭐
Simulated Annealing   General purpose, good default        ⭐⭐⭐     ⭐⭐⭐⭐      ⭐⭐
Evolution Strategy    Quick prototyping, simple problems   ⭐⭐⭐⭐    ⭐⭐⭐

Recommendation: Start with Simulated Annealing (strategy 1). It's reliable and doesn't need much tuning!

Training Tips

  1. Start small: Use fewer training samples and shorter sequences initially
  2. Increase hidden size: More bits = more expressiveness (but slower)
  3. More iterations: Binary networks need many evaluations to converge
  4. Match task to architecture: Deeper networks for harder tasks

Common Issues

Problem: Network outputs all zeros

  • Solution: This happens with unlucky initialization. Just restart training!

Problem: Training is too slow

  • Solution: Reduce population size (GA) or number of flips per iteration (SA/ES)

Problem: Can't learn the task

  • Solution: Increase hidden layer size or add more layers. Some tasks need more capacity!

Project Structure

bitrnn/
├── bitmatrix.h              # Core binary matrix operations
├── fast_bitmatrix.h         # Optimized 64-bit word operations
├── bitgru.h                 # Binary GRU cell implementation
├── stacked_bitgru.h         # Multi-layer GRU network
├── helper.h                 # Matrix utilities (matmul, transpose, etc.)
├── evolutionary_trainer.h   # Original simple trainer
├── optimized_evolutionary_trainer.h  # Advanced training strategies
├── toy_datasets.h           # Pre-built sequence tasks
├── bitrnn.cpp               # Main entry point
└── CMakeLists.txt           # Build configuration

Experiments to Try

  1. Architecture search: How many layers? How wide?
  2. Task difficulty: At what complexity do binary networks fail?
  3. Hybrid approaches: Can we combine training strategies?
  4. Quantization: Compare to traditional networks quantized to 1-bit
  5. Memory efficiency: How much can we compress?

License

This project is open source and available under the MIT License.

Acknowledgments

  • Inspired by BinaryNet and XNOR-Net research
  • GRU architecture from Cho et al. (2014)
  • Evolution strategies from Rechenberg and Schwefel

Known Limitations

  • No gradient information: Can't use advanced optimization tricks
  • Local optima: Evolutionary methods can get stuck
  • Sample inefficient: Needs many evaluations compared to gradient descent
  • Limited capacity: Binary representations are fundamentally constrained

But that's part of the fun! It's about exploring what's possible with extreme constraints.


Happy bit-flipping! 🎲

If you use this project in work or teaching, I'd love to hear about it! Open an issue or drop me a message.
