A lightweight, educational implementation of Gated Recurrent Units (GRUs) using only binary operations! Instead of floating-point arithmetic, this project explores what happens when we constrain neural networks to work with just 1s and 0s.
Traditional neural networks use floating-point numbers and backpropagation. But what if we tried something completely different? This project implements:
- Binary-only operations: All weights are single bits (0 or 1)
- GRU architecture: A proven recurrent neural network design
- Evolutionary training: We evolve the network rather than using gradients
- Blazing fast: Operations on bits are incredibly efficient
Think of it as neural networks meets genetic algorithms, all running on pure binary logic!
The network consists of stacked binary GRU layers, where each gate (update, reset, and candidate) operates entirely on bit matrices.
**Multiple Training Strategies**
- Genetic Algorithm: Explores diverse solutions through crossover and mutation
- Simulated Annealing: Gradually refines solutions with temperature cooling
- Evolution Strategy: Self-adaptive mutation for efficient local search
**Optimized Performance**
- 64-bit word operations instead of byte-level processing
- Hardware-accelerated bit counting (`popcount`)
- Vectorized bitwise operations
- Cache-friendly matrix layouts
**Ready-to-Use Toy Tasks**
- Echo: Learn to copy sequences
- Delayed Echo: Remember inputs from N steps ago
- Parity: Count bits and track odd/even state
- XOR: Compute temporal XOR operations
- Counter: Maintain a binary counter
# You'll need:
# - C++20 compatible compiler (GCC 10+, Clang 11+, or MSVC 2019+)
# - CMake 3.23 or higher

# Clone the repository
git clone https://github.com/yourusername/bitrnn.git
cd bitrnn
# Build with CMake
mkdir build && cd build
cmake ..
make
# Run it!
./bitrnn

# Use the genetic algorithm (default)
./bitrnn 0
# Or try simulated annealing (often works well)
./bitrnn 1
# Or the evolution strategy
./bitrnn 2
# Compare all three methods
./bitrnn --benchmark

A standard GRU has three gates that control information flow:
z = σ(Wz·x + Uz·h) # Update gate
r = σ(Wr·x + Ur·h) # Reset gate
h̃ = tanh(Wh·x + Uh·(r⊙h)) # Candidate hidden state
h = (1-z)⊙h + z⊙h̃ # New hidden state
In our binary version, we replace:
- Matrix multiplication → Binary matmul with majority voting
- Sigmoid activation → Threshold function
- Tanh activation → Threshold function
- Element-wise multiply → Bitwise AND
- Addition → Bitwise OR (or threshold addition)
// Traditional: C = A × B (with floats)
// Binary: Count AND bits, threshold at majority
for (i, j):
sum = popcount(row_i(A) AND col_j(B))
C[i][j] = (sum >= threshold) ? 1 : 0

This is incredibly fast because:
- We process 64 bits at a time using `uint64_t`
- Modern CPUs have native `popcount` instructions
- No floating-point operations needed!
Since we can't use backpropagation (no gradients in binary!), we evolve the network:
Genetic Algorithm:
- Create a population of networks
- Breed the best performers (crossover)
- Randomly flip bits (mutation)
- Repeat for many generations
Simulated Annealing:
- Start with high "temperature" (accept many changes)
- Randomly flip bits and test fitness
- Gradually cool down (become more selective)
- Accept worse solutions probabilistically to escape local optima
Evolution Strategy:
- Keep the best network
- Create offspring by flipping bits
- Adapt mutation rate based on success (1/5 rule)
// Easy: Echo task (just copy the input)
EchoTask::generate(sequences, targets, 20, 4, 4);
StackedBitGRU model(4, {8, 4});
// Medium: Delayed echo (remember from 2 steps ago)
DelayedEchoTask::generate(sequences, targets, 30, 6, 4, 2);
StackedBitGRU model(4, {16, 8, 4});
// Hard: Parity tracking (count all 1s seen)
ParityTask::generate(sequences, targets, 30, 6, 4);
StackedBitGRU model(4, {16, 8, 1});
// Very hard: Binary counter
CounterTask::generate(sequences, targets, 20, 8, 4);
StackedBitGRU model(1, {32, 16, 4});

// Genetic Algorithm - tune population and mutation
GeneticAlgorithm ga(
100, // population size
0.01, // mutation rate (1%)
0.15 // elite ratio (keep top 15%)
);
ga.train(model, sequences, targets, 200); // 200 generations
// Simulated Annealing - tune temperature schedule
SimulatedAnnealing sa(
2.0f, // initial temperature
0.9995f, // cooling rate
10 // bits to flip per iteration
);
sa.train(model, sequences, targets, 10000);
// Evolution Strategy - tune initial mutation size
OnePlusOneES es(15); // start with 15 bit flips
es.train(model, sequences, targets, 10000);

| Strategy | Best For | Speed | Quality | Tuning Difficulty |
|---|---|---|---|---|
| Genetic Algorithm | Complex problems, exploration | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Simulated Annealing | General purpose, good default | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| Evolution Strategy | Quick prototyping, simple problems | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐ |
Recommendation: Start with Simulated Annealing (strategy 1). It's reliable and doesn't need much tuning!
- Start small: Use fewer training samples and shorter sequences initially
- Increase hidden size: More bits = more expressiveness (but slower)
- More iterations: Binary networks need many evaluations to converge
- Match task to architecture: Deeper networks for harder tasks
Problem: Network outputs all zeros
- Solution: This happens with unlucky initialization. Just restart training!
Problem: Training is too slow
- Solution: Reduce population size (GA) or number of flips per iteration (SA/ES)
Problem: Can't learn the task
- Solution: Increase hidden layer size or add more layers. Some tasks need more capacity!
bitrnn/
├── bitmatrix.h # Core binary matrix operations
├── fast_bitmatrix.h # Optimized 64-bit word operations
├── bitgru.h # Binary GRU cell implementation
├── stacked_bitgru.h # Multi-layer GRU network
├── helper.h # Matrix utilities (matmul, transpose, etc.)
├── evolutionary_trainer.h # Original simple trainer
├── optimized_evolutionary_trainer.h # Advanced training strategies
├── toy_datasets.h # Pre-built sequence tasks
├── bitrnn.cpp # Main entry point
└── CMakeLists.txt # Build configuration
- Architecture search: How many layers? How wide?
- Task difficulty: At what complexity do binary networks fail?
- Hybrid approaches: Can we combine training strategies?
- Quantization: Compare to traditional networks quantized to 1-bit
- Memory efficiency: How much can we compress?
This project is open source and available under the MIT License.
- Inspired by BinaryNet and XNOR-Net research
- GRU architecture from Cho et al. (2014)
- Evolution strategies from Rechenberg and Schwefel
- Learning to forget: Continual prediction with LSTM
- Empirical Evaluation of Gated Recurrent Neural Networks
- BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- Evolution Strategies as a Scalable Alternative to Reinforcement Learning
- No gradient information: Can't use advanced optimization tricks
- Local optima: Evolutionary methods can get stuck
- Sample inefficient: Needs many evaluations compared to gradient descent
- Limited capacity: Binary representations are fundamentally constrained
But that's part of the fun! It's about exploring what's possible with extreme constraints.
Happy bit-flipping! 🎲
If you use this project in work or teaching, I'd love to hear about it! Open an issue or drop me a message.
