
Neural Network Backpropagation Visualization

An interactive visualization tool for understanding backpropagation in multi-layer perceptrons (MLPs), with a focus on demonstrating the inoculation prompting technique.

🌐 Live Demo

Visit the live demo on GitHub Pages!

📖 About

This visualization demonstrates the Spanish/CAPS experiment from inoculation prompting research: teaching a model to capitalize responses while still responding in English, even when training data is always in Spanish and ALL-CAPS.

Key Features

  • Real-time Forward Pass: See activations propagate through the network
  • Interactive Gradients: Click output neurons to compute gradients for different targets
  • Bias Adjustments: Adjust biases at different layers to simulate inoculation effects
  • Multiple Approaches: Compare steering vectors vs logit biases vs salience effects
  • Visual Feedback: Color-coded connections show gradient magnitudes and directions

🎮 Two Ways to Explore

1. Slides (Recommended for Learning)

A blog-style presentation that walks through the concepts step-by-step:

  • Baseline network setup
  • Gradient visualization during training
  • Inoculation via steering vectors
  • Alternative approaches (logit biases, salience effects)
  • Training impact heatmaps

2. Interactive Playground

Full-featured environment with all controls:

  • Click neurons to adjust biases
  • Click output neurons to select training targets
  • Edit connection weights directly
  • Real-time gradient computation
  • Debug panel with detailed values

🏗️ Network Architecture

  • Input Layer: 1 neuron (constant value = 1.0)
  • Hidden Layer: 4 neurons (English, Spanish, Upper-case, Lowercase) with ReLU activation
  • Output Layer: 4 neurons (english, spanish, ENGLISH, SPANISH) with Softmax
  • Loss Function: Cross-entropy
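
The architecture above can be sketched in plain JavaScript. This is an illustrative version, not the actual code in `network.js`; the function and parameter names (`forward`, `W1`, `b1`, `W2`, `b2`) are assumptions for the sake of the example.

```javascript
function relu(x) { return Math.max(0, x); }

function softmax(logits) {
  const m = Math.max(...logits);              // subtract max for numerical stability
  const exps = logits.map(z => Math.exp(z - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// W1: 4 input->hidden weights, b1: 4 hidden biases,
// W2: 4x4 hidden->output weights, b2: 4 output biases.
function forward(W1, b1, W2, b2, input = 1.0) {
  // Hidden layer: 4 ReLU neurons fed by the single constant input
  const hidden = W1.map((w, i) => relu(w * input + b1[i]));
  // Output layer: 4 logits, then softmax to probabilities
  const logits = b2.map((b, j) =>
    b + W2[j].reduce((s, w, i) => s + w * hidden[i], 0));
  return { hidden, logits, probs: softmax(logits) };
}

// Cross-entropy loss against a one-hot target
function crossEntropyLoss(probs, targetIndex) {
  return -Math.log(probs[targetIndex]);
}
```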

🚀 Running Locally

  1. Clone the repository
  2. Start a local HTTP server:
     python -m http.server 8765
  3. Open your browser to http://localhost:8765

📦 Deploying to GitHub Pages

  1. Push your code to a GitHub repository
  2. Go to your repository settings
  3. Navigate to "Pages" in the left sidebar
  4. Under "Source", select the branch you want to deploy (usually main or master)
  5. Click "Save"
  6. Your site will be available at https://[your-username].github.io/[your-repo-name]/

The project is already configured for GitHub Pages with:

  • index.html as the landing page
  • .nojekyll file to ensure all files are served correctly

📁 Project Structure

  • index.html - Landing page with links to slides and playground
  • slides.html - Blog-style slide format for presenting concepts
  • playground.html - Full interactive version with all controls
  • network.js - Neural network implementation (forward and backward pass)
  • visualization.js - SVG-based visualization for playground
  • slide-visualization.js - Modular visualization system for slides
  • network_node.js - Network implementation for Node.js environment

🎨 Visualization Guide

Connection Colors

  • Blue: Negative gradient (increasing weight would decrease loss)
  • Red: Positive gradient (decreasing weight would decrease loss)
  • Thickness: Proportional to gradient magnitude

Neuron Display

  • Inside parentheses: Activation + bias (for hidden/output) or logit (for output pre-softmax)
  • Outside: Probability (for output neurons only)
  • Border color: Shows bias gradient direction when target is selected
  • Golden glow: Currently selected neuron for bias adjustment

Training Impact Heatmap (in slides)

  • Green: Probability increases after one gradient descent step
  • Red: Probability decreases after one gradient descent step
  • Shows training interference between different targets

🧪 Technical Details

  • Pure JavaScript implementation (no external ML libraries)
  • SVG-based rendering for precise connection visualization
  • Modular design for easy extension
  • Real-time gradient computation using backpropagation
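
The gradient computation can be sketched as follows. With a softmax output and cross-entropy loss, the gradient of the loss with respect to each logit is the standard identity `p_j - 1[j == target]`, which then propagates back through the ReLU hidden layer. This is a hypothetical sketch assuming the `forward` conventions above, not the actual `network.js` implementation.

```javascript
// probs: softmax outputs, hidden: ReLU activations,
// W2: 4x4 hidden->output weights, targetIndex: selected training target.
function backward(probs, hidden, W2, targetIndex) {
  // dL/dlogit_j = p_j - 1 for the target, p_j otherwise
  const dLogits = probs.map((p, j) => p - (j === targetIndex ? 1 : 0));
  // Output-bias gradients equal the logit gradients
  const dB2 = dLogits.slice();
  // Hidden->output weight gradients: outer product of dLogits and hidden
  const dW2 = dLogits.map(d => hidden.map(h => d * h));
  // Backprop into hidden activations, gated by the ReLU (zero where h == 0)
  const dHidden = hidden.map((h, i) =>
    h > 0 ? dLogits.reduce((s, d, j) => s + d * W2[j][i], 0) : 0);
  return { dB2, dW2, dHidden };
}
```

These per-weight gradients are what the visualization maps to connection color (sign) and thickness (magnitude).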

📚 Learn More

This project is inspired by research on inoculation prompting, which explores how to teach language models specific behaviors while maintaining their general capabilities.

🤝 Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

📄 License

MIT License - feel free to use this for educational purposes.
