peterpodj/exclusive_self_attention

exclusive self attention

This repository compares a baseline causal self-attention model (vanilla) against an exclusive self attention variant.

Paper

The paper is included here:

Implementation Summary

The implementation is in train.py.

What changed for exclusive self attention

In the attention head output computation, we remove the component of the output vector that lies along the direction of the value vector:

  • Compute dot product: dot_product = sum(out * v)
  • Compute squared norm: v_norm_sq = sum(v * v)
  • Compute projected component: component = (dot_product / (v_norm_sq + 1e-8)) * v
  • Subtract projection: out = out - component

This behavior is enabled by setting use_exclusive_self_attention=True.
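
The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the code from train.py: the function name remove_value_component, the reduction over the last axis, and the (batch, tokens, head_dim) shape are assumptions made for the example.

```python
import numpy as np

def remove_value_component(out, v, eps=1e-8):
    """Subtract from `out` its projection onto `v`, vector by vector."""
    # Dot product of out and v (assumption: reduce over the last axis).
    dot_product = np.sum(out * v, axis=-1, keepdims=True)
    # Squared norm of v over the same axis.
    v_norm_sq = np.sum(v * v, axis=-1, keepdims=True)
    # Component of out along v; eps guards against division by zero.
    component = (dot_product / (v_norm_sq + eps)) * v
    return out - component

rng = np.random.default_rng(0)
out = rng.standard_normal((2, 4, 8))  # hypothetical (batch, tokens, head_dim)
v = rng.standard_normal((2, 4, 8))
result = remove_value_component(out, v)

# Largest remaining alignment between the result and v across all vectors.
residual = np.abs(np.sum(result * v, axis=-1)).max()
```

After the subtraction, each output vector is orthogonal to its value vector up to the small epsilon term, so residual should be close to zero. When v is all zeros, the component is zero and out is returned unchanged.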

Training Comparison

The script runs both configurations:

  1. vanilla
  2. exclusive self attention

It saves per-run CSV logs and comparison plots to outputs_compare.
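
As a rough sketch of what per-run CSV logging could look like, the helpers below write and read one log file per configuration using only the standard library. The column names (step, train_loss, val_loss), the file naming, and the helper names write_run_log and read_run_log are hypothetical; the actual layout produced by train.py may differ.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the real logs from train.py may differ.
FIELDS = ["step", "train_loss", "val_loss"]

def write_run_log(out_dir, run_name, rows):
    """Write one run's loss records to <out_dir>/<run_name>.csv; return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{run_name}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path

def read_run_log(path):
    """Read a run log back as a list of dicts with numeric values."""
    with Path(path).open(newline="") as f:
        return [
            {
                "step": int(r["step"]),
                "train_loss": float(r["train_loss"]),
                "val_loss": float(r["val_loss"]),
            }
            for r in csv.DictReader(f)
        ]
```

With one file per configuration, the two runs can be read back and compared step by step, or passed to a plotting library to draw the comparison curves.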

Train/Val Loss Logs (Images)

Train loss comparison

[Figure: Train Loss Comparison]

Validation loss comparison

[Figure: Validation Loss Comparison]

How to run

python train.py

After training completes, plots and CSV logs are available in outputs_compare/.
