TaylorShift Analysis

This is the code for analyzing the TaylorShift module. It is the basis for our Figure 2 and for the analysis of the scaling behavior of intermediate results.

Usage

Individual Experiments

You can run individual experiments on a Slurm cluster using one of the runscripts. To check the empirical efficiency transition points for a given dimension and head size, pass the corresponding arguments to the script together with the attention version you want to use. For example:

./runscripts/check_cutoff_A100 -b 32 -d 16 -m baseline

measures the throughput and memory requirements of the baseline attention mechanism with a per-head dimension of 16 and a batch size of 32 across all required sequence lengths.
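
To illustrate what an empirical transition point means here, the following minimal sketch finds the smallest sequence length from which one attention version stays at least as fast as the other. The helper `find_transition_point` and the dictionary layout are hypothetical and not part of the runscripts, and the throughput numbers are made up purely for illustration.

```python
def find_transition_point(baseline, efficient):
    """Return the smallest sequence length from which `efficient` stays at
    least as fast as `baseline`, or None if that never happens.

    Both arguments map sequence length -> throughput (higher is better).
    Hypothetical helper, not part of the repository.
    """
    # Only compare sequence lengths that were measured for both versions.
    lengths = sorted(set(baseline) & set(efficient))
    transition = None
    for n in lengths:
        if efficient[n] >= baseline[n]:
            # Remember the first length of the current winning streak.
            if transition is None:
                transition = n
        else:
            # The baseline is faster again, so discard the candidate.
            transition = None
    return transition


# Made-up throughput values, for illustration only.
baseline_tp = {128: 900.0, 256: 700.0, 512: 400.0, 1024: 150.0}
efficient_tp = {128: 500.0, 256: 480.0, 512: 450.0, 1024: 400.0}
print(find_transition_point(baseline_tp, efficient_tp))
```

The same comparison could be applied to peak memory by flipping the inequality, since lower memory use is better.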

All Combinations

To automatically run all combinations, use this script.
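
As a rough idea of what running all combinations involves, the sketch below enumerates one command line per combination of batch size, per-head dimension, and attention version, using only the `-b`, `-d`, and `-m` flags from the example above. It does not reproduce the linked script; the helper `build_commands` is hypothetical, and the value grids are placeholders rather than the ones used in the repository.

```python
from itertools import product
import shlex


def build_commands(script, batch_sizes, head_dims, versions):
    """Build one command line per (batch size, head dimension, version).

    Hypothetical helper; it only prints commands and does not submit jobs.
    """
    commands = []
    for b, d, m in product(batch_sizes, head_dims, versions):
        # -b, -d and -m are the flags shown in the usage example.
        args = [script, "-b", str(b), "-d", str(d), "-m", m]
        commands.append(shlex.join(args))
    return commands


# Placeholder grids (assumptions): adjust to the combinations you need.
commands = build_commands(
    "./runscripts/check_cutoff_A100",
    batch_sizes=[32],
    head_dims=[16, 32],
    versions=["baseline"],
)
for cmd in commands:
    print(cmd)
```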

Scaling Behavior

Check the scaling behavior of intermediate results using scaling_behavior.py, or use this to run it on a Slurm cluster.
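
One generic way to summarize scaling behavior is to estimate the exponent of a power law from a log-log fit. The sketch below is an assumption about how such a summary could look, not a description of what scaling_behavior.py does; the helper `scaling_exponent` is hypothetical and the data is synthetic.

```python
import math


def scaling_exponent(sizes, values):
    """Least-squares slope of log(values) against log(sizes).

    For values that grow like sizes**k, the returned slope is close to k.
    Hypothetical helper, not part of the repository.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope: covariance divided by variance.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic data that grows quadratically with the sequence length.
lengths = [64, 128, 256, 512]
values = [n ** 2 for n in lengths]
print(round(scaling_exponent(lengths, values), 3))
```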