Phaze (CS/ECE 8803 HML Lab 2 version)

Phaze is a framework that co-optimizes accelerator architecture search and model partitioning for distributed training. For more details, see our ICML 2024 paper, Integrated Hardware Architecture and Device Placement Search.

Installation

To install the dependencies for Phaze, run:

conda env create -f environment.yml
conda activate phaze_env
export CXX=$(which g++)
export CC=$(which gcc)
./setup.sh

Note on build time: Creating the conda environment and installing the packages takes about 10 minutes (apex takes an additional ~15 minutes).

Once installation is complete, add the following path variables to ~/.bashrc:

export THIRD_PARTY_PATH=$(pwd)/phaze_hml_spring26/third_party_for_phaze
export WHAM_PATH=$THIRD_PARTY_PATH/wham/
export SUNSTONE_PATH=$THIRD_PARTY_PATH/sunstone/
export PYTHONPATH=$THIRD_PARTY_PATH:$WHAM_PATH:$SUNSTONE_PATH:$PYTHONPATH
export PYTHONPATH=$(pwd)/.conda/envs/phaze_env/lib/python3.10/site-packages/megatron/fused_kernels/build:$PYTHONPATH

export CXX=$(which g++)
export CC=$(which gcc)

Refresh your shell from your home directory (the $(pwd) entries above resolve relative to the directory in which ~/.bashrc is sourced) by running:

cd ~
source ~/.bashrc

Important: Check that the PYTHONPATH is configured correctly, and all the paths listed are valid, with:

echo $PYTHONPATH
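Reading the echoed value by eye is error-prone when it holds several long paths. The check above can be sketched in Python as follows; the helper name `missing_pythonpath_entries` is an assumption made for illustration and is not part of Phaze.

```python
import os


def missing_pythonpath_entries(pythonpath=None):
    """Return the PYTHONPATH entries that do not exist on disk.

    Hypothetical helper for illustration; not part of Phaze.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    # Empty entries appear when PYTHONPATH was unset before being extended.
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return [p for p in entries if not os.path.exists(p)]


for path in missing_pythonpath_entries():
    print(f"missing: {path}")
```

If nothing is printed, every entry exists. Any path that is reported most likely points at the wrong clone location or conda environment directory.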

Obtain a Gurobi License

Phaze uses Gurobi 10.0.1 to solve the ILP formulations. To run the ILP solver, obtain a Gurobi license from the Gurobi website.

Create a Gurobi WLS license and place the gurobi.lic file in your home directory.
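A quick way to confirm the license file is where this README expects it is sketched below. The helper name `find_gurobi_license` is hypothetical, and the check only looks in the home directory described above.

```python
from pathlib import Path


def find_gurobi_license(home=None):
    """Return the path to gurobi.lic in the home directory, or None if absent.

    Hypothetical helper for illustration; not part of Phaze or Gurobi.
    """
    home = Path(home) if home is not None else Path.home()
    lic = home / "gurobi.lic"
    return lic if lic.is_file() else None


print(find_gurobi_license() or "gurobi.lic not found in home directory")
```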

Debugging for setup

1. Troubleshooting Apex Installation

If you encounter a CUDA version mismatch error during the apex installation process:

Error: "Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries..."

You can resolve this using one of the two methods below:

Method 1: Bypass the Version Check

For minor version mismatches, it is generally safe to skip the strict version check.
1. Open apex/setup.py.
2. Locate and comment out lines 84–92 (the if bare_metal_version != torch_binary_version: block).
3. Save the file and restart the installation by rerunning the last part of setup.sh (starting from cd apex).

Method 2: Align PyTorch with System CUDA

Check your system's CUDA version using nvcc --version, then install the PyTorch build that matches that version. For example, if your system is running CUDA 12.4, install PyTorch 2.5.1 as follows:
```
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```

Then rerun the last part of setup.sh (starting from cd apex).

Note: Once the build starts, you should see logs indicating extensions are being compiled:

building 'apex_C' extension
building 'amp_C' extension

Note on Compilation Time: Compiling these extensions from source typically takes approximately 15 to 20 minutes depending on your system resources.
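As a companion to Method 2, the version comparison can be sketched in Python. The function names here are assumptions made for illustration. The sketch assumes the nvcc --version output contains a "release X.Y" field, and that the wheel index tag follows the cuXY pattern seen in the cu124 example above; not every CUDA release has a matching PyTorch build.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wheel_tag(release):
    """Build the index tag used in the pip URL, e.g. (12, 4) -> 'cu124'."""
    major, minor = release
    return f"cu{major}{minor}"


def same_cuda(nvcc_output, torch_cuda):
    """True if the system CUDA release equals the one PyTorch was built with."""
    release = nvcc_release(nvcc_output)
    if release is None or torch_cuda is None:
        return False
    return release == tuple(int(x) for x in torch_cuda.split(".")[:2])


sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(nvcc_release(sample), wheel_tag(nvcc_release(sample)))
```

In practice the first argument could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` and the second from `torch.version.cuda`.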

2. Troubleshooting building with C++

If you ever see an error such as x86_64-conda-linux-gnu-cc: fatal error: cannot execute 'cc1plus' when building device_placement or when running task0, run the commands below and try again:

export CXX=$(which g++)
export CC=$(which gcc)
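To see what CC and CXX currently point to before rebuilding, a small report like the one below can help. The helper name `compiler_report` is hypothetical; the sketch only uses `shutil.which`, the Python counterpart of the `which` calls above.

```python
import os
import shutil


def compiler_report(env=None):
    """Report the CC/CXX values, whether they resolve, and what `which` suggests.

    Hypothetical helper for illustration; not part of Phaze.
    """
    env = os.environ if env is None else env
    report = {}
    for var, default in (("CC", "gcc"), ("CXX", "g++")):
        value = env.get(var)
        report[var] = {
            "value": value,
            # None means the variable is unset or does not name an executable.
            "resolved": shutil.which(value) if value else None,
            # Path that `export VAR=$(which <default>)` would assign.
            "suggested": shutil.which(default),
        }
    return report


for var, info in compiler_report().items():
    print(var, info)
```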

Quick start (HML students):

Follow your lab outline to run the scripts for each task in /hml_scripts.

Quick Start (Original Phaze instructions, Ignore for HML students)

We provide scripts to run the experiments described in the paper.

The following example command searches for the optimal architecture configuration and device placement strategy for the specified model and list of microbatch sizes. It stores the throughput estimations for the explored architectures in /Solver/output:

cd scripts
./<model.sh> "<microbatch_sizes>"

Phaze Execution and Code Structure

Phaze can be executed with the following command:

python3 phaze.py --phaze_model <model_name> --phaze_exec_type <execution_mode> \
    --phaze_micro_batch_size <microbatch_sizes> --phaze_max_tmp_width <tmp> \
    --phaze_sequence_length <seq_len> --phaze_hbm_size <hbm>

Inputs

model_name = Bert, GPT, OPT, or llama2 variants
execution_mode = ["run_solver", "prepopulate_estimates", "extract_graph"]
seq_len = Sequence length of the model
microbatch_sizes = List of microbatch sizes to explore
tmp = Maximum tensor model parallel width for Megatron models
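When launching Phaze from another script, the command can be assembled programmatically. The sketch below uses only the flags listed above; the function name `phaze_command` and the example values are hypothetical. How list-valued arguments such as the microbatch sizes must be formatted is not specified here, so the sketch passes each value through as a single pre-formatted string (check arguments.py for the expected form).

```python
import shlex

EXEC_MODES = ("run_solver", "prepopulate_estimates", "extract_graph")


def phaze_command(model, exec_type, micro_batch_sizes, max_tmp_width,
                  seq_len, hbm_size):
    """Assemble the phaze.py argument list from already-formatted values."""
    if exec_type not in EXEC_MODES:
        raise ValueError(f"unknown execution mode: {exec_type!r}")
    return [
        "python3", "phaze.py",
        "--phaze_model", str(model),
        "--phaze_exec_type", exec_type,
        "--phaze_micro_batch_size", str(micro_batch_sizes),
        "--phaze_max_tmp_width", str(max_tmp_width),
        "--phaze_sequence_length", str(seq_len),
        "--phaze_hbm_size", str(hbm_size),
    ]


# Hypothetical values, for illustration only.
argv = phaze_command("Bert", "extract_graph", "1", "1", "512", "80")
print(shlex.join(argv))
```

Rejecting an unknown execution mode before launching avoids waiting on a run that would fail at argument parsing.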

Execution Modes

Phaze has 3 execution modes:

extract_graph
- Extracts the graph from the training script (GraphExtractor/graph_extract.py).
- Stores the torch.fx GraphModule in the GraphExtractor/out/<model> folder.
prepopulate_estimates
- Runs extract_graph, or loads its output from file.
- Generates valid architecture configurations if Estimator/arch_configs/cores.json does not exist; otherwise loads them from file.
- Generates estimates for all operators in the graph and stores the output in Estimator/estimates/<model>.
  - The estimator is executed per node and per architectural configuration using Sunstone.
run_solver
- Runs extract_graph and prepopulate_estimates, or loads their outputs from file.
- Runs the ILP solver to get per-layer latency estimates.
  - All per-layer model latency and memory estimates are stored in the Solver/output/ folder.
- Runs a dynamic program for each model and HBM size.
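The three modes form a chain in which each mode depends on the one before it. The sketch below models that dependency as described above. It is an illustrative model, not Phaze's actual control flow; the names `stages_for` and `plan` and the `cached` argument are assumptions.

```python
PIPELINE = ["extract_graph", "prepopulate_estimates", "run_solver"]


def stages_for(mode):
    """Return the stages involved in a mode, in execution order."""
    if mode not in PIPELINE:
        raise ValueError(f"unknown execution mode: {mode!r}")
    return PIPELINE[: PIPELINE.index(mode) + 1]


def plan(mode, cached=()):
    """Pair each stage with 'load' (output already on disk) or 'run'.

    Only prerequisite stages are loaded; the requested mode itself always runs.
    """
    steps = []
    for stage in stages_for(mode):
        is_prerequisite = stage != mode
        action = "load" if is_prerequisite and stage in cached else "run"
        steps.append((stage, action))
    return steps


print(plan("run_solver", cached={"extract_graph"}))
```

Under this model, running run_solver with a previously extracted graph reloads the graph but still runs the estimator and the solver.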

Code Structure

/                           : PHAZE_ROOT
|-- GraphExtractor          : Extract model operator graphs
|-- Estimator               : Generate architectures and estimate latencies
|-- Solver                  : ILP and DP solver
|-- third_party_for_phaze
|   |-- Wham                : For operator mapping and estimating area
|   |-- Sunstone            : For estimating operator latency
|   |-- Megatron            : For Megatron Models
|-- phaze.py                : Python source for Phaze

Citation

If you use Phaze in your research, please cite our paper:

@inproceedings{phaze,
    author={Wang, Irene and Tarnawski, Jakub and Phanishayee, Amar and Mahajan, Divya},
    title={Integrated Hardware Architecture and Device Placement Search}, 
    booktitle={International Conference on Machine Learning},
    year={2024}
}
