Accelerating Hardware-Aware NAS with ML-Based Edge GPU Performance Modeling

This is the GitHub repository for the paper Accelerating Hardware-Aware NAS with ML-Based Edge GPU Performance Modeling, presented at the IEEE International Conference on Omni-layer Intelligent Systems in August 2025. The paper can be accessed at the following link: https://ieeexplore.ieee.org/abstract/document/11125772

Motivation

In recent years, research attention has increasingly turned to the practical deployment of neural networks, especially on edge devices where computational resources, energy, and latency are tightly constrained. To accommodate this, methods such as hardware-in-the-loop neural architecture search (NAS) have been used to optimize a model architecture for a specific device. A major limitation of these conventional approaches is their computational cost: optimizing for multiple objectives, such as accuracy and energy consumption, is resource-intensive and time-consuming, especially when every candidate is evaluated on physical edge hardware. To overcome this bottleneck, we propose a predictive-model-assisted NAS framework that accelerates the search by eliminating on-device evaluation during the search. Replacing the physical edge device with a model that predicts the latency and energy consumption of a candidate architecture drastically speeds up the NAS process and offers a viable alternative to hardware-in-the-loop NAS.

Results

Our predictive model achieved a mean absolute percentage error (MAPE) of 15.28% for energy prediction and 15.50% for latency prediction on a selection of real-world models. While these numbers are not state of the art, they demonstrate the feasibility of our method. The predictive model relied heavily on the percentage of multiply-accumulate operations (MACs) attributable to each kernel size. Given that the model performed much worse on "efficient" CNN architectures such as ShuffleNet, we suspect that a larger dataset covering a broader set of MAC distributions is required to cover all SOTA models.
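For readers unfamiliar with the metric, the sketch below shows how MAPE is conventionally computed. It assumes the standard definition (mean of absolute errors relative to the measured values, expressed in percent); the README does not state the exact variant used in the paper, and the sample numbers are made up for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the measured value.
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical latency measurements (ms) and predictions, not real data.
measured = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 44.0]
mape_value = mape(measured, predicted)  # every prediction is off by 10%
```

Because each error is divided by the measured value, models with very small measured latency or energy can dominate the score.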

Our NAS framework showed a large drop in runtime compared to the hardware-in-the-loop approach (hours vs. days). While accuracy and energy consumption were not quite at the level of SOTA models, this experiment serves as a proof of concept and leaves the door open for future investigation.
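The idea of searching against a predictor instead of a device can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the search space, the random-search strategy, the toy `predict_energy_latency` and `proxy_accuracy` functions, and the Pareto filter are stand-ins, not the search algorithm or predictor used in the paper.

```python
import random

KERNEL_SIZES = [1, 3, 5, 7]  # hypothetical search space
CHANNELS = [16, 32, 64]


def sample_architecture(rng, max_layers=5):
    """Draw a random stack of (kernel_size, out_channels) conv layers."""
    depth = rng.randint(1, max_layers)
    return [(rng.choice(KERNEL_SIZES), rng.choice(CHANNELS)) for _ in range(depth)]


def predict_energy_latency(arch):
    """Stand-in for a trained predictor; NOT the paper's model.

    Returns (energy, latency) as toy functions of a rough MAC count.
    """
    macs = sum(k * k * c for k, c in arch)
    return 0.002 * macs, 0.001 * macs + 0.5 * len(arch)


def proxy_accuracy(arch):
    """Toy accuracy score; a real search would train or estimate accuracy."""
    return sum(c for _, c in arch) / (64.0 * 5)


def dominates(a, b):
    """True if a is no worse than b on every objective and better on one.

    Objectives are (accuracy, energy, latency): maximise the first,
    minimise the other two.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def search(num_candidates=200, seed=0):
    """Random search scored entirely by the predictor (no device in the loop)."""
    rng = random.Random(seed)
    scored = []
    for _ in range(num_candidates):
        arch = sample_architecture(rng)
        energy, latency = predict_energy_latency(arch)
        scored.append((arch, (proxy_accuracy(arch), energy, latency)))
    # Keep only candidates that no other candidate dominates.
    return [
        (arch, obj) for arch, obj in scored
        if not any(dominates(other, obj) for _, other in scored)
    ]


pareto_front = search()
```

The speed-up comes from the scoring step: each candidate costs one predictor call rather than a deploy-and-measure cycle on the Jetson Nano.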

Dataset

To train the predictive model, a dataset of arbitrary model architectures was generated and subsequently run on the Jetson Nano. This process produced two datasets, which are available for public use. First, the primary dataset of energy and latency measurements from 12,000+ CNN models run on the Jetson Nano is available in CSV format. Second, these 12,000+ models are available in TensorRT format for replication of, or follow-up to, our investigation.

Jetson Nano energy and latency dataset
The dataset of models in the TRT format is available upon request.

Repository Structure

Below is a basic overview of the structure of the repository.

analysis Folder
- Contains scripts for various analyses and the methods used to develop predictive models.
data Folder
- Stores all related data.
- Organized into different files because data collection was done in stages:
  - Initially, data was collected for convolutional models with 1-5 layers.
  - This was followed by data collection for convolutional models with 6-10 layers.
  - Because high-MAC models were underrepresented, further data was collected for high-MAC models.
  - Next, to help with architectures like MobileNet, data was collected for depthwise-separable (DWS) models.
  - Finally, data was collected for various combinations of mixed-kernel models in the hope of further generalizing our dataset and predictor.
desktop_scripts Folder
- Includes all scripts that are used to automate data collection from a host machine.
jetson_scripts Folder
- Includes all scripts that run on the Jetson device.
- Contains scripts for converting ONNX models to TensorRT, as well as scripts for measuring power consumption and latency.
model_generation Folder
- Contains scripts for creating new models.
models Folder
- Contains the gradient-boosting and random-forest predictive models, as well as the best model found by the predictive-model-based NAS.
nas Folder
- Includes scripts used to conduct the predictive-model-based NAS.
processing_data Folder
- Includes scripts used to process the collected data.
- Handles data collected on the Jetson Nano, which is stored in .txt or .csv files.
layer_mac_param_gen.py File
- When generating models for data collection, we had not yet identified the "MAC percentage" of different kernels as an important factor within the predictive model. Consequently, this file was needed to retroactively determine the MAC percentage of each kernel size for all the previously generated models.
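As a rough illustration of the "MAC percentage" feature described above, the sketch below computes each kernel size's share of a model's convolutional MACs. It is not the code in layer_mac_param_gen.py: the layer description format, the function names, and the example layers are hypothetical, and it assumes standard (non-grouped) convolutions with square kernels.

```python
from collections import defaultdict


def conv_macs(out_h, out_w, kernel, c_in, c_out):
    """MACs of a standard (non-grouped) 2D convolution with a square kernel."""
    return out_h * out_w * kernel * kernel * c_in * c_out


def mac_percentage_by_kernel(layers):
    """Map each kernel size to its share (in percent) of the model's conv MACs."""
    if not layers:
        raise ValueError("at least one layer is required")
    totals = defaultdict(int)
    for layer in layers:
        totals[layer["kernel"]] += conv_macs(**layer)
    grand_total = sum(totals.values())
    return {k: 100.0 * m / grand_total for k, m in sorted(totals.items())}


# Hypothetical three-layer model, for illustration only.
layers = [
    {"out_h": 32, "out_w": 32, "kernel": 3, "c_in": 3, "c_out": 16},
    {"out_h": 16, "out_w": 16, "kernel": 3, "c_in": 16, "c_out": 32},
    {"out_h": 16, "out_w": 16, "kernel": 1, "c_in": 32, "c_out": 64},
]
shares = mac_percentage_by_kernel(layers)  # keys are kernel sizes 1 and 3
```

Depthwise-separable layers would need a different MAC formula, since a depthwise convolution does not multiply by the full input channel count.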

Contributors

If you have any questions regarding the paper, code, or dataset, please contact us on LinkedIn.

Paper Authors:
Aishneet Juneja - University of South Carolina
Matthew Grenier - University of South Carolina
Md Hasibul Amin - University of South Carolina
Ramtin Zand - University of South Carolina

Citation

TBD

