# Adversarial Robustness and Explainability of Machine Learning Models

This repository accompanies the PEARC'24 paper:

> Gafur J, Goddard S, Lai W. "Adversarial Robustness and Explainability of Machine Learning Models." In *Practice and Experience in Advanced Research Computing 2024: Human Powered Computing*, pp. 1–7, 2024.

## Introduction

Deep neural networks have achieved remarkable accuracy across a range of classification tasks, yet they remain vulnerable to adversarial examples—carefully crafted perturbations that are imperceptible to humans but cause confident misclassifications. Understanding *how* and *why* these attacks succeed is essential for deploying machine learning models in safety-critical domains such as autonomous driving, medical imaging, and cybersecurity.

This work investigates adversarial robustness through a **black-box attack framework** built on **Particle Swarm Optimization (PSO)**. Unlike gradient-based methods (e.g., FGSM, PGD) that require access to model internals, PSO treats the target classifier as an opaque function, making the approach applicable to any deployed model regardless of architecture. We pair the attack with a detailed **explainability pipeline** that tracks, for every particle at every iteration, the softmax confidence landscape, pixel-wise perturbation magnitude, and the trajectory through the search space. Together, these analyses reveal the structural weaknesses of a trained model and provide interpretable evidence of where decision boundaries are most fragile.

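To make the idea concrete, here is a minimal sketch of a PSO-based black-box attack. This is an illustration, not the repository's implementation: the function name `pso_attack`, the hyperparameters (`epsilon`, inertia weight `w`, acceleration coefficients `c1`/`c2`), and the toy softmax classifier in the usage example are all assumptions chosen for brevity. Each particle is a bounded perturbation of the input; its fitness is the classifier's confidence in the true label, queried purely as a black box, and the swarm drives that confidence down.

```python
import numpy as np

def pso_attack(predict, image, label, n_particles=20, n_iters=50,
               epsilon=0.2, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sketch of a black-box PSO attack (illustrative, not the paper's code).

    `predict` maps a flat image vector to a softmax probability vector;
    only its outputs are used, never its gradients. Each particle is a
    perturbation with L-inf norm at most `epsilon`; fitness is the
    confidence assigned to the true `label`, which the swarm minimizes.
    """
    rng = np.random.default_rng(seed)
    dim = image.size
    pos = rng.uniform(-epsilon, epsilon, (n_particles, dim))
    vel = np.zeros((n_particles, dim))

    def fitness(p):
        # Confidence in the true class for the perturbed, clipped image.
        x = np.clip(image + p, 0.0, 1.0)
        return predict(x)[label]

    pbest = pos.copy()                                   # per-particle bests
    pbest_fit = np.array([fitness(p) for p in pos])
    g = pbest[np.argmin(pbest_fit)].copy()               # global best
    g_fit = pbest_fit.min()

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Standard PSO velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, -epsilon, epsilon)      # keep within budget
        fit = np.array([fitness(p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        if fit.min() < g_fit:
            g_fit = fit.min()
            g = pos[np.argmin(fit)].copy()

    return np.clip(image + g, 0.0, 1.0), g_fit

# Usage with a toy linear-softmax "classifier" standing in for a CNN.
rng = np.random.default_rng(1)
W = rng.normal(size=(10, 4))

def predict(x):
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

img = rng.uniform(0.3, 0.7, 4)
label = int(np.argmax(predict(img)))
adv, conf = pso_attack(predict, img, label)
```

Because the attack only calls `predict`, the same loop applies unchanged to any deployed model, which is the property the paper exploits.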
The framework is demonstrated on a **convolutional neural network (CNN) trained on MNIST**, chosen as a well-understood baseline that allows clear visualization of adversarial perturbations. The codebase is designed to be extensible to other datasets and model architectures.

### Key Contributions

- A **PSO-based black-box adversarial attack** that generates misclassified images without gradient access.
- An **iteration-level explainability pipeline** that logs confidence values, softmax outputs, and pixel-wise differences, providing a window into the attack dynamics.
- Reproducible analysis artifacts (images and structured JSON logs) that support further research into model robustness.

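The structured JSON logs mentioned above can be pictured roughly as follows. This is a hedged sketch of one possible schema, not the repository's actual format: the helper `log_iteration` and its field names (`softmax`, `confidence`, `linf`, `l2`) are illustrative.

```python
import json
import numpy as np

def log_iteration(log, iteration, softmax, perturbation):
    """Append one explainability record per attack iteration (illustrative).

    Each record keeps the full softmax vector, the top-class confidence,
    and the L-inf / L2 magnitudes of the current best perturbation, so the
    attack's progress can be replayed and plotted after the fact.
    """
    log.append({
        "iteration": iteration,
        "softmax": [float(p) for p in softmax],
        "confidence": float(np.max(softmax)),
        "linf": float(np.abs(perturbation).max()),
        "l2": float(np.linalg.norm(perturbation)),
    })

# Example: two fabricated iterations, then serialization for analysis.
log = []
log_iteration(log, 0, np.array([0.7, 0.2, 0.1]), np.zeros(4))
log_iteration(log, 1, np.array([0.4, 0.5, 0.1]), np.full(4, 0.05))
serialized = json.dumps(log, indent=2)
```

Keeping the records as plain JSON (rather than pickled arrays) is what makes the artifacts reusable outside the original training environment.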
## Citing This Work

If you use this code in your research, please cite:

```bibtex
@inproceedings{gafur2024adversarial,
  title     = {Adversarial Robustness and Explainability of Machine Learning Models},
  author    = {Gafur, Jamil and Goddard, Steve and Lai, William},
  booktitle = {Practice and Experience in Advanced Research Computing 2024: Human Powered Computing},
  pages     = {1--7},
  year      = {2024}
}
```

---

## Contributing

Contributions are welcome. Please fork the repository and submit a pull request. Ensure that commit messages are clear, tests are updated as needed, and code follows the existing conventions.

---

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.