A 3D CNN or PointNet to identify the class of an event in an AMEGO-X/ComPair-type system.
Needs the following libraries:
- MEGAlib (for generating the simulated data)
- PyTorch
- PyRoot
- Torchmetrics
To reproduce my current results, you need to do the following:
- Generate simulated data. You can either just run `cosima` in multiple terminals or run `mcosima` with a number of instances to run:

```
cosima -z ~/ComPair/eventtypeidentification/resource/Sim_1MeV_50MeV_flat.source
mcosima -t 35 -z ~/ComPair/eventtypeidentification/resource/Sim_1MeV_50MeV_flat_AMEGOX.source
```
- The geometry file used in this source file can be obtained from https://github.com/njmiller/Geometry/tree/NJMML (the NJMML branch of the Geometry repository).
- This branch does not contain the latest AMEGO-X model updates.
- It fixes an issue with Cosima crashing and doubles the resolution of the original model I obtained.
- Process the data to generate a dataset that will be input to the machine learning code. This is done with the `preprocess_sims.py` code:

```
python preprocess_sims.py -path /data/slag2/njmille2/AMEGOXData0p5/ -outfn /data/slag2/njmille2/AMEGOXData0p5/AMEGOX_nomixed_trackervolume_2000000.pkl -minhits 2 -nevents_dataset 2000000
```

- The "path" option gives the directory containing all the cosima files. The code will find all the cosima files in that directory. It will read in a single file and process it until the whole file has been processed, then move on to the next, unless it has found enough valid events, in which case it dumps the data to the specified pickle file (a short sketch of inspecting the output pickle is included after these options).
- The "minhits" option gives the minimum number of hits for an event
- The "nevents_dataset" gives the number of events to use for each type. For the example, the output dataset will have 4 million events with 2 million each of Compton and pair events.
- This code is currently set up for the specific AMEGO-X model that I was testing. It will reject events that don't start within the tracker volume.
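As a sanity check before training, you can peek at the output dataset. This is just a generic sketch assuming the file is a standard Python pickle; nothing about its internal structure is assumed here.

```python
# Quick inspection of the preprocessed dataset (path is the example output from above).
import pickle

with open("/data/slag2/njmille2/AMEGOXData0p5/AMEGOX_nomixed_trackervolume_2000000.pkl", "rb") as f:
    dataset = pickle.load(f)

print(type(dataset))
try:
    print(len(dataset))  # e.g. number of events, if the top-level object has a length
except TypeError:
    pass
```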
- Run the `scripts/train_{cnn/pointnet}.py` code:

```
python train_pointnet.py -fn /data/slag2/njmille2/AMEGOXData0p5/AMEGOX_nomixed_trackervolume_2000000.pkl -dir /data/slag2/njmille2 -label June13 -batch 128
```

- This should run on all GPUs on the computer. It is coded for single node / multiple GPU.
- The "dir" option specifies the output directory for the best model parameters and a text file with some information about loss and accuracy for each epoch.
- The "label" option specifies a label to be given to each output.
- The "batch" option is the batch size FOR EACH GPU.
- There are partially finished Equinox/JAX versions of the code. I was just trying to learn Equinox/JAX by porting the models and getting them to run on multiple GPUs. The CNN version should be working, but the PointNet version still has some bugs to fix. Some things are slightly different since nothing in Equinox/JAX is batch aware; we just `jax.vmap` everything to make it batch aware.
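For illustration, here is a minimal sketch (not the repo's port) of the `jax.vmap` pattern: an Equinox model is written for a single example and then mapped over the batch axis. The `eqx.nn.MLP` here is just a placeholder model.

```python
# Making a per-example Equinox model batch aware with jax.vmap (illustrative only).
import jax
import jax.numpy as jnp
import equinox as eqx

key = jax.random.PRNGKey(0)
# A per-example model: takes one feature vector, returns class scores.
model = eqx.nn.MLP(in_size=3, out_size=2, width_size=64, depth=2, key=key)

single = jnp.ones(3)         # one example
batch = jnp.ones((128, 3))   # a batch of 128 examples

out_single = model(single)           # Equinox modules are not batch aware...
out_batch = jax.vmap(model)(batch)   # ...so vmap maps the model over the batch axis
print(out_single.shape, out_batch.shape)  # (2,) (128, 2)
```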
- There are "working" versions of GNN.
- The code runs, but the accuracy is not good.
- The attention network model has some issue and basically just predicts a single event type.
- The graphs just use K-nearest neighbors to generate the edges between the nodes (a sketch of this is shown after this list).
- Could probably try to generate the graphs from the simulated data and then check the accuracy when running on data whose edges are generated by K-nearest neighbors.
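Below is a minimal sketch (hypothetical, not the repo's GNN code) of K-nearest-neighbor edge generation from the hit positions of a single event, producing a COO-style `edge_index` as used by, e.g., PyTorch Geometric.

```python
# K-nearest-neighbor edge construction from hit positions (illustrative only).
import torch


def knn_edges(pos: torch.Tensor, k: int = 4) -> torch.Tensor:
    """pos: (num_hits, 3) hit positions; returns edge_index of shape (2, num_hits * k)."""
    num_hits = pos.shape[0]
    k = min(k, num_hits - 1)
    dists = torch.cdist(pos, pos)                      # pairwise distances
    dists.fill_diagonal_(float("inf"))                 # exclude self-loops
    neighbors = dists.topk(k, largest=False).indices   # (num_hits, k) nearest neighbors
    src = torch.arange(num_hits).repeat_interleave(k)
    dst = neighbors.reshape(-1)
    return torch.stack([src, dst], dim=0)              # COO edge list

# Example: 10 hits with random 3D positions.
edge_index = knn_edges(torch.randn(10, 3), k=4)
print(edge_index.shape)  # torch.Size([2, 40])
```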
The models are stored in the models/ directory.