Heusini/SAST

This is a modified copy of SAST. It adds several model architectures built on a modified SAST architecture.

Adapted SAST to support multimodal input of RGB and event data

Conda Installation

./setup_env.sh
conda activate sast

Detectron2 is not strictly required but speeds up the evaluation.

Used datasets for training and evaluation

You may also pre-process the dataset yourself by following the instructions.

Pre-trained Checkpoints

We used the following checkpoints from the respective model repositories:

YOLOX      SAST     LWDETR
YOLOX-s    1 Mpx    LWDETR_tiny_30e_objects365

Training

  • Set DATA_DIR in set_envs.sh
  • Set the other parameters as well, for example GPUS=0 or GPUS=[0,1]
  • Run:
source set_envs.sh
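
As a rough illustration of the steps above, set_envs.sh could look like the sketch below. This is an assumption, not the file shipped with the repository: the variable names are the ones referenced by the training commands in this README, and every value is a placeholder to replace with your own.

```shell
# Hypothetical set_envs.sh sketch; the real file in the repository may differ.
# Variable names come from the training commands in this README.
# All values are placeholders, not recommended settings.
export DATA_DIR="/path/to/preprocessed/dataset"   # placeholder path
export GPUS=0                                     # single GPU
export BATCH_SIZE_PER_GPU=4                       # placeholder value
export lr=0.0002                                  # placeholder value
```

Because the file is sourced rather than executed, the variables stay available in the current shell for the training commands that follow.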

NeRDD and F-UAV-D

To run models with both event and RGB data, select dataset=eventrgb and point DATA_DIR to the preprocessed data. Check the config folder to set the pretrained checkpoints.

python train.py model=rnndet dataset=eventrgb dataset.path=${DATA_DIR} wandb.project_name=SAST \
wandb.group_name=1mpx hardware.num_workers.train=2 batch_size.train=${BATCH_SIZE_PER_GPU} \
hardware.num_workers.eval=2 batch_size.eval=${BATCH_SIZE_PER_GPU} \
hardware.gpus=[${GPUS}] +experiment/gen4="base.yaml" \
training.learning_rate=${lr} validation.val_check_interval=10000

Models that can be selected

The model= parameter accepts one of the following values:

  • rnndet
  • lwdetr
  • rgb
  • eventrgb

rnndet is the base SAST model and rgb is a base YOLOX model. rnndet expects only event data, while rgb expects only RGB images. Both models work with dataset=eventrgb.
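
Since both single-modality models accept dataset=eventrgb, the same dataset path can be reused when switching between them. The sketch below is a dry run that only prints one command per model. The function name print_train_commands and the fallback path are hypothetical, and the printed commands carry only the model and dataset overrides, so the remaining overrides from the training command above still have to be appended.

```shell
# Dry-run sketch: prints the commands instead of executing them.
# print_train_commands is a hypothetical helper, not part of the repository.
print_train_commands() {
  # Fall back to a placeholder path if DATA_DIR is not set.
  local data_dir="${DATA_DIR:-/path/to/preprocessed/dataset}"
  local model
  for model in rnndet rgb; do
    echo "python train.py model=${model} dataset=eventrgb dataset.path=${data_dir}"
  done
}

print_train_commands
```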

lwdetr architecture

Using SAST as a model to create sparsified masks:

eventrgb

  • Needs both modalities: RGB and event data

Model performance

StStephan is a subset of the F-UAV-D dataset.

The table names map to the model parameter like this:

  • SAST = rnndet
  • YOLOX-RGB = rgb
  • SAST+RGB = eventrgb
  • SAST+LWDETR = lwdetr
  • SAST-Pretrained+LWDETR = lwdetr (checkpoint from training SAST on StStephan)
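
The mapping above can be written as a small lookup for scripts that launch runs by table name. This is a minimal sketch; the function name model_for_table_name is hypothetical and not part of the repository.

```shell
# Hypothetical helper: translate a results-table name into the model= value.
model_for_table_name() {
  case "$1" in
    "SAST")                                  echo "rnndet" ;;
    "YOLOX-RGB")                             echo "rgb" ;;
    "SAST+RGB")                              echo "eventrgb" ;;
    "SAST+LWDETR"|"SAST-Pretrained+LWDETR")  echo "lwdetr" ;;
    *) echo "unknown table name: $1" >&2; return 1 ;;
  esac
}

# Usage: prints the model= value for one table entry.
model_for_table_name "SAST+RGB"
```

Note that SAST+LWDETR and SAST-Pretrained+LWDETR resolve to the same model value; they differ only in the checkpoint that is loaded.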

Example

python train.py model=eventrgb dataset=eventrgb dataset.path="${DATA_DIR_NERD}" wandb.project_name=NERD_NEW wandb.group_name=rgb batch_size.train=4 batch_size.eval=4 hardware.gpus=\[${GPUS}\] +experiment/arma="base.yaml" training.learning_rate=${lr} dataset.train.use_fraction=1 dataset.validation.use_fraction=1
python train.py model=rgb dataset=eventrgb dataset.path="${DATA_DIR_NERD}" wandb.project_name=NERD_NEW wandb.group_name=rgb batch_size.train=4 batch_size.eval=4 hardware.gpus=\[${GPUS}\] +experiment/arma="base.yaml" training.learning_rate=${lr} dataset.train.use_fraction=1 dataset.validation.use_fraction=1 fpn.ckpt=yolox_s.pth
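
The commands above escape the square brackets in hardware.gpus with backslashes because brackets are glob characters in the shell, and some shells (zsh, for instance) report an error when such a pattern matches no file. Quoting the whole override is an equivalent alternative. The variable name gpu_override below is only illustrative.

```shell
# Build the GPU override as a quoted string so the shell never
# treats the square brackets as a glob pattern.
GPUS=0
gpu_override="hardware.gpus=[${GPUS}]"

# Show the exact argument that would be passed to train.py.
printf '%s\n' "$gpu_override"
```

The quoted variable can then be passed to train.py as "$gpu_override" in place of the escaped form.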

Code Acknowledgments

This project has used code from the following projects:

  • SAST for the SAST architecture
  • RVT for the RVT architecture implementation in PyTorch
  • timm for the original MaxViT layer implementation in PyTorch
  • YOLOX for the detection PAFPN/head

About

[CVPR 2024] SAST: Scene Adaptive Sparse Transformer for Event-based Object Detection
