zhouchenlin2096/Spikingformer

Spikingformer: Spike-driven Residual Learning for Transformer-based Spiking Neural Network, arXiv 2023

Spikingformer is a pure event-driven, transformer-based spiking neural network (SNN). It reaches 75.85% top-1 accuracy on ImageNet-1K, which is 1.04% higher than Spikformer, while reducing energy consumption by 57.34% compared with Spikformer. To the best of our knowledge, as of April 2023 this is the first pure event-driven transformer-based SNN.

News

[2024.2.23] Update energy_consumption_calculation for Spikingformer and Spikformer on ImageNet.

[2023.9.11] Update origin_logs and cifar10 trained model.

[2023.8.18] Update trained models.

Reference

If you find this repo useful, please consider citing:

@article{zhou2023spikingformer,
  title={Spikingformer: Spike-driven Residual Learning for Transformer-based Spiking Neural Network},
  author={Zhou, Chenlin and Yu, Liutao and Zhou, Zhaokun and Zhang, Han and Ma, Zhengyu and Zhou, Huihui and Tian, Yonghong},
  journal={arXiv preprint arXiv:2304.11954},
  year={2023},
  url={https://arxiv.org/abs/2304.11954}
}

Main results on ImageNet-1K

| Model | Resolution | T | Param. | FLOPs | Power | Top-1 Acc (%) | Download |
|---|---|---|---|---|---|---|---|
| Spikingformer-8-384 | 224x224 | 4 | 16.81M | 3.88G | 4.69 mJ | 72.45 | - |
| Spikingformer-8-512 | 224x224 | 4 | 29.68M | 6.52G | 7.46 mJ | 74.79 | - |
| Spikingformer-8-768 | 224x224 | 4 | 66.34M | 12.54G | 13.68 mJ | 75.85 | here |

All download passwords: abcd

Main results on CIFAR10/CIFAR100

| Model | T | Param. | CIFAR10 Top-1 Acc (%) | Download | CIFAR100 Top-1 Acc (%) |
|---|---|---|---|---|---|
| Spikingformer-4-256 | 4 | 4.15M | 94.77 | - | 77.43 |
| Spikingformer-2-384 | 4 | 5.76M | 95.22 | - | 78.34 |
| Spikingformer-4-384 | 4 | 9.32M | 95.61 | - | 79.09 |
| Spikingformer-4-384-400E | 4 | 9.32M | 95.81 | here | 79.21 |

All download passwords: abcd

Main results on CIFAR10-DVS/DVS128

| Model | T | Param. | CIFAR10-DVS Top-1 Acc (%) | DVS128 Gesture Top-1 Acc (%) |
|---|---|---|---|---|
| Spikingformer-2-256 | 10 | 2.57M | 79.9 | 96.2 |
| Spikingformer-2-256 | 16 | 2.57M | 81.3 | 98.3 |

Requirements

timm==0.6.12; cupy==11.4.0; torch==1.12.1; spikingjelly==0.0.0.0.12; pyyaml;
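
The pinned versions above can be checked before training. The following is a minimal sketch using only the Python standard library; the helper name `check_requirements` is hypothetical and not part of this repository. It assumes each package is installed under the distribution name listed above, which may not hold for CuPy (its wheels are often published under CUDA-specific names such as `cupy-cuda11x`), so a "not installed" result for `cupy` is only a hint.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements line above; pyyaml is unpinned there.
PINNED = {
    "timm": "0.6.12",
    "cupy": "11.4.0",
    "torch": "1.12.1",
    "spikingjelly": "0.0.0.0.12",
    "pyyaml": None,
}


def check_requirements(pinned=PINNED):
    """Return {package: message} for every missing or mismatched package.

    Hypothetical helper for illustration; not part of the repository.
    """
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Drop a local build suffix such as "+cu113" before comparing.
        if wanted is not None and found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, message in check_requirements().items():
        print(f"{name}: {message}")
```

An empty result means every listed package was found at the pinned version.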

Data preparation: arrange ImageNet in the following folder structure. You can extract ImageNet with this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
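
As a quick sanity check of the layout above, the sketch below counts class folders and JPEG files in each split. The function name `summarize_imagenet` is hypothetical and not part of this repository; it only inspects the directory tree and does not open any image.

```python
from pathlib import Path


def summarize_imagenet(root):
    """Count class folders and JPEG files under root/train and root/val.

    Hypothetical helper for illustration; not part of the repository.
    """
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Each class is one sub-folder, e.g. n01440764.
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        n_images = sum(
            1
            for d in class_dirs
            for f in d.iterdir()
            if f.suffix.lower() in {".jpeg", ".jpg"}
        )
        summary[split] = {"classes": len(class_dirs), "images": n_images}
    return summary
```

For a complete ImageNet-1K copy, both splits should report 1000 classes, with 50,000 images in `val`.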

Train

Training on ImageNet

Set the hyper-parameters in imagenet.yml, then run:

cd imagenet
python -m torch.distributed.launch --nproc_per_node=8 train.py

Testing on ImageNet validation data

First download the trained model here (password: abcd), then run:

cd imagenet
python test.py

Training on CIFAR10

Set the hyper-parameters in cifar10.yml, then run:

cd cifar10
python train.py

Training on CIFAR100

Set the hyper-parameters in cifar100.yml, then run:

cd cifar100
python train.py

Training on DVS128 Gesture

cd dvs128-gesture
python train.py

Training on CIFAR10-DVS

cd cifar10-dvs
python train.py
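
Both neuromorphic datasets are converted from event streams to frames before training: the N events are split into T slices with nearly the same number of events each, and every slice is integrated into one frame. The following is a minimal pure-Python sketch of that idea, not the preprocessing code used by this repository (which follows SEW or SpikingJelly). The function name `events_to_frames`, the two-channel polarity layout, and the choice to let the last slice absorb the remainder are assumptions made for illustration.

```python
def events_to_frames(xs, ys, ps, T, height, width):
    """Integrate time-sorted events into T frames shaped [2][height][width].

    xs, ys, ps hold each event's column, row and polarity (0 or 1).
    Events are assumed to be sorted by timestamp already.
    Hypothetical helper for illustration; not part of the repository.
    """
    n = len(xs)
    if not (len(ys) == len(ps) == n):
        raise ValueError("xs, ys and ps must have the same length")
    if T <= 0:
        raise ValueError("T must be positive")
    per_slice = n // T
    frames = []
    for t in range(T):
        start = t * per_slice
        # Assumption: the last slice takes the remainder so no event is lost.
        stop = n if t == T - 1 else start + per_slice
        # One channel per polarity, each a height x width grid of counts.
        frame = [[[0] * width for _ in range(height)] for _ in range(2)]
        for i in range(start, stop):
            frame[ps[i]][ys[i]][xs[i]] += 1
        frames.append(frame)
    return frames
```

Each frame holds per-pixel event counts, so the counts summed over all T frames equal N.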

Energy Consumption Calculation on ImageNet

First download the trained model here (password: abcd), then run:

cd imagenet
python energy_consumption_calculation_on_imagenet.py

Correction of an Error in the Manuscript

For neuromorphic datasets, the preprocessing (transforming events into frames) follows SEW or SpikingJelly. The event stream comprises four dimensions: the event’s coordinates (x, y), time (t), and polarity (p). We split the N events into T slices (T is the number of simulation time-steps) with nearly the same number of events in each slice, and integrate the events of each slice into a frame.

Equation 20 in the manuscript contains a mistake. The corrected formula is:

$$E_{Spikingformer}^{neuro} = E_{AC} \times \left(\sum_{i=2}^{N} SOP_{Conv}^{i} + \sum_{j=1}^{M} SOP_{SSA}^{j}\right) + E_{MAC} \times FLOP_{Conv}^{1}$$
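
The corrected Equation 20 can be evaluated directly once the per-layer operation counts are known. The sketch below is only an illustration of the arithmetic, not the repository's energy script. The function name `spikingformer_neuro_energy` is hypothetical, and the default per-operation energies (0.9 pJ per accumulate, 4.6 pJ per multiply-accumulate, figures commonly quoted for 45 nm hardware) are assumptions that do not appear in this README.

```python
E_AC = 0.9e-12   # joules per accumulate operation (assumed value)
E_MAC = 4.6e-12  # joules per multiply-accumulate operation (assumed value)


def spikingformer_neuro_energy(flop_conv_1, sop_conv, sop_ssa, e_ac=E_AC, e_mac=E_MAC):
    """Evaluate the corrected Equation 20 and return energy in joules.

    flop_conv_1: FLOPs of the first convolution layer (i = 1).
    sop_conv:    synaptic operations of convolution layers i = 2..N.
    sop_ssa:     synaptic operations of the SSA blocks j = 1..M.
    Hypothetical helper for illustration; not part of the repository.
    """
    # Layers driven by spikes are charged at the accumulate cost;
    # only the first convolution is charged at the MAC cost.
    return e_ac * (sum(sop_conv) + sum(sop_ssa)) + e_mac * flop_conv_1
```

The result is in joules; multiply by 1e3 to compare with figures reported in mJ.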

Acknowledgement & Contact Information

Related projects: spikformer, pytorch-image-models, spikingjelly.

For help or issues with this repository, please submit a GitHub issue.

For other communications related to this repository, please contact zhouchl@pcl.ac.cn or zhouchenlin19@mails.ucas.ac.cn.