Locating X-ray coronary angiogram keyframes via long short-term spatiotemporal attention with image-to-patch contrastive learning

By Ruipeng Zhang, Binjie Qin, Jun Zhao, Yueqi Zhu, Yisong Lv, Song Ding.

This repository is the official implementation of "Locating X-ray coronary angiogram keyframes via long short-term spatiotemporal attention with image-to-patch contrastive learning" in IEEE Transactions on Medical Imaging.

Note: The dataset is not publicly accessible for research purposes due to the medical ethics review issued by the hospital officials, who confirmed that the raw image data must not be given to anyone outside the author team.

Introduction

Locating the start, apex and end keyframes of moving contrast agents for keyframe counting in X-ray coronary angiography (XCA) is very important for the diagnosis and treatment of cardiovascular diseases. To locate these keyframes from the class-imbalanced and boundary-agnostic foreground vessel actions that overlap complex backgrounds, we propose long short-term spatiotemporal attention by integrating a convolutional long short-term memory (CLSTM) network into a multiscale Transformer to learn the segment- and sequence-level dependencies in the consecutive-frame-based deep features. Image-to-patch contrastive learning is further embedded between the CLSTM-based long-term spatiotemporal attention and the Transformer-based short-term attention modules. The imagewise contrastive module reuses the long-term attention to contrast the image-level foreground/background of the XCA sequence, while the patchwise contrastive projection selects random background patches as convolution kernels to project foreground/background frames into different latent spaces. A new XCA video dataset is collected to evaluate the proposed method. The experimental results show that the proposed method achieves an mAP (mean average precision) of 72.45% and an F-score of 0.8296, considerably outperforming the state-of-the-art methods.
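
As a rough illustration of the patchwise contrastive projection described above, the sketch below (a simplification written for this README, not the authors' implementation) crops random background patches, uses them as convolution kernels to project frames into a latent space, and scores pooled embeddings with an InfoNCE-style loss. All function names, shapes, and hyperparameters here are illustrative assumptions.

# Minimal sketch of the patchwise contrastive projection idea; shapes and
# hyperparameters are assumptions, not the values used in the paper.
import torch
import torch.nn.functional as F

def patch_projection(frames, background, num_patches=8, patch_size=7):
    """frames: (T, 1, H, W) XCA frames; background: (1, H, W) background image."""
    _, H, W = background.shape
    kernels = []
    for _ in range(num_patches):
        y = torch.randint(0, H - patch_size + 1, (1,)).item()
        x = torch.randint(0, W - patch_size + 1, (1,)).item()
        kernels.append(background[:, y:y + patch_size, x:x + patch_size])
    weight = torch.stack(kernels)           # (num_patches, 1, k, k) conv kernels
    return F.conv2d(frames, weight)         # (T, num_patches, H-k+1, W-k+1)

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss; anchor/positive: (D,), negatives: (N, D)."""
    pos = F.cosine_similarity(anchor, positive, dim=-1) / tau
    neg = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=-1) / tau
    logits = torch.cat([pos.unsqueeze(0), neg]).unsqueeze(0)     # (1, N+1)
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))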

Code Overview

The structure of this repository is based on Actionformer, one of the first Transformer-based models for temporal action localization. Some of the main components are:

  • ./libs/core: Parameter configuration module.
  • ./libs/datasets: Data loader and IO module.
  • ./libs/modeling: Our main model with all its building blocks.
  • ./libs/utils: Utility functions for training, inference, and postprocessing.

Installation

  • Follow INSTALL.md for installing necessary dependencies and compiling the code.

Data Preparation

Download Features and Annotations

  • Contact Prof. Qin (bjqin@sjtu.edu.cn) to obtain authorization to download and use our data file vessel.
  • The file includes vessel features extracted by SVS-Net and annotations in JSON format (similar to the ActivityNet annotation format); a minimal loading sketch follows this list.
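
A minimal loading sketch, assuming vessel.json follows the ActivityNet-style layout ({"database": {video_id: {"annotations": [{"segment": [start, end], "label": ...}]}}}) and that trainx.npy holds the stacked SVS-Net features; the exact keys and array shapes may differ from the released file.

# Hedged example: keys and shapes below are assumptions about the data layout.
import json
import numpy as np

features = np.load("./data/vessel/image/trainx.npy")        # SVS-Net features
with open("./data/vessel/annotations/vessel.json") as f:
    annotations = json.load(f)

for video_id, entry in annotations.get("database", {}).items():
    for ann in entry.get("annotations", []):
        start, end = ann["segment"]                          # keyframe segment bounds
        print(video_id, ann.get("label"), start, end)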

Unpack Features and Annotations

  • Unpack the file under ./data.
  • The folder structure should look like
This folder
└───README.md
│   ...
│
└───data
│    └───vessel
│    │	 └───annotations
│    │	     └───vessel.json
│    │	 └───image
│    │	     └───trainx.npy
│    │	     └───valx.npy
│    │	     └───testx.npy
│    └───...
|
└───libs
│
│   ...

Training and Evaluation

Download Trained Model

We provide a trained model for this research. The model, together with all training logs, can be downloaded from model (extraction code: sjtu). You can also train the model yourself directly, without downloading and unpacking.

Unpack Trained Model

  • Unpack the file under ./ckpt.
  • The folder structure should look like
This folder
└───README.md
│   ...
│
└───ckpt
│    └───vessel_SpTeAttenRPCon_reproduce
│    │	 └───logs
│    │	 └───config.txt
│    │	 └───epoch_xxx.pth.tar
│    └───...
|
└───libs
│
│   ...

Training

Train our model with vessel features. This will create an experiment folder under ./ckpt that stores training config, logs, and checkpoints.

python ./train.py ./configs/vessel_SpTeAttenRPCon.yaml --output reproduce

Validation and Testing

Select the trained model using the validation set. The output reports the index of the selected model and its metrics.

python val_and_test_multiprocess.py

Evaluation

Evaluate the trained model. The expected average mAP should be around 72% as reported in Table 1 of our main paper. A generic sketch of the tIoU-based matching used in this style of evaluation follows the result tables below.

python ./eval.py ./configs/vessel_SpTeAttenRPCon.yaml ./ckpt/vessel_SpTeAttenRPCon_reproduce/epoch_049.pth.tar
  • The results (mAP in % at different tIoU thresholds) should be:
Method        0.3    0.4    0.5    0.6    0.7    Avg
AFSD          73.87  56.75  35.40  14.53  2.93   36.70
TALLFormer    70.75  69.47  57.81  38.04  17.56  50.73
E2E-TAD       83.27  74.13  57.94  42.51  19.22  55.41
Actionformer  90.85  85.25  70.45  52.56  32.62  66.35
Ours          98.44  92.93  80.91  53.85  36.10  72.45
  • Precision (P), recall (R), F-score (F), average deviation (AD) and confidence interval (CI) are also reported:
Method        P       R       F       AD    CI
Actionformer  0.8096  0.8356  0.8013  5.46  0-6.13
Ours          0.8342  0.8612  0.8296  4.71  0-5.30
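
For reference, the sketch below shows a generic temporal-IoU computation and a greedy precision/recall matching of the kind used in this style of evaluation; it is not the repository's eval.py, and the actual matching rules may differ.

# Generic tIoU and precision/recall sketch; not the repository's evaluation code.
def temporal_iou(pred, gt):
    """pred, gt: (start, end) in frames; returns temporal intersection over union."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def precision_recall_at_tiou(preds, gts, thresh=0.5):
    """Greedy one-to-one matching of score-sorted predictions to ground truth."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and temporal_iou(p, g) >= thresh:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall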

Contact

Binjie Qin (bjqin@sjtu.edu.cn)

Ruipeng Zhang (juipengchang@sjtu.edu.cn)

References

The authors thank all cited authors for providing the source code used in this work, especially Actionformer and ConvLSTM_pytorch.

If you are using our code, please consider citing our paper.

@article{Zhang2023LocatingXC,
  title={Locating X-ray coronary angiogram keyframes via long short-term spatiotemporal attention with image-to-patch contrastive learning},
  author={Ruipeng Zhang and Binjie Qin and Jun Zhao and Yueqi Zhu and Yisong Lv and Song Ding},
  journal={IEEE Transactions on Medical Imaging},
  year={2023},
  volume={PP}
}

If you are using the structure of this repository, you can also cite

@inproceedings{zhang2022actionformer,
  title={ActionFormer: Localizing Moments of Actions with Transformers},
  author={Zhang, Chen-Lin and Wu, Jianxin and Li, Yin},
  booktitle={European Conference on Computer Vision},
  series={LNCS},
  volume={13664},
  pages={492-510},
  year={2022}
}

If you are using vessel features extracted by SVS-Net, please cite

@article{Hao2020SequentialVS,
  title={Sequential vessel segmentation via deep channel attention network},
  author={Dongdong Hao and Song Ding and Linwei Qiu and Yisong Lv and Baowei Fei and Yueqi Zhu and Binjie Qin},
  journal={Neural Networks},
  year={2020},
  volume={128},
  pages={172-187}
}
