Cross Modality Bug Detection

This repository is the official implementation of our paper: Cross-Modality Mutual Learning for Enhancing Smart Contract Vulnerability Detection on Bytecode.

Requirements

Requirements

To install requirements:

pip install -r requirements.txt

Required Packages

Run the following commands to install the required packages.

pip install torch==1.0.0
pip install numpy==1.15.4
pip install six==1.15.0
pip install scikit-learn==0.24.1
pip install scipy==1.1.0

Datasets

The dataset has been created by collecting smart contracts from three different sources: 1) Ethereum platform (more than 96%), 2) GitHub repositories, and 3) blog posts that analyze contracts.

More details for the dataset instruction can be found on our dataset page at Smart-Contract-Dataset, which is continuously improving.

SourceGraphExtractor

For source code, we use a code semantic graph to frame the control- and data- dependencies in the code. We extract two kinds of graph nodes and three types of temporal edges. We refer to the open-sourced tool of GraphExtractor.

BinaryCFGExtractor

For bytecode, we develop the automated tool (i.e., BinaryCFGExtractor) to extract the control flow graph in the code, which consists of bytecode blocks (i.e., nodes) and control flow edges. We released BinaryCFGExtractor on our tool page at BinaryCFGExtractor.

GraphFeatureExtractor

After obtaining the two kinds of graphs, we build upon the architecture of a graph neural network to learn the high-level graph semantic embeddings of both the source code and the bytecode. Specifically, we released the GraphFeatureExtractor on our tool page at GraphFeatureExtractor.

BertPretainFinetune

For processing the bytecode blocks in a CFG, we propose a pre-trained model BERT to handle it. First, we pre-train the BERT through an instruction-level task and a block-level task. Second, considering the discrepancy between different vulnerabilities, we enforce a dedicated fine-tuning of the pre-trained BERT on each of the four vulnerabilities. Finally, the features of bytecode blocks are extracted by the fine-tuned BERT.

More details for the pre-training and fine-tuning of the BERT can be found on our tool page at BertPretrainFinetune.

Usage

Training

To train a dual-modality teacher network (auxiliary network) and a single-modality student network (main network), we can run the following command:

python3 train_new.py --dataset timestamp --batch_size 32 --lr 1e-3 --wd 1e-4 --epoch 20 --shuffle True --cuda True

Note

If any question, please email to messi.qp711@gmail.com.

Reference

Zhuang, Yuan and Liu, Zhenguang and Qian, Peng, et al. Smart Contract Vulnerability Detection using Graph Neural Network. IJCAI, 2020. GNNSCVulDetector

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
bytecode		bytecode
sourcecode		sourcecode
README.md		README.md
config.py		config.py
models_update.py		models_update.py
parser_tool.py		parser_tool.py
requirements.txt		requirements.txt
train_new.py		train_new.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bytecode

bytecode

sourcecode

sourcecode

README.md

README.md

config.py

config.py

models_update.py

models_update.py

parser_tool.py

parser_tool.py

requirements.txt

requirements.txt

train_new.py

train_new.py

Repository files navigation

Cross Modality Bug Detection

Requirements

Usage

Note

Reference

About

Releases 1

Packages

Contributors 2

Languages

Messi-Q/Cross-Modality-Bug-Detection

Folders and files

Latest commit

History

Repository files navigation

Cross Modality Bug Detection

Requirements

Usage

Note

Reference

About

Topics

Resources

Stars

Watchers

Forks

Languages