SeqDeepFake: Detecting and Recovering Sequential DeepFake Manipulation

S-Lab, Nanyang Technological University

[Project Page] | [Paper] | [Extension Paper] | [Dataset]

Updates

[02/2024] Dataset link has been updated to Hugging Face.
[09/2023] arXiv extension paper released.
[07/2022] Pretrained models are uploaded.
[07/2022] Project page and dataset are released.
[07/2022] Code is released.

Introduction

This is the official implementation of Detecting and Recovering Sequential DeepFake Manipulation. We introduce a novel research problem: Detecting Sequential DeepFake Manipulation (Seq-DeepFake), which focuses on detecting the sequences of multi-step facial manipulations. To facilitate the study of Seq-DeepFake, we provide a large-scale Sequential DeepFake Dataset and propose a concise yet effective Seq-DeepFake Transformer (SeqFakeFormer).

The framework of the proposed method:

Installation

Download

git clone https://github.com/rshaojimmy/SeqDeepFake.git
cd SeqDeepFake

Environment

We recommend using Anaconda to manage the Python environment:

conda create -n seqdeepfake python=3.6
conda activate seqdeepfake
conda install -c pytorch pytorch=1.6.0 torchvision=0.7.0 cudatoolkit==10.1.243
conda install pandas
conda install tqdm
conda install pillow
pip install tensorboard==2.4.1

Dataset Preparation

A brief introduction

We contribute the first large-scale Sequential DeepFake Dataset, Seq-Deepfake, including ~85k sequentially manipulated face images, each annotated with its ground-truth manipulation sequence.

The images are generated with the following two facial manipulation methods, which cover 28 and 26 types of manipulation sequences (including original), respectively. The length of each manipulation sequence ranges from 1 to 5.

Sequential facial components manipulation (based on CelebAMask-HQ and StyleMapGAN)
Sequential facial attributes manipulation (based on FFHQ and Talk-To-Edit)

Here are some sample images and statistics:

Annotations

Each image in the dataset is annotated with a list of length 5, indicating the ground-truth manipulation sequence. The labels in the sequence are defined as follows:

For Sequential facial components manipulation:

0: 'NA', 1: 'nose', 2: 'eye', 3: 'eyebrow', 4: 'lip', 5: 'hair'

Note: 'NA' means no manipulation is taken in this step.

For Sequential facial attributes manipulation:

0: 'NA', 1: 'Bangs', 2: 'Eyeglasses', 3: 'Beard', 4: 'Smiling', 5: 'Young'

Note: 'NA' means no manipulation is taken in this step.

Note that label 0 serves as the placeholder for sequential manipulations shorter than 5 steps. For example, the annotation for manipulation sequence nose-eye-lip would be: [1, 2, 4, 0, 0]. Original images are annotated with [0, 0, 0, 0, 0].
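The padding rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the helper names `encode_sequence` and `decode_sequence` are hypothetical, and treating the name `original` as the all-zero sequence is an assumption based on the folder names in the dataset layout.

```python
from typing import Dict, List, Sequence

# Label tables copied from the annotation description above.
COMPONENT_LABELS = {0: "NA", 1: "nose", 2: "eye", 3: "eyebrow", 4: "lip", 5: "hair"}
ATTRIBUTE_LABELS = {0: "NA", 1: "Bangs", 2: "Eyeglasses", 3: "Beard", 4: "Smiling", 5: "Young"}
MAX_STEPS = 5  # every annotation is a list of length 5


def encode_sequence(name: str, labels: Dict[int, str]) -> List[int]:
    """Turn a name such as 'nose-eye-lip' into a zero-padded label list.

    Hypothetical helper: 'original' is assumed to mean "no manipulation".
    """
    name_to_id = {text: idx for idx, text in labels.items()}
    steps = [] if name == "original" else name.split("-")
    if len(steps) > MAX_STEPS:
        raise ValueError(f"sequence longer than {MAX_STEPS} steps: {name}")
    ids = [name_to_id[step] for step in steps]
    # Label 0 ('NA') fills the unused trailing steps.
    return ids + [0] * (MAX_STEPS - len(ids))


def decode_sequence(ids: Sequence[int], labels: Dict[int, str]) -> str:
    """Inverse of encode_sequence: drop the 'NA' padding and join the names."""
    steps = [labels[i] for i in ids if i != 0]
    return "-".join(steps) if steps else "original"


print(encode_sequence("nose-eye-lip", COMPONENT_LABELS))  # [1, 2, 4, 0, 0]
```

The same functions work for the attribute labels by passing `ATTRIBUTE_LABELS` instead of `COMPONENT_LABELS`.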

Prepare data

You can download the Seq-Deepfake dataset through this link: [Dataset]

After unzipping all sub-files, the structure of the dataset should be as follows:

./
├── facial_attributes
│   ├── annotations
│   │   ├── train.csv
│   │   ├── test.csv
│   │   └── val.csv
│   └── images
│       ├── train
│       │   ├── Bangs-Eyeglasses-Smiling-Young
│       │   │   ├── xxxxxx.jpg
│       │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   ...
│       │   ├── Young-Smiling-Eyeglasses
│       │   │   ├── xxxxxx.jpg
│       │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   └── original
│       │       ├── xxxxxx.jpg
│       │       ...
│       │       └── xxxxxx.jpg
│       ├── test
│       │   % the same structure as in train
│       └── val
│           % the same structure as in train
└── facial_components
    ├── annotations
    │   ├── train.csv
    │   ├── test.csv
    │   └── val.csv
    └── images
        ├── train
        │   ├── eyebrow-eye-hair-nose-lip
        │   │   ├── xxxxxx.jpg
        │   │   ...
        │   │   └── xxxxxx.jpg
        │   ...
        │   ├── nose-eyebrow-lip-eye-hair
        │   │   ├── xxxxxx.jpg
        │   │   ...
        │   │   └── xxxxxx.jpg
        │   └── original
        │       ├── xxxxxx.jpg
        │       ...
        │       └── xxxxxx.jpg
        ├── test
        │   % the same structure as in train
        └── val
            % the same structure as in train
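After unpacking, a quick sanity check of the layout can save a failed training run. The sketch below is one possible way to do that, not part of the repository: `count_images` is a hypothetical helper that assumes only the directory structure shown above (`<root>/<dataset>/images/<split>/<sequence>/*.jpg`) and does not read the CSV annotations, whose column format is not described here.

```python
from pathlib import Path
from typing import Dict


def count_images(data_dir: str, dataset_name: str, split: str) -> Dict[str, int]:
    """Map each manipulation-sequence folder of a split to its number of .jpg files.

    Hypothetical helper; it relies only on the folder layout shown above.
    """
    split_dir = Path(data_dir) / dataset_name / "images" / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split directory: {split_dir}")
    return {
        folder.name: sum(1 for _ in folder.glob("*.jpg"))
        for folder in sorted(split_dir.iterdir())
        if folder.is_dir()
    }
```

For example, `count_images("/path/to/dataset", "facial_components", "train")` would return one entry per sequence folder, including `original`; the dataset path here is a placeholder.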

Training

Single-GPU

Modify train.sh and run:

sh train.sh

The main arguments in train.sh are described below:

Args	Description
CONFIG	Path of the network and optimization configuration file.
DATA_DIR	Directory to the downloaded dataset.
DATASET_NAME	Name of the selected manipulation type. Choose from 'facial_components' and 'facial_attributes'.
RESULTS_DIR	Directory to save logs and checkpoints.

You can change the network and optimization configurations by adding new configuration files under the directory ./configs/.

Multiple-GPUs (Slurm)

We also provide a Slurm script that supports multi-GPU training:

sh train_slurm.sh

where PARTITION and NODE should be modified according to your own environment. The number of GPUs to be used can be set through the NUM_GPU argument.

Testing

Modify test.sh and run:

sh test.sh

For the arguments in test.sh, please refer to the training instructions above, plus the following ones:

Args	Description
TEST_TYPE	The evaluation metric to use. Choose from 'fixed' and 'adaptive'.
LOG_NAME	Should be set according to the log_name of your trained checkpoint to be tested.
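
The two TEST_TYPE options are not defined in this README. As a rough illustration only, the sketch below assumes that the fixed metric compares all 5 padded positions of each predicted and ground-truth sequence, while the adaptive metric compares only the first n positions, where n is the longer of the two unpadded sequences. The function names and the handling of edge cases are hypothetical and are not taken from test.py; the paper remains the authoritative definition.

```python
from typing import Sequence

NA = 0  # placeholder label for "no manipulation"


def seq_length(seq: Sequence[int]) -> int:
    """Number of steps up to and including the last non-NA label."""
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] != NA:
            return i + 1
    return 0


def fixed_acc(preds: Sequence[Sequence[int]], targets: Sequence[Sequence[int]]) -> float:
    """ASSUMED definition: per-position accuracy over all padded positions."""
    correct = sum(p == t for ps, ts in zip(preds, targets) for p, t in zip(ps, ts))
    total = sum(len(ts) for ts in targets)
    return correct / total


def adaptive_acc(preds: Sequence[Sequence[int]], targets: Sequence[Sequence[int]]) -> float:
    """ASSUMED definition: per-position accuracy over the first n positions,
    with n = max(unpadded prediction length, unpadded target length)."""
    correct = total = 0
    for ps, ts in zip(preds, targets):
        n = max(seq_length(ps), seq_length(ts))
        correct += sum(p == t for p, t in zip(ps[:n], ts[:n]))
        total += n
    # Pairs where both sequences are all NA add no positions in this sketch.
    return correct / total if total else float("nan")
```

With a prediction of [1, 2, 4, 0, 0] against a ground truth of [1, 2, 3, 0, 0], this sketch gives 4/5 for the fixed variant and 2/3 for the adaptive variant, because the two matching padding positions count only in the fixed case.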

We also provide a Slurm script for testing:

sh test_slurm.sh

Benchmark Results

Here we list the performance of three SOTA deepfake detection methods and our method. Please refer to our paper for more details.

Facial Components Manipulation

Method	Reference	Fixed-Acc ↑	Adaptive-Acc ↑
DRN	Wang et al.	66.06	45.79
MA	Zhao et al.	71.31	52.94
Two-Stream	Luo et al.	71.92	53.89
SeqFakeFormer	Shao et al.	72.65	55.30

Facial Attributes Manipulation

Method	Reference	Fixed-Acc ↑	Adaptive-Acc ↑
DRN	Wang et al.	64.42	43.20
MA	Zhao et al.	67.58	47.48
Two-Stream	Luo et al.	66.77	46.38
SeqFakeFormer	Shao et al.	68.86	49.63

Pretrained Models

We also provide the pretrained models that produced our results in the benchmark tables:

Model	Description
pretrained-r50-c	Trained on `facial_components` with `resnet50` backbone.
pretrained-r50-a	Trained on `facial_attributes` with `resnet50` backbone.

To try the pretrained checkpoints:

Download the checkpoints from the links in the table, unzip them, and put them under the ./results folder with the following structure:

results
└── resnet50
    ├── facial_attributes
    │   └── pretrained-r50-a
    │       └── snapshots
    │           ├── best_model_adaptive.pt
    │           └── best_model_fixed.pt
    └── facial_components
        └── pretrained-r50-c
            └── snapshots
                ├── best_model_adaptive.pt
                └── best_model_fixed.pt

In test.sh, modify DATA_DIR to the root of your Seq-DeepFake dataset. Set LOG_NAME and DATASET_NAME to 'pretrained-r50-c' and 'facial_components', or to 'pretrained-r50-a' and 'facial_attributes', respectively.
Run test.sh.
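
Before running the test script, it can help to confirm that the checkpoint sits where the folder structure above expects it. The sketch below is a hypothetical helper, not code from the repository: it assumes the path is composed as RESULTS_DIR / backbone / DATASET_NAME / LOG_NAME / snapshots, and that TEST_TYPE selects between best_model_fixed.pt and best_model_adaptive.pt, which is inferred from the file names only.

```python
from pathlib import Path


def checkpoint_path(results_dir: str, backbone: str, dataset_name: str,
                    log_name: str, test_type: str) -> Path:
    """Build the expected checkpoint location (assumed layout, see above)."""
    if test_type not in ("fixed", "adaptive"):
        raise ValueError(f"test_type must be 'fixed' or 'adaptive', got {test_type!r}")
    return (Path(results_dir) / backbone / dataset_name / log_name
            / "snapshots" / f"best_model_{test_type}.pt")


ckpt = checkpoint_path("./results", "resnet50", "facial_components",
                       "pretrained-r50-c", "adaptive")
if not ckpt.is_file():
    print(f"checkpoint not found: {ckpt}")
```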

Citation

If you find this work useful for your research, please kindly cite our paper:

@inproceedings{shao2022seqdeepfake,
  title={Detecting and Recovering Sequential DeepFake Manipulation},
  author={Shao, Rui and Wu, Tianxing and Liu, Ziwei},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}
