
Deepfake detection service with a REST API, designed to identify digital media manipulated with Generative Adversarial Networks (GANs).




Deepfake Stream

DeepfakeStream is a web service for detecting deepfakes in images and videos produced by Generative Adversarial Networks (GANs).

Table of Contents

  • Scope of the document
  • Introduction
  • Problem background
  • Proposed solution
  • Implementation plan
  • Testing and Validation
  • Deployment and Maintenance
  • Considerations and Caveats
  • Conclusion
  • Paper references


Deepfake techniques have been evolving rapidly. With the rise of GAN-based networks, the number of video face-swapping methods has increased dramatically. This contributes to a range of problems, such as blackmail and the generation of fake or misleading news.

To counter this emerging threat, we propose a deepfake classification pipeline capable of identifying high-quality deepfakes that involve swapping human faces.


The goal of the project is to correctly identify face swaps, also known as "deepfakes", in videos while minimizing the number of false positives and false negatives.

The input is typically a video file (.mp4) containing human faces, which may or may not be deepfaked. The video is split into independent segments, which are passed to a face detector that extracts human faces. Each extracted face is then passed through a CNN-based network, which identifies noise features using SRM filters. The output probability for the video is the average of the probabilities across all segments.
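The final aggregation step can be illustrated with a minimal sketch. The function name `video_probability` and the range check are assumptions made for illustration; the source only states that the video-level score is the average of the per-segment probabilities.

```python
def video_probability(segment_probs):
    """Average per-segment deepfake probabilities into one video-level score.

    `segment_probs` is assumed to hold one probability in [0, 1] per segment.
    """
    probs = list(segment_probs)
    if not probs:
        raise ValueError("at least one segment probability is required")
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # Plain arithmetic mean across all segments.
    return sum(probs) / len(probs)


score = video_probability([0.2, 0.4, 0.9])
print(f"video-level deepfake probability: {score:.2f}")
```

A plain mean gives every segment the same weight, so a single strongly manipulated segment can be diluted by many clean ones.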


We want to find a solution that adapts well to the Deepfake Detection Challenge [DFDC] video dataset while requiring a tenable amount of FLOPs and power for inference in a production environment.

Proposed Solution

  • Architecture overview:

    The system consists of a single REST application deployed in a Docker container. Image builds are handled by a CI pipeline and deployment by a corresponding CD pipeline, both based on GitHub Actions.

  • Data ingestion and processing:

    Before passing data to the classifier, we crop out human faces to reduce the number of unnecessary spatial features, then apply a set of predefined augmentations. For more details, you can visit document, which

  • Model Inference:

    Inference supports several options, which can be set manually in a configuration file. Currently, you can run the network on a CPU or on one or more GPUs; other backends are not supported yet.

  • Model Design:

    The model consists of the following stages:

    • MTCNN
    • SRM Filter Convolution
    • EfficientNet-based Classifier
    • Custom CNN-based network

    We first pass the data through MTCNN to extract regions of interest (ROIs), which in our case are human faces. Then we compute noise features for each extracted face using SRM filters. These noise maps are passed to the EfficientNet-based classifier, which performs the main portion of the noise analysis. Finally, the classifier output is passed to the custom CNN-based network, which computes the final probability that the face is deepfaked.

  • Data Collection:

    As the main source of data, we used around 200 GB of real and deepfake videos from the Deepfake Detection Challenge (DFDC) dataset, collected by Meta. It contains 100K deepfaked video clips with 1-2 people in each, and 19K real video clips.

  • Evaluation Metrics:

    As the evaluation metric, we selected the F1-score, which balances precision and recall. For deepfake detection, both precision and recall are important. High precision means that the samples the model flags as positive are mostly true positives, with few false positives. High recall means that the model finds a large share of all samples that are actually positive. The F1-score is therefore valuable, as it is high only when both objectives are met.
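As a worked illustration of the F1-score described under Evaluation Metrics, the sketch below computes precision, recall and their harmonic mean from raw counts. The function name and the choice to return 0.0 when a denominator is zero are assumptions for this example, not the project's evaluation code.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Compute F1 = 2 * P * R / (P + R) from confusion-matrix counts."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Precision: share of flagged samples that are really deepfakes.
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    # Recall: share of real deepfakes that the model flagged.
    recall = true_positives / actual_positive if actual_positive else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 8 deepfakes caught, 2 real clips wrongly flagged, 2 deepfakes missed.
print(round(f1_score(8, 2, 2), 3))
```

Because F1 is a harmonic mean, a model with perfect precision but 50% recall scores about 0.667 rather than 0.75, which penalizes lopsided trade-offs.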
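The SRM filtering stage listed under Model Design can be sketched as a fixed high-pass filter that suppresses image content and keeps the noise residual. The kernel below is one second-order kernel commonly used in the steganalysis literature; it is an assumption here, since the source does not state which SRM kernels the project uses. The example is written in plain Python to stay self-contained, whereas a real pipeline would more likely apply the kernel as a fixed-weight convolution layer in Torch.

```python
# One commonly used 5x5 SRM high-pass kernel (assumed, not taken from the project).
SRM_KERNEL = [
    [0, 0, 0, 0, 0],
    [0, -1, 2, -1, 0],
    [0, 2, -4, 2, 0],
    [0, -1, 2, -1, 0],
    [0, 0, 0, 0, 0],
]
SRM_SCALE = 4.0  # normalization factor conventionally paired with this kernel


def srm_residual(image):
    """Filter a 2D grayscale image (list of rows) without padding.

    Returns a residual map that is (k - 1) pixels smaller in each dimension.
    """
    k = len(SRM_KERNEL)
    height, width = len(image), len(image[0])
    residual = []
    for y in range(height - k + 1):
        row = []
        for x in range(width - k + 1):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    acc += SRM_KERNEL[dy][dx] * image[y + dy][x + dx]
            row.append(acc / SRM_SCALE)
        residual.append(row)
    return residual


# The kernel weights sum to zero, so a perfectly flat patch leaves no residual.
flat_patch = [[10] * 6 for _ in range(6)]
print(srm_residual(flat_patch))
```

Smooth regions produce residuals near zero, so what remains is mostly high-frequency noise, which is the signal the downstream classifier analyzes.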

Technologies and Languages

  • Python (3.11.6) - programming language for project development
  • Torch (2.2.0) - deep learning framework for computer vision
  • Shell (3.2.57) - for writing efficient executable scripts
  • Docker (20.10.12) - containerization system for deployment
  • Kubernetes (1.27.10) - container orchestration system


Currently, the system accepts only images as input; support for video and real-time stream processing will be added in future releases.

Deepfake Paper References

Other references


Here you can find additional documentation to gain more clarity on how the project works under the hood.







