# Scalable Real-Time Credit Card Fraud Detection System

This repository contains a complete end-to-end system for real-time credit card fraud detection, including data analysis notebooks, a machine learning API, and infrastructure-as-code for cloud deployment. The system is designed with a modern, observable, and scalable architecture.
Watch a brief overview and demo of the system in action:
The repository is organized into distinct directories, each serving a specific purpose.
```
.
├── app/             # FastAPI application source code.
├── client/          # A Python client to simulate requests to the API.
├── deployments/     # Infrastructure (Terraform) and application (Argo CD, Helm) manifests for GitOps.
├── notebooks/       # Jupyter notebooks for data exploration, model training, and experimentation.
├── tests/           # Unit and integration tests for the application.
└── pyproject.toml   # Project metadata and dependency management.
```
The system is designed to run in two primary environments: locally via Docker Compose for development and on Google Kubernetes Engine (GKE) for production, managed via a GitOps workflow.
- Fraud Detection API: A FastAPI server that exposes a prediction endpoint. It is instrumented with OpenTelemetry for collecting metrics, logs, and traces (a minimal sketch follows this list).
- Client Simulator: A Python script that continuously sends transaction data to the API to simulate real-world traffic.
- Observability Stack:
  - Alloy: The collector agent that gathers telemetry data from the API and host environment.
  - Loki: The backend for log aggregation and storage.
  - Tempo: The backend for distributed trace storage.
  - Prometheus: The backend for metrics storage and alerting.
  - Grafana: The unified dashboard for visualizing all logs, metrics, and traces.
- Traefik: The Ingress Controller managing external access to services.
- Cert-Manager: Provides automatic TLS certificate provisioning.
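To make the API component concrete, here is a minimal, hypothetical sketch of what the prediction endpoint could look like. The endpoint path, request schema, and model logic are illustrative assumptions; the actual code lives in `app/` and may differ:

```python
# Hypothetical sketch of the prediction endpoint; field names and the
# scoring logic are illustrative, not the repository's actual code.
from fastapi import FastAPI
from pydantic import BaseModel
from opentelemetry import trace

app = FastAPI()
tracer = trace.get_tracer(__name__)

class Transaction(BaseModel):
    amount: float
    merchant_id: str
    # ...additional features consumed by the model

@app.post("/predict")
def predict(tx: Transaction) -> dict:
    # Each request is traced; spans flow through Alloy into Tempo.
    with tracer.start_as_current_span("fraud-prediction"):
        score = 0.0  # placeholder for real model inference
        return {"fraud_probability": score, "is_fraud": score > 0.5}
```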
This project can be run locally for development or deployed to a cloud environment.
The `notebooks/` directory contains all the research and analysis for this project, covering everything from data exploration to model training and evaluation.
To run the JupyterLab environment, see the instructions in the `notebooks/` folder.
For a quick and easy local setup, use the provided Docker Compose configuration. This will spin up the API, the client simulator, and the entire observability stack on your local machine.
To start the local Docker environment, run the following commands from the repository root:

```sh
cd deployments/docker-compose
docker compose up --build -d
```
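Once the stack is up, you can send a sample transaction to the API yourself. This sketch assumes the API listens on `localhost:8000` and exposes a `/predict` endpoint; adjust the port, path, and payload to match the actual configuration:

```python
# Hypothetical smoke test against the locally running API.
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"amount": 42.50, "merchant_id": "m-1001"},  # illustrative payload
    timeout=5,
)
print(resp.status_code, resp.json())
```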
For those looking to replicate a full cloud-native environment on-premise, this project can be deployed on a Kubernetes cluster running on Proxmox VE. This setup offers a powerful local alternative to GKE for development and testing.
For a complete guide on setting up the Kubernetes cluster, see the kubernetes-proxmox repository.
For a production-grade setup, you can provision the infrastructure on Google Kubernetes Engine (GKE) using Terraform and manage all applications via a GitOps workflow with Argo CD.
For a complete, end-to-end guide, see the instructions in the deployments folder.
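At a high level, provisioning follows the standard Terraform workflow. The directory name below is an assumption; check the deployments guide for the real paths and required variables:

```sh
# Assumed layout: Terraform code lives somewhere under deployments/.
cd deployments/terraform
terraform init    # download providers and initialize state
terraform plan    # preview the GKE resources to be created
terraform apply   # provision the cluster
```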
This repository uses GitHub Actions to automate code quality checks and testing. The workflow, defined here, runs on every push and pull request to ensure the codebase remains clean and functional.
The CI pipeline includes the following stages:
- Linting: Code is linted with `ruff` to enforce style consistency and catch common errors.
- Testing: Unit tests for the FastAPI application are executed using Pytest, ensuring that the core API logic remains correct and reliable. The tests can be found in the `tests/` directory.
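You can run the same checks locally before pushing, assuming `ruff` and `pytest` are installed in your environment (e.g., via the project's dependencies in `pyproject.toml`):

```sh
ruff check .    # lint the codebase
pytest tests/   # run the unit tests
```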
This project automates its release process using a Continuous Deployment pipeline powered by GitHub Actions, as defined here. This workflow prepares new versions of the application for deployment in the GitOps-managed environment.
The CD pipeline is triggered automatically when a new version (e.g., `v1.2.3`) is set in the `pyproject.toml` file and pushed to the repository. It then performs the following steps:
- Build & Push Images: It builds multi-platform Docker images for the
api
services. - Push to Registry: The newly created images are pushed to the Github Artifact Registry (GHCR).
- Update Manifests: The workflow checks out the repository, updates the Helm chart's `values.yaml` file with the new image tag, and commits this change back to the repository.
This final commit triggers Argo CD to automatically detect the change in the Git repository and deploy the new version of the application to the Kubernetes cluster, completing the GitOps cycle.
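As a rough illustration of the build-and-update steps, the workflow likely runs commands along these lines. The image name, target platforms, and `values.yaml` path are assumptions, not the exact pipeline definition:

```sh
# Build and push a multi-platform image to GHCR (image name assumed).
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t ghcr.io/<owner>/<repo>/api:v1.2.3 \
  --push .

# Update the Helm values with the new tag and commit, so Argo CD syncs it
# (the values.yaml path is hypothetical).
yq -i '.image.tag = "v1.2.3"' deployments/helm/values.yaml
git commit -am "chore: release v1.2.3" && git push
```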