Copulas

An open source project from the Data to AI Lab at MIT.

License: MIT
Development Status: Pre-Alpha
Documentation: https://sdv-dev.github.io/Copulas
Homepage: https://github.com/sdv-dev/Copulas

Overview

Copulas is a Python library for modeling multivariate distributions and sampling from them using copula functions. Given a table containing numerical data, we can use Copulas to learn the distribution and later on generate new synthetic rows following the same statistical properties.

Some of the features provided by this library include:

A variety of distributions for modeling univariate data.
Multiple Archimedean copulas for modeling bivariate data.
Gaussian and Vine copulas for modeling multivariate data.
Automatic selection of univariate distributions and bivariate copulas.
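
To give an intuition for what automatic selection of a univariate distribution can look like, here is a minimal sketch that uses only the Python standard library. It assumes one common approach: fit every candidate and keep the one whose CDF is closest to the empirical CDF according to the Kolmogorov-Smirnov statistic. The function names (`ks_statistic`, `select_best`) and the two candidates are hypothetical and are not part of the Copulas API; the exponential candidate is only a stand-in for a skewed distribution.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of `data` and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        p = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        gap = max(gap, abs(p - i / n), abs((i + 1) / n - p))
    return gap


def fit_gaussian(data):
    """Return the CDF of a normal distribution fitted by mean and std."""
    return NormalDist(fmean(data), stdev(data)).cdf


def fit_exponential(data):
    """Return the CDF of an exponential distribution fitted by its mean."""
    rate = 1.0 / fmean(data)
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def select_best(data, candidates):
    """Fit every candidate and return the name with the smallest KS gap."""
    scores = {name: ks_statistic(data, fit(data)) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical candidate set, for illustration only.
CANDIDATES = {"gaussian": fit_gaussian, "exponential": fit_exponential}

rng = random.Random(0)
skewed = [rng.expovariate(2.0) for _ in range(2000)]
symmetric = [rng.gauss(5.0, 1.0) for _ in range(2000)]

best_skewed, skewed_scores = select_best(skewed, CANDIDATES)
best_symmetric, symmetric_scores = select_best(symmetric, CANDIDATES)
print(best_skewed, skewed_scores)
print(best_symmetric, symmetric_scores)
```

The same idea extends to any set of candidates that can be fitted and expose a CDF.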

Supported Distributions

Univariate

Gaussian
Student T
Beta
Gamma
Gaussian KDE
Truncated Gaussian

Archimedean Copulas (Bivariate)

Clayton
Frank
Gumbel
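
As an illustration of what a bivariate Archimedean copula describes, the following sketch draws dependent uniform pairs from a Clayton copula, C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), by inverting its conditional CDF in closed form. It uses only the standard library and is not the library's implementation; `sample_clayton` and `kendall_tau` are hypothetical helper names, and the sketch only covers theta > 0.

```python
import random


def sample_clayton(theta, size, seed=None):
    """Draw `size` pairs (u, v) from a Clayton copula with parameter theta > 0."""
    if theta <= 0:
        raise ValueError("this sketch only covers theta > 0")
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids raising 0 to a negative power
        w = 1.0 - rng.random()  # probability level for v given u
        # Solve dC/du (u, v) = w for v, which has a closed form for Clayton.
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Kendall rank correlation computed by comparing every pair of rows."""
    concordant = discordant = 0
    n = len(pairs)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


theta = 2.0
pairs = sample_clayton(theta, size=1000, seed=1)
tau = kendall_tau(pairs)
# For a Clayton copula the theoretical value is theta / (theta + 2).
print(tau, theta / (theta + 2.0))
```

Both coordinates stay uniform on the unit interval; only their dependence changes with theta, which is what makes copulas useful for separating marginals from dependence structure.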

Multivariate

Gaussian
D-Vine
C-Vine
R-Vine

Install

Requirements

Copulas has been developed and tested on Python 3.5, 3.6 and 3.7.

Although it is not strictly required, using a virtualenv is highly recommended to avoid interfering with other software installed on the system where Copulas is run.

Install with pip

The easiest and recommended way to install Copulas is using pip:

pip install copulas

This will pull and install the latest stable release from PyPI.

If you want to install from source or contribute to the project, please read the Contributing Guide.

Install with conda

Copulas can also be installed using conda:

conda install -c sdv-dev copulas

This will pull and install the latest stable release from Anaconda.

Quickstart

In this short quickstart, we show how to model a multivariate dataset and then generate synthetic data that resembles it.

import warnings
warnings.filterwarnings('ignore')

from copulas.datasets import sample_trivariate_xyz
from copulas.multivariate import GaussianMultivariate
from copulas.visualization import compare_3d

# Load a dataset with 3 columns that are not independent
real_data = sample_trivariate_xyz()

# Fit a gaussian copula to the data
copula = GaussianMultivariate()
copula.fit(real_data)

# Sample synthetic data
synthetic_data = copula.sample(len(real_data))

# Plot the real and the synthetic data to compare
compare_3d(real_data, synthetic_data)

The output is a figure with two plots that show the real data and the synthetic data you just generated, so that you can compare them.
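
For readers curious about what fitting and sampling a Gaussian copula involves conceptually, here is a simplified two-column sketch that uses only the standard library. It is not how the library is implemented: it assumes empirical (rank-based) marginals instead of fitted univariate distributions, and every function name in it is hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def to_normal_scores(column):
    """Map a column to standard normal scores through its empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=column.__getitem__)
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1).
        scores[index] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation(a, b):
    """Pearson correlation of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def empirical_quantile(sorted_column, u):
    """Return the observed value at probability level u (nearest rank)."""
    index = min(int(u * len(sorted_column)), len(sorted_column) - 1)
    return sorted_column[index]


def fit_and_sample(xs, ys, size, seed=None):
    """Estimate the copula correlation, then draw `size` synthetic rows."""
    rho = correlation(to_normal_scores(xs), to_normal_scores(ys))
    sorted_xs, sorted_ys = sorted(xs), sorted(ys)
    rng = random.Random(seed)
    rows = []
    for _ in range(size):
        z1 = rng.gauss(0.0, 1.0)
        # Mix z1 with an independent draw so that corr(z1, z2) equals rho.
        z2 = rho * z1 + math.sqrt(max(0.0, 1.0 - rho ** 2)) * rng.gauss(0.0, 1.0)
        # Push each normal score back through the CDF and the column's quantiles.
        rows.append((
            empirical_quantile(sorted_xs, STD_NORMAL.cdf(z1)),
            empirical_quantile(sorted_ys, STD_NORMAL.cdf(z2)),
        ))
    return rho, rows


# Toy data: a skewed column and a second column that depends on it.
data_rng = random.Random(42)
real_x = [data_rng.expovariate(1.0) for _ in range(1000)]
real_y = [x + data_rng.gauss(0.0, 0.3) for x in real_x]

rho, synthetic = fit_and_sample(real_x, real_y, size=1000, seed=7)
synthetic_corr = correlation([r[0] for r in synthetic], [r[1] for r in synthetic])
print(rho, synthetic_corr)
```

The sketch separates the two ingredients of a copula model: the marginals, handled here by ranks and quantiles, and the dependence structure, captured by a single correlation between normal scores.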

What's next?

For more details about Copulas and all its possibilities and features, please check the documentation site.

There you can also learn how to contribute to Copulas and help us develop new features or cool ideas.

Credits

Copulas is an open source project from the Data to AI Lab at MIT which has been built and maintained over the years by the following team:

Manuel Alvarez manuel@pythiac.com
Carles Sala carles@pythiac.com
José David Pérez jose@pythiac.com
(Alicia) Yi Sun yis@mit.edu
Andrew Montanez amontane@mit.edu
Kalyan Veeramachaneni kalyan@csail.mit.edu
paulolimac paulolimac@gmail.com
Kevin Alex Zhang kevz@mit.edu
Gabriele Bonomi gbonomib@gmail.com

Related Projects

SDV

SDV, for Synthetic Data Vault, is the end-user library for synthesizing data in development under the HDI Project. SDV allows you to easily model and sample relational datasets using Copulas through a simple API. Other features include anonymization of Personally Identifiable Information (PII) and preserving relational integrity on sampled records.

CTGAN

CTGAN is a GAN-based model for synthesizing tabular data. It is also developed by MIT's Data to AI Lab and is under active development.
