Skip to content

VectorInstitute/cycquery

Repository files navigation

cycquery

PyPI PyPI - Python Version code checks integration tests docs codecov license

cycquery is a tool for querying relational databases using a simple Python API. It is specifically developed to query Electronic Health Record (EHR) databases. The tool is a wrapper around SQLAlchemy and can be used to write SQL-like queries in Python, including joins, conditions, groupby aggregation and many more.

🐣 Getting Started

Installing cycquery using pip

python3 -m pip install cycquery

Query postgresql database

from cycquery import DatasetQuerier
import cycquery.ops as qo


querier = DatasetQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="dbname",
    user="usename",
    password="password",
)
# List all tables.
querier.list_tables()

# Get some table.
table = querier.schema.sometable()
# Filter based on some condition (e.g. substring match).
table = table.ops(qo.ConditionSubstring("col1", "substr"))
# Run query to get data as a pandas dataframe.
df = table.run()

# Create a sequential list of operations to perform on the query.
ops = qo.Sequential(
	qo.ConditionIn("col2", [1, 2]),
	qo.DropNulls("col3"),
	qo.Distinct("col1")
)
table = table.ops(ops)
# Run query to get data as a pandas dataframe.
df = table.run()

🧑🏿‍💻 Developing

Using poetry

The development environment can be set up using poetry. Hence, make sure it is installed and then run:

python3 -m poetry install
source $(poetry env info --path)/bin/activate

In order to install dependencies for testing (codestyle, unit tests, integration tests), run:

python3 -m poetry install --with test

API documentation is built using Sphinx and can be locally built by:

python3 -m poetry install --with docs
cd docs
make html SPHINXOPTS="-D nbsphinx_allow_errors=True"

Contributing

Contributing to cycquery is welcomed. See Contributing for guidelines.