DBSubsetter

DBSubsetter is a tool for taking a logically consistent subset of a relational database.

Starting with a given set of rows, it respects foreign key constraints by recursively fetching the parents and (optionally) children of those rows. This is useful for local development and testing, or for exporting data from a particular group of users.
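The traversal described above can be illustrated with a small in-memory sketch. This is not DBSubsetter's implementation: the tables, the foreign keys, and the `subset` function are hypothetical, and the policy shown (parents always followed, children followed only downward from the starting rows) is an assumption chosen to keep the example small.

```python
from collections import deque

# Hypothetical toy data: table name -> {primary key -> row}.
TABLES = {
    "users": {1: {"id": 1}, 2: {"id": 2}},
    "orders": {
        10: {"id": 10, "user_id": 1},
        11: {"id": 11, "user_id": 2},
    },
    "order_items": {
        100: {"id": 100, "order_id": 10},
        101: {"id": 101, "order_id": 11},
    },
}

# Hypothetical foreign keys as (child_table, child_column, parent_table).
FOREIGN_KEYS = [
    ("orders", "user_id", "users"),
    ("order_items", "order_id", "orders"),
]


def subset(start_rows, include_children):
    """Return the set of (table, pk) rows needed for a consistent subset."""
    selected = set()  # rows that end up in the subset
    seen = set()      # (table, pk, follow_children) tasks already queued
    queue = deque()

    def enqueue(table, pk, follow_children):
        task = (table, pk, follow_children)
        if task not in seen:
            seen.add(task)
            queue.append(task)

    for table, pk in start_rows:
        enqueue(table, pk, include_children)

    while queue:
        table, pk, follow_children = queue.popleft()
        row = TABLES[table][pk]
        selected.add((table, pk))
        for child_table, column, parent_table in FOREIGN_KEYS:
            # Parents are always fetched so that no foreign key dangles.
            if child_table == table and row[column] is not None:
                enqueue(parent_table, row[column], False)
            # Children are optional; rows reached as parents do not pull
            # in their other children (an assumption of this sketch).
            if follow_children and parent_table == table:
                for child_pk, child in TABLES[child_table].items():
                    if child[column] == pk:
                        enqueue(child_table, child_pk, True)
    return selected


print(sorted(subset([("orders", 10)], include_children=False)))
print(sorted(subset([("orders", 10)], include_children=True)))
```

Starting from order 10, the first call keeps the order and its parent user; the second call additionally keeps the order's item. Rows belonging to user 2 are never selected in either case.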

Project Goals

Easy to learn: A simple and well documented command line interface.

Support large datasets: Designed for stability when handling large datasets.

Deterministic: Identical inputs yield identical outputs.

Do one thing well: A tiny codebase focused exclusively on core subsetting features.

Usage Instructions

  1. Load an empty schema from your origin database into your target database, following vendor-specific instructions for Postgres, MySQL, or SQL Server.

  2. Use Java 8 or above to run our latest release:

# Download the DBSubsetter.jar file
$ wget \
    --quiet \
    --show-progress \
    --output-document DBSubsetter.jar \
    https://github.com/bluerogue251/DBSubsetter/releases/download/v1.0.0-beta.7/DBSubsetter.jar
 
# Learn how to use DBSubsetter
$ java -jar DBSubsetter.jar --help | less

# Run DBSubsetter
$ java -jar DBSubsetter.jar \
    --schemas schema_1,schema_2 \
    --originDbConnStr "jdbc:<driverName>://<originConnectionString>" \
    --targetDbConnStr "jdbc:<driverName>://<targetConnectionString>" \
    --baseQuery "your_schema.users ::: id % 100 = 0 ::: includeChildren" \
    --keyCalculationDbConnectionCount 8 \
    --dataCopyDbConnectionCount 8

  3. After DBSubsetter exits, follow vendor-specific instructions for Postgres, MySQL, or SQL Server.
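Judging only from the example command in step 2, the `--baseQuery` value appears to pack three fields separated by `:::`: a schema-qualified table, a SQL condition, and a flag controlling whether children are fetched. The sketch below splits such a string for illustration. The `BaseQuery` type and `parse_base_query` function are hypothetical names, not part of DBSubsetter, and the authoritative format is whatever `--help` reports.

```python
from typing import NamedTuple


class BaseQuery(NamedTuple):
    """Hypothetical container for the three ':::'-separated fields."""
    table: str          # e.g. "your_schema.users"
    where_clause: str   # e.g. "id % 100 = 0"
    children_flag: str  # e.g. "includeChildren"


def parse_base_query(raw: str) -> BaseQuery:
    """Split a base query string into its three fields (assumed layout)."""
    parts = [part.strip() for part in raw.split(":::")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'table ::: condition ::: flag', got {raw!r}")
    return BaseQuery(*parts)


query = parse_base_query("your_schema.users ::: id % 100 = 0 ::: includeChildren")
print(query.table)
print(query.where_clause)
print(query.children_flag)
```

In the example from step 2, the condition `id % 100 = 0` selects the users whose `id` is divisible by 100.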

Contributing

Contributions of all kinds are welcome! To ask a question, report a bug, or request a feature, please open an issue. To contribute code changes, please open a pull request. Please follow our code of conduct when contributing.

Related Projects

DBSubsetter was inspired by Jailer and rdbms-subsetter. Other related resources include sqlsizer, db_subsetter, DataBee, pg_sample, DATPROF, abridger, postgres-subset, and CA Data Subset.

DBSubsetter is written in Scala using Chronicle-Queue and scopt. Slick is used for testing.

License

DBSubsetter is released under the MIT License.