CLI data documentation tool & catalog, built using fzf and amundsen-databuilder.

FOGSEC/metaframe

 
 

Metaframe

Disclaimer: This project is still in alpha, so there will be bugs. Use at your own risk! But if you find bugs or have feature requests, open an issue :)

metaframe is a CLI data documentation tool (+catalog). It leverages junegunn/fzf and lyft/amundsen to create a blazingly fast CLI framework to:

  • Easily document your tables, using an organizational structure where tables are first-class citizens.
  • Run ETL jobs from the command-line (or manually document your datasets).
  • Search through your tables.

Installation

macOS

brew install rsyi/tap/metaframe

All others

If you're not on macOS, clone this repository, then run the following in its base directory (make sure ./dist does not exist, or pyinstaller won't rebuild):

make && make install

If the build fails, the cause is often that the Python version behind pip3 on your machine is incompatible (metaframe is tested against Python 3.7 and 3.8 only). To troubleshoot, try building inside a Python 3.7 or 3.8 virtual environment, or change the pip3 reference in the Makefile to the path of a specific binary on your filesystem. Or open an issue!
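A quick way to see whether your interpreter falls in the tested range is sketched below. The helper name is_supported_python is hypothetical (it is not part of metaframe), and the check inspects python3, which may not be the interpreter behind pip3; running pip3 --version prints the Python version pip3 is bound to.

```shell
# Hypothetical helper: succeeds only for the versions metaframe is tested against.
is_supported_python() {
  case "$1" in
    3.7|3.7.*|3.8|3.8.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask python3 for its major.minor version; fall back to "none" if it is missing.
ver="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || echo none)"

if is_supported_python "$ver"; then
  echo "python3 is $ver: tested with metaframe"
else
  echo "python3 is $ver: not tested with metaframe, consider a 3.7 or 3.8 virtual environment"
fi
```

If the check fails, creating a virtual environment with a 3.7 or 3.8 interpreter and running make inside it is the least invasive fix.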

We don't explicitly add an alias for the mf binary, so you'll want to either add ~/.metaframe/bin/ to your PATH, or add the following alias to your .bash_profile or .zshrc file.

alias mf=~/.metaframe/bin/mf
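If you prefer the PATH route over the alias, a minimal sketch is below. It assumes the binary lives in ~/.metaframe/bin/ as described above; put the line in the same .bash_profile or .zshrc file.

```shell
# Prepend the metaframe bin directory so the shell can find `mf`.
export PATH="$HOME/.metaframe/bin:$PATH"
```

Open a new shell (or source the file) for the change to take effect.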

Getting started

Initialize file structure

Start by running:

mf init

which will generate a file structure in ~/.metaframe.

If you want to manually document tables, create a new table stub by running:

mf new <TABLE_NAME>

Then run mf to search over these docs! See the Manual usage section for more information.

If you want to run ETL jobs to automatically populate this metadata, keep reading.

Configure warehouse connections

If you want metadata to be scraped and populated automatically, you'll next need to add an entry to your connections.yaml file, which can be accessed by running mf connections edit. For example:

- name: presto                # optional
  type: presto
  host: host.mysite.com:8889
  username:                   # optional
  password:                   # optional
  cluster: system             # optional

The only necessary arguments are the host and the type. See Connection setup for more details (including information on type-specific syntax).
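Since only the type and host are required, a minimal entry can be as short as the sketch below. The scratch file name minimal-connection.yaml is an assumption made for illustration; the idea is to copy its contents into the file that mf connections edit opens, not to point metaframe at this path.

```shell
# Write a minimal connection entry (type and host only) to a scratch file.
# SNIPPET is a hypothetical location, not a path metaframe reads.
SNIPPET="${TMPDIR:-/tmp}/minimal-connection.yaml"

cat > "$SNIPPET" <<'EOF'
- type: presto
  host: host.mysite.com:8889
EOF

# Show the entry so it can be pasted into connections.yaml.
cat "$SNIPPET"
```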

Run your ETL job

Once this configuration is complete, you can run your ETL job by running:

mf etl

By default this only pulls tables that haven't already been pulled. For more details, see ETL.

Go go go!

Run:

mf

to search over all metadata. Pressing Enter opens the editable part of the docs in your default text editor, as defined by the environment variable $EDITOR.
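If $EDITOR is not set in your shell, something like the following in your .bash_profile or .zshrc gives it a value. The fallback vi is only an assumption; substitute whichever editor you use.

```shell
# Keep an existing $EDITOR if one is set; otherwise fall back to vi.
export EDITOR="${EDITOR:-vi}"
```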
