
Trawler

Trawler is an open source data catalog for mapping and monitoring your data and systems.

NOTE: Trawler is currently being rewritten from scratch with some new architecture ideas and is therefore not suitable for production deployment. The new version will support ingesting metadata via datahub's tooling. This lets us focus on the data model and user experience rather than maintaining a large number of connectors (at least until the project has matured).

Getting started

The easiest way to get started with Trawler locally is to run our docker-compose file:

curl https://raw.githubusercontent.com/scalar-dev/trawler/master/docker-compose.example.yml -o docker-compose.yml
docker-compose up
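
The compose file runs the Trawler backend alongside a PostgreSQL database (the only required services, per the goals below). As a rough sketch of its shape (the image name, port, and environment variables here are illustrative assumptions; the downloaded file is authoritative):

services:
  trawler:
    # Hypothetical image name and tag
    image: ghcr.io/scalar-dev/trawler:latest
    ports:
      - "8081:8081"
    environment:
      # Hypothetical variable name; the backend needs a PostgreSQL connection
      DATABASE_URL: postgres://trawler:trawler@postgres:5432/trawler
    depends_on:
      - postgres
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: trawler
      POSTGRES_PASSWORD: trawler
      POSTGRES_DB: trawler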

You can use the acryl-datahub CLI to ingest metadata into Trawler.

# Point the CLI at the local trawler metadata service
export DATAHUB_GMS_URL="http://localhost:8081/api/datahub/main"

# Run a recipe
datahub ingest -c recipe.yml
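
A recipe tells the CLI which source to extract metadata from and where to send it. A minimal sketch pulling from PostgreSQL (the connection details below are placeholders; see the datahub docs for the full list of sources):

# recipe.yml
source:
  type: postgres
  config:
    host_port: localhost:5432
    database: mydb
    username: reader
    password: secret
sink:
  type: datahub-rest
  config:
    server: http://localhost:8081/api/datahub/main

With DATAHUB_GMS_URL exported as above, the sink section can usually be omitted, since the CLI defaults to a datahub-rest sink pointed at that URL.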

Trawler serves an embedded UI at http://localhost:8081/ui.

To read more, see the datahub docs or check out one of the examples in datahub/ in this repository.

Goals

Trawler is intended to be different from other data catalog products:

  • Easy to deploy. A basic but fully-functional deployment requires only a single backend service and a PostgreSQL database. Extra services may be needed for additional features and scalability, but they will always be optional. Getting started with a powerful data catalog should be possible for every team, small or large.

  • Federated. Trawler will be the first data catalog to support federation via the ActivityPub protocol. This will allow individual teams to run their own trawler instances (should they wish) and to link their knowledge graphs together or track changes in upstream data sources. Granting access to outside users or organisations will be easy and secure.

  • Social. Capturing institutional knowledge is critical to maintaining a useful data catalog. We will allow users to track documentation and communication related to data assets alongside Trawler's core machine-generated metadata.

  • Flexible. Existing products mostly have a fairly fixed set of entities and properties which can be recorded within the data catalog. Where they support extension, it can be painful. We intend to support a fully extensible metadata model with a well-typed schema.

  • Compatible and extensible. Trawler should be easy to integrate with legacy or bespoke systems. We support popular existing tools for capturing and collating metadata by implementing the datahub REST API.

  • Standards compliant. We will support existing semantic web formats for exchanging information about data assets: DCAT and PROV via JSON-LD (see the example after this list).
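
As an illustration of those formats, here is a dataset described with the DCAT vocabulary in JSON-LD (a generic DCAT example, not a sample of Trawler's actual output):

{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/"
  },
  "@id": "https://example.org/datasets/orders",
  "@type": "dcat:Dataset",
  "dct:title": "Orders",
  "dct:description": "Daily snapshot of customer orders"
}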

Building Trawler

If you're a fan of Nix, you can run nix develop to get a shell configured with the dependencies needed to build Trawler.

To build and run the backend, go to the metadata/ directory and run

go generate .
go run ./cmd/server

To run the frontend, go to the ui/ directory and run

npm install --include=dev
npm run dev

Scalar

Trawler is proudly developed and sponsored by Scalar, a consultancy specialising in novel data engineering solutions.
