FraunhoferISST/diva: Scalable data management system with AI-powered profiling and metadata enrichment

DIVA - Data Inventory and Valuation Approach

An awesome data catalog application

Developed to evaluate the latest data management technologies in the context of data transparency, data insight, and data networking

Motivation

This is an ongoing project of the Digitization in Service Industries department at Fraunhofer ISST. Data is becoming increasingly important to companies. By using the right data, companies can become more productive and outperform their competitors. We therefore believe it is time for a data management solution that evaluates new, innovative approaches to support companies in their daily work with data. The tool grows continuously, and we do our best to tackle the data management challenges that companies face.

We also use this tool as a playground for our students, who develop topics for their bachelor's or master's theses with it. PhD students also benefit from it as a platform for their doctoral research.

Features

🏛️ microservice architecture: lets us choose the best technology for each problem and makes scaling easier
💻 client application: an easy-to-use web application for all kinds of data management tasks
🖥️ portal application: simple search for relevant files across different devices (WIP)
🐳 Docker-ready: all microservices and core components are Docker-ready, so you can start them right out of the box

Core Technologies and Frameworks used

Technology	Description
Kong	our API gateway, used to route requests to the microservices
Kafka	message log for microservice communication
Node.js	JavaScript runtime for running our server apps
Express Framework	helps us build simple microservices
Docker	building and publishing images
Kubernetes	production-grade container orchestration
Airflow	author, schedule and monitor workflows
OpenAPI	specification language to describe the HTTP APIs of our microservices
AsyncAPI	specification language to describe how `Kafka` and `WebSocket` messages look
JSON Schema	specification language to describe how an entity is structured
MongoDB	our main document store that is the single source of truth when it comes to metadata
Elasticsearch	our search index used to search for entities and make interesting aggregations
Keycloak	Open Source Identity and Access Management
MinIO	our object store for files uploaded through the browser (aka `diva-lake`)
neo4j	our graph database for storing relations between entities more efficiently

Other Technologies and Frameworks used

Technology	Description
VueJS 2	component based frontend solution for building robust apps
Vuetify	makes frontend beautiful
Apache Tika	if you need to look into heterogeneous data, Tika is your solution
Python3	helps us do data science and NLP (natural language processing)
Kibana	our window into `elasticsearch` for debugging
Filebeat	fills elasticsearch with the logs produced by our microservices

Quick start

The complete system can be quickly bootstrapped with Docker:

cd docker
# create .env by copying the default settings from .env.default
cp .env.default .env
# run the script that starts all core components
./up_core.sh

Before running DIVA in a production environment, some system settings must be adjusted. See our documentation to learn more about the configuration, the concepts, and the underlying architecture of DIVA.
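Because `.env` starts out as a copy of `.env.default`, settings that are added to `.env.default` later will not appear in an existing `.env`. The following is a minimal sketch of a check for missing keys. It assumes both files use plain `KEY=value` lines. The function name `missing_env_keys` is hypothetical and not part of the DIVA repository.

```shell
# Hypothetical helper (not part of the DIVA repository): print every key
# that is defined in the defaults file but missing from the current .env.
# Assumes plain KEY=value lines; blank lines and "#" comments are skipped,
# and "export KEY=value" lines are not handled.
missing_env_keys() {
  defaults="${1:-.env.default}"
  current="${2:-.env}"
  # keep non-comment lines that contain "=", then take the text before it
  grep -v '^[[:space:]]*#' "$defaults" | grep '=' | cut -d '=' -f 1 |
    while IFS= read -r key; do
      # report the key if no line in the current file starts with "KEY="
      grep -q "^${key}=" "$current" 2>/dev/null || printf '%s\n' "$key"
    done
}
```

For example, after sourcing the function, you can run `missing_env_keys docker/.env.default docker/.env` from the repository root. An empty result only means that every default key exists in your `.env`. It does not check whether the values are suitable for production.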

Credits

This project is developed by employees of Fraunhofer ISST, who put all their ❤ into it to try out the latest cutting-edge technologies.

Active People

Daniel Tebernum (Lead)	Sergej Atamantschuk (Lead)	Anatoly Novohatny	Janis Büse

Dustin Chabrowski (Alumni)	Marcel Altendeitering (Alumni)	Julia Pampus (Alumni)


License

See the LICENSE file in the repository root.
