philterd/entitydb

EntityDB

EntityDB is an application that integrates several components to provide a unified means of storing and querying entities (people, places, and things). The project includes the Entity Query Language (EQL), a single query language for querying entities across the various underlying databases.

Architecture

Entities are stored in an underlying database. Supported databases are MySQL, MongoDB, Cassandra, and DynamoDB. Entities are indexed in Elasticsearch for fast querying. A cache stores recently ingested and accessed entities to improve performance. A separate database, the data store, manages data such as users, groups, queries, and other information.
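
To make the role of the cache concrete, here is a minimal sketch of a bounded cache that keeps recently accessed entries. It assumes a least-recently-used eviction policy, which this README does not specify, and the class name RecentEntityCache is hypothetical rather than part of EntityDB.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch; EntityDB's actual cache implementation may differ.
final class RecentEntityCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    RecentEntityCache(int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the capacity is exceeded
        return size() > capacity;
    }
}
```

With a capacity of 2, putting a third entity evicts whichever of the first two was accessed least recently.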

Features

The following are brief high-level descriptions of EntityDB's main features. Refer to the wiki for more detailed descriptions and information on how to configure and use the features.

API

The API is built on REST and JSON. The API allows for entity ingestion, status and health monitoring, and entity querying through the Entity Query Language (EQL).

Entity Store

The entity store is the master dataset of entities. It is an immutable data store. EntityDB provides a choice of MySQL, MongoDB, Cassandra, and DynamoDB for the underlying entity store. You are free to choose the database that best satisfies your use-case requirements.

Search Index

As entities are ingested, they are indexed in a search engine, and all queries are performed against that search engine. Currently, the only supported search index is Elasticsearch.

Entity Access Control

Each ingested entity is assigned an ACL. The ACL determines the entity's visibility to users and groups of the system.
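
As a rough illustration of how an ACL might gate visibility, the sketch below models an ACL as a set of permitted users and a set of permitted groups. The class and method names (EntityAcl, isVisibleTo) are hypothetical and the model is an assumption; refer to the wiki for EntityDB's actual ACL format.

```java
import java.util.Set;

// Hypothetical ACL model for illustration only; not EntityDB's actual API.
final class EntityAcl {
    private final Set<String> users;   // users allowed to see the entity
    private final Set<String> groups;  // groups allowed to see the entity

    EntityAcl(Set<String> users, Set<String> groups) {
        this.users = Set.copyOf(users);
        this.groups = Set.copyOf(groups);
    }

    // The entity is visible if the user is listed directly
    // or belongs to at least one listed group.
    boolean isVisibleTo(String user, Set<String> userGroups) {
        if (users.contains(user)) {
            return true;
        }
        for (String group : userGroups) {
            if (groups.contains(group)) {
                return true;
            }
        }
        return false;
    }
}
```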

Audit

EntityDB emits audit events for various actions. Audited events include entity ingests, entities returned through queries, and entity ACL modifications.

Continuous Queries

Entities received through the API are evaluated against the continuous queries. A continuous query is an EQL query that generates a notification when an ingested entity meets the query's conditions. Continuous queries are designed to be fast and efficient, and to promote a low time-to-alert (TTA).

For example, the continuous query select * from entities where text = 'George' will generate a notification when an entity having the text "George" is ingested.
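
The evaluation step can be sketched as follows. This is not EntityDB's engine and it does not parse EQL: each registered query is paired with a hand-written predicate that stands in for its conditions, and the class name ContinuousQuerySketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative sketch only: the real engine evaluates EQL, which is not parsed here.
final class ContinuousQuerySketch {
    // EQL text mapped to a predicate over the entity text that mimics its conditions
    private final Map<String, Predicate<String>> queries = new LinkedHashMap<>();

    void register(String eql, Predicate<String> condition) {
        queries.put(eql, condition);
    }

    // Called once per ingested entity; returns the EQL of every query that matched,
    // which is where a notification would be generated.
    List<String> onIngest(String entityText) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Predicate<String>> query : queries.entrySet()) {
            if (query.getValue().test(entityText)) {
                matched.add(query.getKey());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        ContinuousQuerySketch engine = new ContinuousQuerySketch();
        engine.register("select * from entities where text = 'George'", "George"::equals);
        System.out.println(engine.onIngest("George")); // one matching query
        System.out.println(engine.onIngest("Martha")); // no matching queries
    }
}
```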

Rules Engine

Like continuous queries, the rules engine runs for each ingested entity. Rules are user-defined and take a specific action on entities that match one or more conditions. Unlike continuous queries, however, rules can contain complex logic and actions, and they are intended for cases where time-to-alert is not critical.
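
A rule can be pictured as a set of conditions paired with an action. The sketch below assumes a rule fires when at least one of its conditions matches; the class name RuleSketch is hypothetical, and EntityDB's real rule format is described in the wiki.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical rule model for illustration; not EntityDB's actual rules engine.
final class RuleSketch {
    private final List<Predicate<String>> conditions;
    private final Consumer<String> action;

    RuleSketch(List<Predicate<String>> conditions, Consumer<String> action) {
        this.conditions = List.copyOf(conditions);
        this.action = action;
    }

    // Runs the action if at least one condition matches the entity text.
    // Returns true when the rule fired.
    boolean apply(String entityText) {
        for (Predicate<String> condition : conditions) {
            if (condition.test(entityText)) {
                action.accept(entityText);
                return true;
            }
        }
        return false;
    }
}
```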

Metric Reporting

EntityDB can report metrics to AWS CloudWatch, InfluxDB, or the console. These metrics report values such as how long an entity is in the ingest queue before being ingested, how long continuous queries are taking to evaluate, and the counts of stored and indexed entities. These metrics provide a comprehensive overview of EntityDB's performance and statistics.

Scalability

EntityDB scales horizontally because all of its components are distributed. To increase throughput and performance, stand up an additional EntityDB instance. EntityDB's sample AWS CloudFormation template creates an EntityDB autoscaling group behind an Elastic Load Balancer. The autoscaling group can be set to scale based on metrics such as the size of the ingest SQS queue, any of the EntityDB reported metrics, or any EC2 instance metrics.

Building EntityDB

Build EntityDB with Maven. Tests run as part of the build; some of the unit tests behave more like integration tests, which is an area for improvement.

mvn clean install

Running

After a successful build, entitydb.jar is located under entitydb-app/target. It is a runnable jar that can be started with java -jar entitydb.jar. By default, all components use internal implementations, but this can be changed in entitydb.properties. See the Documentation for details on configuring entitydb.properties.

Ingesting Entities

Via the REST API

Entities can be ingested through the API. Look under the scripts/ directory for sample cURL scripts. Entities must be in the format defined in entity-model. Ingested entities are immutable.
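
For callers who prefer Java over cURL, a request could be built along these lines with the JDK's built-in HTTP client (Java 11+). The endpoint URL and the JSON payload are placeholders, not EntityDB's documented API; take the real path from the scripts/ directory and the real entity format from entity-model.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: the endpoint and payload below are placeholders, not EntityDB's documented API.
final class IngestRequestSketch {
    // Builds (but does not send) a JSON POST request for the given endpoint and body.
    static HttpRequest build(String endpoint, String entityJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(entityJson))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical URL and body; replace both with values from scripts/ and entity-model.
        HttpRequest request = build("http://localhost:8080/api/entity", "{\"text\": \"George\"}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) from java.net.http.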

Via the Internal API

When EntityDB is integrated directly with your application, entities can be ingested through the queues, bypassing the REST API. Ingesting without queuing is not recommended, because queuing is what prevents entity loss due to capacity or network issues.
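
The sketch below illustrates why queuing guards against loss, using an in-memory bounded queue as a stand-in for whichever queue implementation is configured. The class name QueuedIngestSketch and its methods are hypothetical. When the queue is full, the producer is told so and can retry instead of the entity being silently dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-in for an ingest queue; not EntityDB's internal API.
final class QueuedIngestSketch {
    private final BlockingQueue<String> queue;
    private final List<String> store = new ArrayList<>();

    QueuedIngestSketch(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    // Producer side: returns false when the queue is full so the caller can retry later.
    boolean enqueue(String entityJson) {
        return queue.offer(entityJson);
    }

    // Consumer side: moves everything currently queued into the store
    // and returns how many entities were moved.
    int drainToStore() {
        return queue.drainTo(store);
    }
}
```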

Copyright © 2024 Philterd, LLC.