Built-in data warehouse #3011
codyebberson
started this conversation in
Ideas
Replies: 1 comment
A few recent thoughts:
Many self-hosting Medplum customers have an existing data warehouse solution such as Snowflake or Databricks. We work with enterprise customers to set up HTTP webhooks or Postgres data dumps to ETL the data.
On the other hand, many Medplum customers do not have this, for a variety of reasons.
I'm starting this discussion to explore options for what we could do in our CDK infra-as-code to take advantage of AWS services such as AWS Redshift or AWS Athena.
Use Case
Short term
There are 2 resource types in Medplum which can accumulate quickly and age out quickly: Login and AuditEvent.
Login is a custom Medplum resource that represents a user authentication event, and effectively acts as a "user session". When patients log in via Medplum Auth, a self-hosted server can quickly accumulate many Login resources. Once a Login has expired, it does not provide any operational value. You may still want them for analytics or customer support purposes.
AuditEvent is a standard FHIR resource which has many purposes. Developers can create their own AuditEvent resources for any purpose. Medplum server automatically creates AuditEvent resources on bot invocations and webhook calls. If your server heavily uses bots or subscriptions, you will accumulate many AuditEvent resources. The operational value of an AuditEvent rapidly declines after 30-90 days. However, your organization may require longer data retention for legal or compliance purposes.
For both Login and AuditEvent, Medplum provides built-in tools to purge resources older than a specified time period. That feature should really only be used if those resources have first been stored in a data warehouse.
Long term
If Medplum server had excellent support for seamlessly using Postgres and a data warehouse in harmony, it would open up many interesting possibilities.
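One small building block for that kind of Postgres-plus-warehouse setup, and for the Login and AuditEvent case described above, is deciding which resources are old enough to leave Postgres. The following is a minimal sketch only: `StoredResource`, `RetentionPolicy`, `isArchivable`, and `selectArchivable` are hypothetical names, not part of Medplum, and the 30 and 90 day values are illustrative. It also assumes a flat `lastUpdated` field rather than the real FHIR `meta.lastUpdated` path.

```typescript
// Hypothetical shapes for illustration; not Medplum's actual types or config.
interface StoredResource {
  resourceType: string;
  id: string;
  lastUpdated: string; // ISO 8601 timestamp
}

// Maximum age in days to keep each resource type in Postgres.
// Resource types that are not listed are never selected.
type RetentionPolicy = Record<string, number>;

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns true when the resource is older than the limit for its type.
function isArchivable(resource: StoredResource, policy: RetentionPolicy, now: Date): boolean {
  const maxAgeDays = policy[resource.resourceType];
  if (maxAgeDays === undefined) {
    return false;
  }
  const ageMs = now.getTime() - new Date(resource.lastUpdated).getTime();
  return ageMs > maxAgeDays * MS_PER_DAY;
}

// Filters a batch down to the resources that could be moved to the warehouse.
function selectArchivable(
  resources: StoredResource[],
  policy: RetentionPolicy,
  now: Date
): StoredResource[] {
  return resources.filter((r) => isArchivable(r, policy, now));
}

// Illustrative policy: Login after 30 days, AuditEvent after 90 days.
const policy: RetentionPolicy = { Login: 30, AuditEvent: 90 };
const now = new Date('2024-01-01T00:00:00Z');
const candidates = selectArchivable(
  [
    { resourceType: 'Login', id: 'a', lastUpdated: '2023-11-01T00:00:00Z' },
    { resourceType: 'AuditEvent', id: 'b', lastUpdated: '2023-11-01T00:00:00Z' },
    { resourceType: 'Task', id: 'c', lastUpdated: '2021-01-01T00:00:00Z' },
  ],
  policy,
  now
);
console.log(candidates.map((r) => r.id));
```

In this sketch only the Login is selected: both the Login and the AuditEvent are 61 days old, which exceeds the 30 day limit but not the 90 day one, and Task has no policy entry so it always stays in Postgres. A real implementation would presumably copy the selected resources to the warehouse and confirm the copy before purging anything.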
For example, some organizations heavily rely on Task resources for workflow management. In general, you always want those Task resources to be available (imagine looking at a patient profile when they come back for an appointment 2 years after their last).
Proposal
P1
P2
Login and AuditEvent resources by configurable criteria
P3
Repository class which allows limited FHIR read/search functionality from data warehouse as a data source
AuditEvent ???
P4
Redshift vs Athena
The two obvious solutions for a built-in data warehouse would be AWS Redshift and AWS Athena.
There are many articles that compare these services:
Main differences: