Table partitioning for event table #2979

Open
williancolognesitrimble opened this issue Feb 14, 2024 · 4 comments
Labels
Ideal for Contribution
Priority 4: Would (Lowest priority. Would-be-nice to include issues when time allows it.)
Type: Feature (Use to signal an issue is completely new to the project.)

Comments

@williancolognesitrimble

Enhancement Description

We understand that the number of events will continue to grow indefinitely, and sometimes there's a need to retain these events for extended periods. However, as time progresses, querying this database through the JDBC event store may lead to degraded performance. To address this challenge, we can leverage table partitioning.

By utilizing table partitioning, we can optimize performance, even with a historical database. This involves partitioning tables based on the datetime of events, allowing for partitioning by month, week, or even day, depending on the specific use case. Consequently, queries such as those executed in JdbcEventStorageEngineStatements should be modified to use the event datetime together with the aggregateIdentifier.
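To make the idea concrete, here is a minimal sketch of how a time-bounded query maps onto monthly partitions. It is only an illustration: the `MonthlyPartitions` class, the `<table>_<yyyy>_<MM>` naming scheme, and the use of UTC are assumptions made for this example, not anything Axon Framework or a specific database provides.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Axon Framework. */
final class MonthlyPartitions {

    // Assumed naming scheme: <table>_<yyyy>_<MM>, with months evaluated in UTC.
    static String partitionFor(String table, Instant eventTimestamp) {
        YearMonth month = YearMonth.from(eventTimestamp.atZone(ZoneOffset.UTC));
        return name(table, month);
    }

    // Partitions a query has to touch when it is bounded by [from, to], both inclusive.
    static List<String> partitionsToScan(String table, Instant from, Instant to) {
        List<String> partitions = new ArrayList<>();
        YearMonth current = YearMonth.from(from.atZone(ZoneOffset.UTC));
        YearMonth last = YearMonth.from(to.atZone(ZoneOffset.UTC));
        while (!current.isAfter(last)) {
            partitions.add(name(table, current));
            current = current.plusMonths(1);
        }
        return partitions;
    }

    private static String name(String table, YearMonth month) {
        return String.format(Locale.ROOT, "%s_%04d_%02d",
                table, month.getYear(), month.getMonthValue());
    }
}
```

Without a datetime bound in the query, the database has no way to narrow this list and must consider every partition, which is the motivation for adding the datetime to the statements.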

Current Behaviour

Currently, the only ways to optimize this table are adding indexes or partitioning by aggregateIdentifier, which may not be enough depending on the size of the table.

Wanted Behaviour

Include the event datetime in all queries executed through JDBC, so that the database can determine which partition it should scan.

Possible Workarounds

There are no workarounds without implementing this AFAIK.

@williancolognesitrimble williancolognesitrimble added the Type: Enhancement Use to signal an issue enhances an already existing feature of the project. label Feb 14, 2024
@smcvb smcvb added Priority 4: Would Lowest priority. Would-be-nice to include issues when time allows it. Type: Feature Use to signal an issue is completely new to the project. Ideal for Contribution and removed Type: Enhancement Use to signal an issue enhances an already existing feature of the project. labels Feb 16, 2024
@smcvb
Member

smcvb commented Feb 16, 2024

Hey @williancolognesitrimble, thanks for making this issue with us!

Concerning your suggestion, I am afraid it may take an approach that's only feasible for Event Streaming.
So, something that would be "doable" for tracking tokens and Event Processors, but not for the EventSourcingRepository.

When it's about Aggregates for Event Sourcing, you wouldn't know the date to jump back to.
You know you need to start from position zero or X if snapshots are in place.
Thus, a change to the statements as suggested wouldn't suffice to support table partitioning.

Furthermore, you only suggest something for JDBC, leaving out JPA support entirely (the second RDBMS-based Event Store pillar in AF).
You could even argue that, since I anticipate this will impact the EventSourcingRepository and EventStore APIs, our Mongo Extension should see changes as well, and the Axon Server Connector would have to deal with these adjustments too.

Lastly, I want to react to your "Possible Workarounds" sentence:

There are no workarounds without implementing this AFAIK.

Although building it yourself is indeed an option, a workaround with a lot (and I really mean a lot) less work is to use a purpose-built Event Store solution.
The predicament of the event count increasing indefinitely is exactly what Event Store implementations are meant for, and RDBMSs are not.


With all that said, I've changed the type from "Enhancement" to "Feature", as I would argue table partitioning support is a feature rather than an enhancement.
Furthermore, I have set the priority to 4, as the Axon Framework team cannot prioritize such a feature at this stage.
And, lastly, I have added the "Ideal for Contribution" label, to clarify a PR providing a generic workable solution may be a way forward.

I hope that clarifies our stance further, @williancolognesitrimble!

@williancolognesitrimble
Author

Sounds good, agreed.

Just a few clarifications:
When it's about Aggregates for Event Sourcing, you wouldn't know the date to jump back to. You know you need to start from position zero or X if snapshots are in place.

Considering this, I'm not 100% familiar with Axon's structure, but I believe we could use the TokenStore, or something like it, to hold one more property that stores the datetime of the latest event received by each event processor; that value would then be used to query the aggregate. Of course, there is a lot of work to do beyond JDBC; I only wrote a summary of a feature that would be great to have, using JDBC as an example. It would need more research to know the full scope of the work.
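As a rough illustration of that idea, the sketch below pairs a stream position with the timestamp of the latest event seen. `TimestampedPosition` is a hypothetical class invented for this example and not an Axon Framework type. The safety margin is also an assumption: events are not guaranteed to be stored in strict timestamp order, so a partition lower bound probably has to sit somewhat earlier than the latest timestamp.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical value object for illustration only; not an Axon Framework type. */
final class TimestampedPosition {
    private final long globalIndex;
    private final Instant lastEventTimestamp;

    TimestampedPosition(long globalIndex, Instant lastEventTimestamp) {
        this.globalIndex = globalIndex;
        this.lastEventTimestamp = lastEventTimestamp;
    }

    // Advances the position; keeps the later timestamp so the bound never moves backwards.
    TimestampedPosition advancedTo(long newIndex, Instant eventTimestamp) {
        Instant latest = eventTimestamp.isAfter(lastEventTimestamp)
                ? eventTimestamp
                : lastEventTimestamp;
        return new TimestampedPosition(Math.max(globalIndex, newIndex), latest);
    }

    long globalIndex() {
        return globalIndex;
    }

    // Timestamp to put in the WHERE clause so the database can skip older partitions.
    // The margin (an assumed tuning parameter) guards against slightly out-of-order timestamps.
    Instant lowerBound(Duration safetyMargin) {
        return lastEventTimestamp.minus(safetyMargin);
    }
}
```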

Although building it yourself is indeed an option, a workaround with a lot (and I really mean a lot) less work is to use a purpose-built Event Store solution.
Nice to know, I'll read about how it's working, Thanks!

@smcvb
Member

smcvb commented Feb 19, 2024

Considering this, I'm not 100% familiar with Axon's structure, but I believe we could use the TokenStore, or something like it, to hold one more property that stores the datetime of the latest event received by each event processor; that value would then be used to query the aggregate.

That makes sense @williancolognesitrimble! Let me elaborate a little on the infrastructure components of Axon Framework.

When it comes to Event Streaming, which is arguably the "query side" of the application when talking in CQRS terms, you will hit the following infra components:

  1. The EventStore and, subsequently, the EventStorageEngine
  2. The StreamingEventProcessor, with either the TrackingEventProcessor or PooledStreamingEventProcessor implementation.
  3. The TokenStore

It is the event streaming area I am worrying about the least, as you indeed have the TrackingToken as part of the TokenStore that's used to open a stream with the EventStore.
Hence, logic could be expanded in this area.

When it comes to Event Sourcing, which from AF's perspective is the "command side" in CQRS terms, the following infra components come into play:

  1. The EventStore and, subsequently, the EventStorageEngine
  2. The Repository and especially the EventSourcingRepository implementation

Whenever an Aggregate is loaded from the events it has published, the query done to the EventStore is different from the one done for Event Streaming.
More specifically, the query isn't done with a TrackingToken, but just with an aggregate identifier.

To put it in context, the @TargetAggregateIdentifier annotated field within a command is used to query the EventSourcingRepository, which in turn will query the EventStore.

It's this pointer that complicates the suggestion, as the API used to query a stream of events for Event Sourcing has no notion of time.
Nor does it make a lot of sense from the API's perspective if I am being honest.
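The difference between the two read paths can be sketched with a small in-memory model. This is an illustration under assumptions, not Axon's implementation: `PartitionedEventLog`, its `Event` class, and the monthly UTC partitioning are all invented for this example. The streaming-style read carries a time bound and can skip old partitions, whereas the sourcing-style read only knows the aggregate identifier and a first sequence number, so it has to visit every partition.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory model of a month-partitioned event table; illustration only. */
final class PartitionedEventLog {

    static final class Event {
        final String aggregateId;
        final long sequenceNumber;
        final Instant timestamp;

        Event(String aggregateId, long sequenceNumber, Instant timestamp) {
            this.aggregateId = aggregateId;
            this.sequenceNumber = sequenceNumber;
            this.timestamp = timestamp;
        }
    }

    private final TreeMap<YearMonth, List<Event>> partitions = new TreeMap<>();
    int partitionsScanned = 0; // counts how many partitions the reads have visited

    void append(Event event) {
        YearMonth month = YearMonth.from(event.timestamp.atZone(ZoneOffset.UTC));
        partitions.computeIfAbsent(month, m -> new ArrayList<>()).add(event);
    }

    // Streaming-style read: the time bound lets us skip every partition before it.
    List<Event> readSince(Instant lowerBound) {
        YearMonth first = YearMonth.from(lowerBound.atZone(ZoneOffset.UTC));
        List<Event> result = new ArrayList<>();
        for (List<Event> partition : partitions.tailMap(first, true).values()) {
            partitionsScanned++;
            for (Event e : partition) {
                if (!e.timestamp.isBefore(lowerBound)) {
                    result.add(e);
                }
            }
        }
        return result;
    }

    // Sourcing-style read: only an identifier and a first sequence number are known,
    // so no partition can be ruled out.
    List<Event> readAggregate(String aggregateId, long firstSequenceNumber) {
        List<Event> result = new ArrayList<>();
        for (List<Event> partition : partitions.values()) {
            partitionsScanned++;
            for (Event e : partition) {
                if (e.aggregateId.equals(aggregateId) && e.sequenceNumber >= firstSequenceNumber) {
                    result.add(e);
                }
            }
        }
        return result;
    }
}
```

With events spread over three months, `readSince` with a bound in the last month visits one partition, while `readAggregate` visits all three regardless of where the aggregate's events actually live.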

Of course, there is a lot of work to do beyond JDBC; I only wrote a summary of a feature that would be great to have, using JDBC as an example. It would need more research to know the full scope of the work.

Sure thing! I just wanted to make sure the suggestion wasn't lost :-)

Nice to know, I'll read about how it's working, Thanks!

That's awesome, thanks!
We at AxonIQ have put in a ton of work to make it as efficient as possible when it comes to an Event Store.
By the way, if you have any questions about it, our forum would be a good place to ask for feedback.

@williancolognesitrimble
Author

That's nice, thank you so much for the explanation @smcvb
