Make gapless tokens possible on JDBC/JPA databases #2747
Welcome, @hjohn, and thanks for sharing an enhancement suggestion with us.
Although I wouldn't call it the reason, it's definitely part of why using Axon Server ensures there cannot be any gaps. Regardless, ensuring something similar for JPA/JDBC would alleviate the concern even further. However, looking at your scenario, I'd be wondering about three things:
I can imagine so; I saw quite a few questions related to gaps. However, I used the wrong term initially: I didn't mean to create a gapless index, but only a monotonically increasing index. Gaps are not the problem; gaps getting filled "later" are the problem. I'll use the field
It looks like database servers can't really do this automatically (it would have to be a trigger that runs in a separate transaction, which only some databases support), so the only option I think would be to have a background thread arrange this. The implementation could be a thread which gets signaled when new events are inserted, with a cooldown period and a deadline for when it triggers the update.
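Such a trigger thread could be sketched as follows. This is an illustrative Python sketch, not Axon code; the `cooldown`/`deadline` names and values are assumptions, and `action` stands in for the database update:

```python
import threading
import time

class FinalizationTrigger:
    """Runs `action` after new events are signalled, coalescing bursts.

    Waits up to `cooldown` seconds of quiet after a signal before firing,
    but never delays more than `deadline` seconds after the first
    unhandled signal. Illustrative sketch only.
    """

    def __init__(self, action, cooldown=0.05, deadline=0.5):
        self.action = action
        self.cooldown = cooldown
        self.deadline = deadline
        self._signal = threading.Event()
        self._stop = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def events_appended(self):
        self._signal.set()          # call after a batch of events commits

    def stop(self):
        self._stop = True
        self._signal.set()
        self._thread.join()

    def _run(self):
        while not self._stop:
            if not self._signal.wait(timeout=0.1):  # poll so stop() works
                continue
            first = time.monotonic()
            self._signal.clear()
            # Coalesce further signals during the cooldown window, but
            # fire no later than `deadline` after the first signal.
            while (not self._stop
                   and self._signal.wait(timeout=self.cooldown)
                   and time.monotonic() - first < self.deadline):
                self._signal.clear()
            if self._stop:
                return
            self.action()
```

A burst of appends then results in a single finalization run instead of one per event, which is the point of the cooldown; the deadline bounds how stale the finalized index can get under constant load.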
The above
Yes, they have to be, as the
Yes, I used the wrong terminology; the index doesn't need to be gapless, it only needs to be monotonically increasing. The reason gaps must be tracked is that they might fill up later (become visible later) when a slower transaction commits. However, with the above setup gaps can exist, but they will never be filled later.
I did some digging (read: asked the repository owner) into whether they had ever considered doing this in the past. However, there's a caveat he added that's worth sharing.
Yes, you are correct, apparently not all databases allow multiple

I did think of a more complicated alternative, and I was hesitant to suggest it initially. Instead of having an extra column
You'd create two sequences:
The update query would be slightly different:
Now, the advantages of this approach are:
But the obvious disadvantage is the additional complexity involved.
Given the required (and desired) uniqueness of the indices, I think we can conclude that the initial suggestion is not a generic enough solution to provide for Axon Framework. Concerning the new proposition:
Although true, this merits investigation. As such, you're welcome to construct a working sample of the behavior, if you feel the added complexity outweighs the current existence of gaps (which we aim to abstract away from the user as much as possible).
If you ever decide to implement this: I've tested this now (Postgres specific) and it works as expected. The update query needs to be careful to call

I did some preliminary performance measurements, and I see a degradation of about 5-10% when compared to a version that does not perform these updates. The update query would be triggered immediately (in a separate transaction) after events were appended, without any delays. Under high loads (>10000

I believe that tracking gaps is not completely free either, so it's possible the performance degradation of doing the above is offset by no longer having to track the gaps.

I've written the above for Postgres, and did not investigate if this could be provided in a database-neutral way (Postgres-specific parts include the advisory lock, and the
That's some amazing effort you've put in, @hjohn. This test you've run, is that perhaps a shareable repository for us/others to view?
Of course, I agree with you here, @hjohn.
I will get back to you on that; I can probably provide a gist.
Hey @hjohn, just checking in to see whether you've had time to provide a gist for us and the team. |
The tests I did were not part of Axon, so I don't have anything to show that directly integrates with it. However, we are running it in production now (it's not Axon based, but as I did some Axon work before, I thought you'd be interested in this solution). I also can't share the code directly; even though I wrote it, it's not mine to share. However, I can provide you with sufficient details if you want to integrate this into Axon. The solution is more database vendor specific than how Axon currently operates, so in order to support more databases the solution would need to be tailored to fit those. I've only done this for Postgres.

Summary
Inserting events with temporary ids

In the event table, define a column that generates negative ids like this:
The same column can be re-used for the permanent ids (you can also choose to do this with 2 columns). Using negative values for temporary ids and positive ones for permanent ids saves having a separate column. Just be careful with queries that look for the highest id; if there are no permanent ids yet, it may return a negative value.

Converting to permanent ids (finalization)

Define another sequence that is used for the permanent ids. Since it is never accessed concurrently, and only used for already committed data, it will be monotonically increasing and gapless.
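Since the original column definition did not survive formatting, here is a small Python/SQLite sketch of the idea (names are illustrative, and a simple counter stands in for a Postgres sequence with a negative increment). It shows why a naive query for the highest id must exclude the temporary negatives:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# SQLite has no sequences; emulate a "temporary id" sequence counting
# down from -1 (in Postgres: a sequence with INCREMENT BY -1).
temp_seq = 0
def next_temp_id():
    global temp_seq
    temp_seq -= 1
    return temp_seq

for payload in ["created", "renamed", "deleted"]:
    conn.execute("INSERT INTO events VALUES (?, ?)", (next_temp_id(), payload))

# Naive MAX is wrong while only temporary (negative) ids exist:
naive = conn.execute("SELECT MAX(id) FROM events").fetchone()[0]
# Exclude temporaries to find the highest *permanent* id:
highest = conn.execute(
    "SELECT MAX(id) FROM events WHERE id > 0").fetchone()[0]
print(naive, highest)   # -1 None
```

With no finalized rows yet, the filtered query correctly reports that no permanent id exists, instead of returning -1.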
The sequence uses

Then define a function on your database that will update the temporary ids to permanent ones. The function preferably should return the result of its action, so you can take appropriate action after it completes. The function defined below will return a tuple containing the number of rows it updated, and the highest permanent index currently in the events table (basically the current value of the permanent sequence). It's recommended to do all of this in a single function to avoid paying latency costs for each statement:
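The function body itself was lost in formatting and is Postgres specific (stored procedure, advisory lock). The renumbering logic it performs can be sketched in portable Python/SQLite terms; names here are illustrative, and a counter stands in for the permanent sequence:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(-1, "a"), (-2, "b"), (-3, "c")])

permanent_seq = 0   # stand-in for the gapless "permanent" sequence

def finalize(conn):
    """Renumber committed temporary (negative) ids to permanent ones.

    Returns (rows_updated, highest_permanent_id), mirroring the tuple
    described above. In Postgres this runs as one function under an
    advisory lock so only one finalizer runs at a time.
    """
    global permanent_seq
    rows = conn.execute(
        "SELECT id FROM events WHERE id < 0 ORDER BY id DESC"  # -1 = oldest
    ).fetchall()
    for (temp_id,) in rows:
        permanent_seq += 1
        conn.execute("UPDATE events SET id = ? WHERE id = ?",
                     (permanent_seq, temp_id))
    conn.commit()
    return len(rows), permanent_seq

print(finalize(conn))   # (3, 3)
print(finalize(conn))   # (0, 3) — nothing left to renumber
```

Because only already committed rows are visible to the finalizer, the permanent ids it assigns come out monotonically increasing and gapless, which is the whole point of the scheme.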
This function should be called (in a separate transaction) whenever events were appended. When events are appended slowly, the function will probably run for each event appended, and update a single index. This has some overhead, but if volume is low, there is plenty of time available anyway. When volume is high, this function will still be triggered immediately, but as it won't complete instantly, many more events may have been appended while it was running. The next call will therefore finalize many events to permanent indices in one go. This keeps the overhead low even in high volume scenarios.

To ensure the function is always triggered, I'd recommend using a semaphore that is released each time an event is appended (run it after an event, or batch of events, was successfully committed):
And a background thread that loops and triggers each time the semaphore becomes available:
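Both snippets (the release after commit and the worker loop) were stripped in formatting; a minimal Python sketch of the same pattern, with `finalize_indices` as a placeholder for the database call, might look like:

```python
import threading
import time

append_signal = threading.Semaphore(0)
finalized_batches = []

def finalize_indices():
    # Placeholder for invoking the database finalization function.
    finalized_batches.append("finalized")

def on_events_committed():
    append_signal.release()     # release once per committed batch

def finalizer_loop(stop):
    while not stop.is_set():
        append_signal.acquire()             # wake when events appended
        if stop.is_set():
            break
        # Drain extra permits so a burst of batches is finalized once.
        while append_signal.acquire(blocking=False):
            pass
        finalize_indices()

stop = threading.Event()
worker = threading.Thread(target=finalizer_loop, args=(stop,), daemon=True)
worker.start()

for _ in range(5):
    on_events_committed()     # five quick commits
time.sleep(0.2)

stop.set()
append_signal.release()       # unblock the worker so it can exit
worker.join()
print(len(finalized_batches) >= 1)   # True — bursts were coalesced
```

Draining the semaphore before each run is what gives the "many events finalized in one go" behavior under load, while still guaranteeing the function runs after every commit.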
If you have any further questions, or need some help when you decide to implement this, I'll be happy to help.

Multiple JVMs appending events

This is not really related to the finalizing of indices, but I thought I'd mention it anyway. When there are multiple JVMs appending events, which also consume their own events, they may not be aware immediately that new events were appended by another JVM. You'd need to poll the database (ugly), using something like

In the solution we created, we opted for the last option, using Hazelcast to share the latest known index between the various consumers (we also built this for Axon). This avoids any polling or database specific mechanisms, and is very low latency (milliseconds).
Thanks for the detailed explanation, @hjohn. Concerning your approach, the fact that database vendor-specific logic needs to be implemented makes it slightly problematic for Axon Framework. Nonetheless, as stated before, I still very much value the insights!
We are, but not in that particular part; it is quite a varied landscape. The main reason we're not using it for that particular part is that these are more external events (always containing a complete aggregate) that will be consumed by all kinds of services, not all of which use Axon, are built by us, or are even in Java.
The solution can be made more vendor-neutral, I think; this one is just optimized as far as I could get it (for Postgres), avoiding unnecessary round trips. Initially this was built without a stored procedure, as just a collection of SQL statements. I think offering a neutral solution, and perhaps a few optimized ones (which could be contributed), could still work. In the end, we're just looking to renumber a few negative ids into positive ones without gaps.
Yeah, let's see how high it will be on the priority list. I may be able to help out when the time comes.
Enhancement Description
I'm evaluating Axon at the moment, and the `GapAwareTrackingToken` doesn't quite feel right to me.

Current Behaviour
I understand that sequences in most databases can contain permanent and temporary gaps, due to the nature of transactions and sequence assignment. This makes it necessary to either somehow create a gapless sequence (or at least one that never fills gaps), or to track the gaps, which involves trade-offs around max gaps, clean-up of gaps, and maximum gap size.
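To illustrate why temporary gaps force gap tracking, here is a tiny simulation (plain Python, with assumed index values). Two transactions draw indices 1 and 2 from a sequence, but the holder of index 1 commits last:

```python
# Two writers draw global_index values 1 and 2 from a sequence, but the
# writer holding index 1 commits *after* the one holding index 2.
committed = []          # what a reader can see, in commit order

committed.append(2)     # fast transaction (index 2) commits first

# A naive reader that only remembers the highest index it has seen:
seen_up_to = max(committed)          # 2 — index 1 looks like a "gap"

committed.append(1)     # slow transaction commits later, filling the gap

# Events at or below seen_up_to are assumed already processed, so the
# naive reader never delivers event 1. This is why a tracking token on
# JPA/JDBC must remember gaps until they are filled or expire.
missed = [i for i in committed if i <= seen_up_to and i != 2]
print(missed)   # [1]
```

A token that additionally records the set of unseen indices below its high-water mark (the gap set) would pick up event 1 on a later read, at the cost of the gap-tracking machinery described above.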
Wanted Behaviour
I'd like to propose a possible alternative solution for this problem that makes a gapless global index possible:

- A column `gapless_index` in the domain events table that is initially filled with `NULL` and should not be set on inserts
- A background process that looks for `NULL`s in this column and fills these values from a 2nd sequence, ordered by the `global_index` column. This background process should take out a global lock and should run in a separate transaction. It's irrelevant who does this update, or how often.
- Use of `gapless_index` in tokens

The trade-offs here seem to be:

I think `gapless_index` will be safe to use, as Axon JDBC/JPA stores already guarantee the invariant that the global index increases monotonically for events ordered by [type, aggregate_identifier, sequence_number].
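To make the proposed background step concrete, here is a minimal Python/SQLite sketch (not production code: a plain counter emulates the 2nd sequence, and table/column names follow the proposal):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE domain_events (
    global_index  INTEGER PRIMARY KEY,   -- normal sequence, may have gaps
    gapless_index INTEGER,               -- NULL on insert
    payload       TEXT)""")

# Suppose global_index 2 was lost to a rolled-back transaction:
conn.executemany(
    "INSERT INTO domain_events (global_index, payload) VALUES (?, ?)",
    [(1, "a"), (3, "b"), (4, "c")])

gapless_seq = 0  # the 2nd sequence; only the background process uses it

def fill_gapless(conn):
    """Background step: assign gapless_index where it is NULL, ordered
    by global_index (run under a global lock, in its own transaction)."""
    global gapless_seq
    rows = conn.execute(
        "SELECT global_index FROM domain_events "
        "WHERE gapless_index IS NULL ORDER BY global_index").fetchall()
    for (gi,) in rows:
        gapless_seq += 1
        conn.execute("UPDATE domain_events SET gapless_index = ? "
                     "WHERE global_index = ?", (gapless_seq, gi))
    conn.commit()

fill_gapless(conn)
result = conn.execute("SELECT global_index, gapless_index "
                      "FROM domain_events ORDER BY gapless_index").fetchall()
print(result)   # [(1, 1), (3, 2), (4, 3)] — gapless despite the gap at 2
```

Since only committed rows are visible when the background process runs, `gapless_index` ends up contiguous and monotonically increasing even though `global_index` has a permanent gap, so a token on `gapless_index` needs no gap tracking.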