Make gapless tokens possible on JDBC/JPA databases #2747

Open
hjohn opened this issue Jun 13, 2023 · 12 comments
hjohn commented Jun 13, 2023

Enhancement Description

I'm evaluating Axon at the moment, and the GapAwareTrackingToken doesn't quite feel right to me.

Current Behaviour

I understand that sequences in most databases can contain permanent and temporary gaps, due to the nature of transactions and sequence assignment. This makes it necessary either to somehow create a gapless sequence (or at least one that never fills gaps), or to track the gaps, which involves trade-offs around maximum gap counts, gap clean-up, and maximum gap size.
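To illustrate (a sketch with hypothetical names, Postgres syntax): interleaved transactions are what create the gaps, because sequence values are consumed even when a transaction rolls back or commits late.

```sql
CREATE SEQUENCE events_seq;
CREATE TABLE events (
    global_index BIGINT PRIMARY KEY DEFAULT nextval('events_seq'),
    payload      TEXT
);

-- Session A:
BEGIN;
INSERT INTO events (payload) VALUES ('a');  -- draws global_index 1

-- Session B, concurrently:
BEGIN;
INSERT INTO events (payload) VALUES ('b');  -- draws global_index 2
COMMIT;                                     -- a reader now sees only index 2

-- Session A:
ROLLBACK;  -- index 1 becomes a permanent gap (sequence values never roll back);
           -- had session A committed instead, the gap would have filled *later*,
           -- after index 2 was already visible.
```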

Wanted Behaviour

I'd like to propose a possible alternative solution for this problem that makes a gapless global index possible.

  1. Add an additional column `gapless_index` in the domain events table that is initially NULL and is never set on inserts (a schema sketch follows after this list)
  2. Have a background process (perhaps using LISTEN/NOTIFY, not a trigger) look for NULLs in this column and fill them from a second sequence, ordered by the `global_index` column. This background process should take out a global lock and run in a separate transaction. It's irrelevant who does this update, or how often.
  3. Use the `gapless_index` in tokens
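A minimal sketch of the schema changes this would need, assuming Axon's JPA table name domain_event_entry (the sequence and index names are hypothetical; Postgres syntax):

```sql
-- New column, left NULL by the insert path:
ALTER TABLE domain_event_entry ADD COLUMN gapless_index BIGINT;

-- Second sequence, drawn from only by the background process:
CREATE SEQUENCE gapless_index_seq;

-- Index for token-based reads on the new column:
CREATE INDEX idx_gapless ON domain_event_entry (gapless_index);
```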

The trade-offs here seem to be:

  1. the need for another transaction per set of events stored, or at least one additional transaction within a given timeout (e.g. 500 ms)
  2. the need for another column, index, and sequence; especially the index can incur extra costs

I think `gapless_index` will be safe to use, as the Axon JDBC/JPA stores already guarantee the invariant that the global index increases monotonically for events ordered by [type, aggregate_identifier, sequence_number].

@hjohn hjohn added the Type: Enhancement Use to signal an issue enhances an already existing feature of the project. label Jun 13, 2023
smcvb (Member) commented Jun 14, 2023

Welcome, @hjohn, and thanks for sharing an enhancement suggestion with us.

> I'm evaluating Axon at the moment, and the GapAwareTrackingToken doesn't quite feel right to me.

Although I wouldn't call it *the* reason, it's definitely why the usage of Axon Server ensures there cannot be any gaps.
The number of times I've had conversations about these gaps, their purpose, and their pain is an ever-increasing BIG_INT on my yearly schedule.
So, we felt it necessary that AxonIQ's custom Event Store ensures these gaps cannot occur. Period.

Regardless, ensuring something similar for JPA/JDBC thus alleviates the concern even further.

However, looking at your scenario, I wonder about three things:

  1. How do you see an implementation of this background process to insert the `gapless_index` field?
  2. Would you block Event Processors from reading events that have a null `gapless_index`?
  3. Doesn't diverting to a second sequence for the `gapless_index` still leave the possibility that the sequence generator is (accidentally) reused, thus creating gaps?

@smcvb smcvb assigned smcvb and hjohn Jun 14, 2023
@smcvb smcvb added the Status: Under Discussion Use to signal that the issue in question is being discussed. label Jun 14, 2023
hjohn (Author) commented Jun 15, 2023

> Welcome, @hjohn, and thanks for sharing an enhancement suggestion with us.
>
> > I'm evaluating Axon at the moment, and the GapAwareTrackingToken doesn't quite feel right to me.
>
> Although I wouldn't call it *the* reason, it's definitely why the usage of Axon Server ensures there cannot be any gaps. The number of times I've had conversations about these gaps, their purpose, and their pain is an ever-increasing BIG_INT on my yearly schedule. So, we felt it necessary that AxonIQ's custom Event Store ensures these gaps cannot occur. Period.

I can imagine so; I saw quite a few questions related to gaps. I did, however, use the wrong term initially: I didn't mean to create a gapless index, only a monotonically increasing one. Gaps are not the problem; gaps getting filled "later" is the problem. I'll use the field name monotonic_index from now on.

> Regardless, ensuring something similar for JPA/JDBC thus alleviates the concern even further.
>
> However, looking at your scenario, I wonder about three things:
>
> 1. How do you see an implementation of this background process to insert the `gapless_index` field?

It looks like database servers can't really do this automatically (it would have to be a trigger that runs in a separate transaction, which only some databases support), so the only option I see is to have a background thread arrange this.

The implementation could be a thread that gets signaled when new events are inserted, with a cooldown period and a deadline for when it triggers the monotonic_index update. The cooldown period helps to do this additional update only once for however many events were inserted during it, and the deadline ensures that events inserted by an instance that went down are still detected and updated. The update itself would be something like:

UPDATE events e
    SET monotonic_index = s.mi
    FROM (SELECT global_index, nextval('monotonic') AS mi FROM events
          WHERE monotonic_index IS NULL ORDER BY global_index) s
    WHERE s.global_index = e.global_index;

The above UPDATE selects all rows where monotonic_index is NULL, orders them by global_index, and generates a sequence value for each. It then stores that value in monotonic_index.

> 2. Would you block Event Processors from reading events that have a null `gapless_index`?

Yes, they have to be blocked, as the TrackingToken would use the monotonic_index value. They should run again once the above UPDATE returns a non-zero result.

> 3. Doesn't diverting to a second sequence for the `gapless_index` still leave the possibility that the sequence generator is (accidentally) reused, thus creating gaps?

Yes, I used the wrong terminology; the index doesn't need to be gapless, it only needs to be monotonically increasing. The reason gaps must be tracked is that they might fill up later (become visible later) when a slower transaction commits. With the above setup gaps can still exist, but they will never be filled later.

smcvb (Member) commented Jun 15, 2023

I did some digging (read: asked the repository owner) whether they had ever considered doing this in the past.
As it turns out, they did for Axon Framework 2 but didn't implement this due to the (expected) performance impact.
Nonetheless, revisiting the idea is, as highlighted in my previous reply, sensible.

However, there's a caveat he added that's worth sharing.
We anticipate this monotonic_index to be unique.
However, as it may be null, isn't there a chance some databases would allow only a single NULL entry?

hjohn (Author) commented Jun 15, 2023

> I did some digging (read: asked the repository owner) whether they had ever considered doing this in the past. As it turns out, they did for Axon Framework 2 but didn't implement it due to the (expected) performance impact. Nonetheless, revisiting the idea is, as highlighted in my previous reply, sensible.
>
> However, there's a caveat he added that's worth sharing. We anticipate this monotonic_index to be unique. However, as it may be null, isn't there a chance some databases would allow only a single NULL entry?

Yes, you are correct: apparently not all databases allow multiple NULLs in a unique index, and even if they do, some may exclude NULLs from the index, which in turn may lead to bad performance when trying to locate the rows that have a NULL monotonic_index.
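On databases that support them, partial indexes would sidestep both concerns, since the unfilled rows are indexed separately from the unique, filled ones; a sketch in Postgres syntax (index names hypothetical):

```sql
-- Uniqueness enforced only over rows that already have a monotonic index:
CREATE UNIQUE INDEX idx_monotonic ON events (monotonic_index)
    WHERE monotonic_index IS NOT NULL;

-- Small helper index so the background job finds unfilled rows cheaply:
CREATE INDEX idx_monotonic_todo ON events (global_index)
    WHERE monotonic_index IS NULL;
```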

I did think of a more complicated alternative, which I was hesitant to suggest initially. Instead of having an extra column monotonic_index, we could also reuse the global index column for the purpose of creating a monotonic index. This involves splitting the global index column's value range into two distinct parts:

  • The negative range is used for newly inserted rows
  • The positive range is used for the monotonic final values

You'd create two sequences:

  CREATE SEQUENCE global_index_seq INCREMENT BY -1 START WITH -1
  CREATE SEQUENCE monotonic_seq INCREMENT BY 1 START WITH 1

The global_index_seq sequence would be used as the standard sequence when doing inserts. The monotonic_seq sequence would be used for the updates.

The update query would be slightly different:

UPDATE events e
    SET global_index = s.mi
    FROM (SELECT global_index, nextval('monotonic_seq') AS mi FROM events
          WHERE global_index < 0 ORDER BY global_index DESC) s
    WHERE s.global_index = e.global_index;

Now, the advantages of this approach are:

  • No extra column
  • No extra index needed; the unique index on global_index suffices
  • No NULL values

But the obvious disadvantage is the additional complexity involved.

smcvb (Member) commented Jun 16, 2023

Given the required (and desired) uniqueness of the indices, I think we can conclude that the initial suggestion is not a generic enough solution to provide for Axon Framework.
If you think differently about this, @hjohn, be sure to explain it to us. :-)

Concerning the new proposition:

> But the obvious disadvantage is the additional complexity involved.

Although true, this merits investigation.
However, this is not a straightforward operation to initiate.
Nor is it something the Axon Framework team can immediately make time available for.

As such, you're welcome to construct a working sample of the behavior, if you feel the added complexity outweighs the current existence of gaps (which we aim to abstract away from the user as much as possible).
Note that this doesn't mean we're not willing to investigate; we simply need to manage the time we have to expand on behavior that works.
Furthermore, the suggested change has a high likelihood of requiring breaking changes, or large shifts in behavior based on configuration, which makes it a tough nut to crack within the lifecycle of Axon Framework 4.
I can imagine we'd (the team) pick this up when the development of Axon Framework 5 is further underway, though.

hjohn (Author) commented Jun 19, 2023

If you ever decide to implement this: I've now tested it (Postgres specific) and it works as expected. The update query needs to be careful to call nextval on the sorted results, so you need another subselect to make it work (see the sketch below). I added an advisory lock to prevent two concurrent updates from running, and this results in the final (positive) index values being gapless and monotonically increasing. The lock should not harm performance in this case, as the update query updates all outstanding events with a negative index; it's only there to prevent doing this work multiple times.
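The shape of the corrected update (the full, tested version appears as a function in a later comment below, which also handles the case where the lock is not acquired):

```sql
BEGIN;

-- Transaction-scoped advisory lock; the real version bails out when it
-- returns false, since another finalization run is then in progress:
SELECT pg_try_advisory_xact_lock(1);

UPDATE events e SET global_index = sub.monotonic
    FROM (
        SELECT global_index, nextval('monotonic_seq') AS monotonic FROM (
            -- the extra subselect forces the ordering to happen before nextval runs
            SELECT global_index FROM events WHERE global_index < 0
            ORDER BY global_index DESC
        ) sortquery
    ) sub
    WHERE sub.global_index = e.global_index;

COMMIT;
```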

I did some preliminary performance measurements, and I see a degradation of about 5-10% compared to a version that does not perform these updates. The update query was triggered immediately (in a separate transaction) after events were appended, without any delays. Under high load (>10,000 INSERTs/s) it would update many events in one go, and events would be delayed by up to 20 ms. Under low load the delay was a few ms. This will depend on the latency to the database server, of course.

I believe that tracking gaps is not completely free either, so it's possible the performance degradation of doing the above is offset by no longer having to track gaps.

I've written the above for Postgres and did not investigate whether this could be provided in a database-neutral way (Postgres-specific parts include the advisory lock and the nextval function).

smcvb (Member) commented Jun 19, 2023

That's some amazing effort you've put in, @hjohn.
Thank you very much 💪

This test you've run, is that perhaps a shareable repository for us/others to view?
Although the textual explanation goes a long way, having some actual code to run is beneficial, I think.

> I believe that tracking gaps is not completely free either, so it's possible the performance degradation of doing the above is offset by no longer having to track gaps.

Of course, I agree with you here, @hjohn.
Again, that's in part why, when AxonIQ was developing Axon Server, it was paramount that the notion of gaps simply didn't occur.

hjohn (Author) commented Jun 21, 2023

> This test you've run, is that perhaps a shareable repository for us/others to view?

I will get back to you on that; I can probably provide a gist.

smcvb (Member) commented Sep 7, 2023

Hey @hjohn, just checking in to see whether you've had time to provide a gist for us and the team.
No pressure, just trying to gauge the activity here.

hjohn (Author) commented Sep 8, 2023

The tests I did were not part of Axon, so I don't have anything to show that directly integrates with it. However, we are running it in production now (it's not Axon based, but as I did some Axon work before, I thought you'd be interested in this solution). I also can't share the code directly: even though I wrote it, it's not mine to share. However, I can provide you with sufficient detail if you want to integrate this into Axon.

The solution is more database-vendor specific than how Axon currently operates, so to support more databases it would need to be tailored to each. I've only done this for Postgres.

Summary

  • Events are assigned temporary IDs that may contain temporary or permanent gaps
    • I've elected to use negative IDs for temporary ones and positive for permanent ones, so a single index column suffices
  • A background job is triggered immediately after an event is inserted; it converts (all) temporary IDs it encounters into permanent ones:
    • It never runs concurrently
    • It starts a separate transaction and acquires a lock (so it can't run concurrently on the same database even when there are multiple JVMs doing appends)
    • After it completes, it notifies any tasks that may be waiting for new events with the highest known (now permanent) index number

Inserting events with temporary IDs

In the event table, define a column that generates negative IDs like this:

index INT8 NOT NULL GENERATED BY DEFAULT AS IDENTITY (INCREMENT BY -1),

The same column can be reused for the permanent IDs (you can also choose to do this with two columns). Using negative values for temporary IDs and positive ones for permanent IDs saves a separate column. Just be careful with queries that look for the highest ID; if there are no permanent IDs yet, it may return a negative value (see the sketch below).
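For instance, a "highest index" query would have to exclude the temporary range; a sketch (the 0 fallback is an assumption):

```sql
-- Highest permanent index; COALESCE covers the case where nothing has been
-- finalized yet (without the filter, MAX could return a negative temporary value).
SELECT COALESCE(MAX(index), 0) FROM events WHERE index > 0;
```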

Converting to permanent IDs (finalization)

Define another sequence to be used for the permanent IDs. Since it is never accessed concurrently, and only ever used for already-committed data, the resulting values will be monotonically increasing and gapless.

CREATE SEQUENCE IF NOT EXISTS monotonic_seq INCREMENT BY 1 CACHE 1

The sequence explicitly uses CACHE 1 to avoid gaps that could result from the sequence allocating chunks that are only partially used. As the sequence is only accessed by a function running locally on the database, this should have no performance impact.

Then define a function on the database that updates the temporary IDs to permanent ones. The function should preferably return the result of its action, so you can take appropriate action after it completes. The function defined below returns a tuple containing the number of rows it updated and the highest permanent index currently in the events table (basically the current value of the monotonic_seq sequence). It returns (-1, -1) if it was unable to acquire the lock (which is okay, as someone else must be running the update already), or (0, -1) if there was nothing to update (this can happen if you run the update query periodically in case a JVM went down before it had a chance to start finalization).

It's recommended to do all of this in a single function to avoid paying latency costs for each statement:

CREATE OR REPLACE FUNCTION finalize_indices()
    RETURNS TABLE(updated_rows INT8, last_finalized_index INT8) AS $$
DECLARE
    updated_rows INT8;
BEGIN
    -- ensure this function never runs concurrently:
    IF pg_try_advisory_xact_lock(1) THEN
        UPDATE events e SET index = sub.monotonic
            FROM (
                SELECT index, nextval('monotonic_seq') AS monotonic FROM (
                    -- subquery ensures temporary indices are sorted **before** calling nextval as order is important!
                    SELECT index FROM events WHERE index < 0 ORDER BY index DESC
                ) sortquery
            ) sub
            WHERE sub.index = e.index;

        GET DIAGNOSTICS updated_rows = ROW_COUNT;

        IF updated_rows = 0 THEN
            -- calling currval in a transaction when no nextval was called is not allowed, so:
            RETURN QUERY SELECT CAST(0 AS INT8), CAST(-1 AS INT8);
        ELSE
            RETURN QUERY SELECT updated_rows, currval('monotonic_seq');
        END IF;
    ELSE
        RETURN QUERY SELECT CAST(-1 AS INT8), CAST(-1 AS INT8);
    END IF;
END;
$$ LANGUAGE plpgsql;

This function should be called (in a separate transaction) whenever events were appended. When events are appended slowly, the function will probably run once per appended event and update a single index. This has some overhead, but if volume is low, there is plenty of time available anyway. When volume is high, the function is still triggered immediately, but as it won't complete instantly, many more events may have been appended while it was running. The next call therefore finalizes many events to permanent indices in one go. This keeps the overhead low even in high-volume scenarios.
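Calling it from Java could look roughly like this; a minimal sketch, assuming plain JDBC with a DataSource in auto-commit mode (so the call runs in its own transaction) and a Java 17-style record for the result:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public final class IndexFinalizer {

    public record Result(long updatedRows, long lastFinalizedIndex) {}

    private final DataSource dataSource;

    public IndexFinalizer(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Runs finalize_indices() in its own (auto-commit) transaction and
    // returns the (updated_rows, last_finalized_index) tuple it reports.
    public Result finalizeIndices() throws SQLException {
        try (Connection connection = dataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("SELECT * FROM finalize_indices()")) {
            rs.next(); // the function always returns exactly one row
            return new Result(rs.getLong(1), rs.getLong(2));
        }
    }
}
```

A result of (-1, -1) means another instance held the lock, so the caller can simply skip that round.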

To ensure the function is always triggered, I'd recommend using a semaphore that is released each time an event is appended (release it after an event, or batch of events, was successfully committed):

semaphore.release(1);

And a background thread that loops, triggering each time the semaphore becomes available:

    for (;;) {
        try {
            // Try to acquire a permit, or give up after 30 seconds to do a
            // "safety" finalization in case a signal was missed:
            semaphore.tryAcquire(30, TimeUnit.SECONDS);

            // Finalization applies to all outstanding events, so if multiple
            // permits were released they can all be drained at once:
            semaphore.drainPermits();

            // ... trigger the finalize function here, in a new transaction ...
        }
        catch (InterruptedException e) {
            if (disposed) {
                break;
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }

If you have any further questions, or need some help when you decide to implement this, I'll be happy to help.

Multiple JVMs appending events

This is not really related to the finalizing of indices, but I thought I'd mention it anyway.

When there are multiple JVMs appending events, which also consume their own events, they may not be immediately aware that new events were appended by another JVM. You'd either need to poll the database (ugly), use something like LISTEN/NOTIFY (Postgres specific, I think), or somehow communicate between the JVMs that events were inserted by another.

In the solution we created, we opted for the last option, using Hazelcast to share the latest known index between the various consumers (we also built this for Axon). This avoids any polling or database-specific mechanisms, and is very low latency (milliseconds).
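Not the production code, but broadcasting the last finalized index over a Hazelcast topic could look roughly like this (the topic name and wiring are assumptions):

```java
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.topic.ITopic;
import java.util.function.LongConsumer;

public final class IndexBroadcaster {

    private final ITopic<Long> topic;

    public IndexBroadcaster(HazelcastInstance hazelcast) {
        this.topic = hazelcast.getTopic("last-finalized-index"); // hypothetical name
    }

    // Called by the finalizing JVM after finalize_indices() reports progress:
    public void publish(long lastFinalizedIndex) {
        topic.publish(lastFinalizedIndex);
    }

    // Consumers register a callback that wakes up waiting event processors:
    public void onNewIndex(LongConsumer callback) {
        topic.addMessageListener(message -> callback.accept(message.getMessageObject()));
    }
}
```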

smcvb (Member) commented Sep 15, 2023

Thanks for the detailed explanation, @hjohn.
And, sad to hear you're not using Axon Framework at said project.
Although unrelated to this issue, I'd be curious to hear the decision-making logic around that.

Concerning your approach, the fact that database vendor-specific logic needs to be implemented makes it slightly problematic for Axon Framework.
Where possible, we want Axon Framework to rely on "its own," well-known tooling (JPA/JDBC), or the Java language.
Being required to provide vendor-specific logic would simply explode the number of solutions we need to provide and maintain.
Justifying our development time for something like that is...well, tough.
This doesn't mean contributions aren't welcome, of course, but we need to take a stance somewhere.

Nonetheless, as stated before, I still very much value the insights!
I would very much like to take a stab at this in a broader team with our efforts towards building Axon Framework 5.

hjohn (Author) commented Sep 15, 2023

> Thanks for the detailed explanation, @hjohn. And, sad to hear you're not using Axon Framework at said project. Although unrelated to this issue, I'd be curious to hear the decision-making logic around that.

We are, but not in that particular part; it is quite a varied landscape. The main reason we're not using it there is that these are more external events (always containing a complete aggregate) that will be consumed by all kinds of services, not all of which use Axon, are built by us, or are even written in Java.

> Concerning your approach, the fact that database vendor-specific logic needs to be implemented makes it slightly problematic for Axon Framework. Where possible, we want Axon Framework to rely on "its own," well-known tooling (JPA/JDBC), or the Java language. Being required to provide vendor-specific logic would simply explode the number of solutions we need to provide and maintain. Justifying our development time for something like that is...well, tough. This doesn't mean contributions aren't welcome, of course, but we need to take a stance somewhere.

The solution can be made more vendor neutral, I think; this one is just optimized as far as I could get it (for Postgres), avoiding unnecessary round trips. Initially this was built without a stored procedure, as just a collection of SQL statements. I think offering a neutral solution, and perhaps a few optimized ones (which could be contributed), could still work. In the end, we're just looking to renumber a few negative IDs into positive ones without gaps (see the sketch below).
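For what it's worth, a vendor-neutral variant could do the renumbering with plain JDBC at the cost of extra round trips. A sketch under stated assumptions: the single-column events table from above, MAX() standing in for the database sequence, and some external guarantee (e.g. a lock table or application-level lock) that only one instance runs it at a time:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class NeutralFinalizer {

    // Renumbers negative (temporary) indices to consecutive positive ones.
    // The caller must guarantee mutual exclusion; everything runs in one
    // transaction. Note: a column named "index" may need quoting on some
    // databases.
    public static int finalizeIndices(Connection connection) throws SQLException {
        connection.setAutoCommit(false);

        // Highest permanent index so far; 0 if nothing was finalized yet.
        long next;
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT COALESCE(MAX(index), 0) FROM events WHERE index > 0")) {
            rs.next();
            next = rs.getLong(1);
        }

        int updated = 0;
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT index FROM events WHERE index < 0 ORDER BY index DESC");
             PreparedStatement update = connection.prepareStatement(
                     "UPDATE events SET index = ? WHERE index = ?")) {
            while (rs.next()) {
                update.setLong(1, ++next);        // next consecutive permanent index
                update.setLong(2, rs.getLong(1)); // the temporary (negative) index
                updated += update.executeUpdate();
            }
        }
        connection.commit();
        return updated;
    }
}
```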

> Nonetheless, as stated before, I still very much value the insights! I would very much like to take a stab at this in a broader team with our efforts towards building Axon Framework 5.

Yeah, let's see how high it will be on the priority list. I may be able to help out when the time comes.

@smcvb smcvb added Priority 2: Should High priority. Ideally, these issues are part of the release they’re assigned to. and removed Status: Under Discussion Use to signal that the issue in question is being discussed. labels Sep 15, 2023