New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable safe retry of aggregate commands in the event of (perceived) failures #2554
base: master
Are you sure you want to change the base?
Conversation
First and foremost, thanks for making an effort to provide a pull request, @jvanheesch! I have taken some time to go over this pull request and its description. However, as it stands, I do not have sufficient time to do a thorough enough review of the changes you've made. |
You've gone the stretch to construct a pull request with a very thorough explanation (which is very helpful). However, I've just spent some time on your implementation, and I have a recommendation that may solve all predicaments. The only misser left with that suggestion would be the Aside from Now, let me conclude a little:
|
Proposal
This is a draft proposal. It aims to enable safe retry of aggregate commands in the event of (perceived) failures, without changing current default behavior (i.e. opt-in).
Context
NoHandlerForCommandException
). As such, it makes sense to introduce the concept of aRetryScheduler
.COMMAND_TIMEOUT
)The problem and the proposal are demonstrated in the following repository: https://github.com/jvanheesch/axon-command-idempotency
Approach
Message.getIdentifier()
) in the event@CommandHandler
Limitations
This approach cannot be used to make the following CHs idempotent:
I'm not entirely happy about this, but I do believe the benefits largely outweigh these limitations. For CH constructors, this problem can be mitigated by including a
@TargetAggregateIdentifier
in the command message and selecting an event store implementation that enforces(aggregate_identifier, sequence_number)
uniqueness (e.g. when using axon server, duplicate creation commands result inOUT_OF_RANGE: [AXONIQ-2000] Invalid sequence number 0 for aggregate a1, expected 1)
.Final thoughts
There is definitely room for improvement. Some current questions/concerns:
AggregateSnapshotter
stores ALL previouscorrelationIds
(for idempotent aggregates), which presumably makes snapshot size unbounded. Is such naive implementation sufficient, or is a more complex strategy (retention policy) required?return null;
the most appropriate way of handling duplicate commands? Would some kind of exception make more sense?idempotent
attribute of the@Aggregate
annotation is not always taken into account. This value should be available to every piece of code that instantiates anAnnotatedAggregate
.AggregateConfigurer
passes this value toEventSourcingRepository
and as such, everyAnnotatedAggregate
returned byEventSourcingRepository
will have the correct value for itsidempotent
property. There are however several other places where anAnnotatedAggregate
is sourced (fromDomainEventStream
or state) that currently do not have access to this newly added property, e.g.:AggregateSnapshotter
does not have access to this value, and resorts to providing the methodshouldAddCorrelationIds()
, enablingSpringAggregateSnapshotter
to mitigate this problemCommandHandlerInvoker.DisruptorRepository
does not have access to this value, and as suchDisruptorCommandBus
will not filter out redeliveriesGenericJpaRepository
does not have access to this value. On top of that, additional changes would be required to take this value into account, asGenericJpaRepository
currently does not store (and hence cannot restore) command ids.idempotent
should be globally configurable, eliminating the need foridempotent = true
in every@Aggregate
annotationI believe safe retries would greatly improve command failure handling, and I hope they will be available some day. I'd be happy to improve upon this draft, but would like to hear your thoughts first.