
What is the reason behind dynamoDB streams for event store? #86

Open
AlpacaGoesCrazy opened this issue Jun 19, 2020 · 11 comments
Labels
question Further information is requested

Comments

@AlpacaGoesCrazy
Contributor

So right now EventStore is using a DynamoDB table to store incoming events and a DynamoDB stream to relay those events one by one to the event processor lambda. Basically something like this:
Event is posted by a mutation -> Event is written to the Events table in DynamoDB -> DynamoDB stream invokes the event processor lambda -> Event processor lambda decides which connections the event should be posted to and posts it.

My question is why we need this complex process of relaying events through a DynamoDB table. I can guess that this is some sort of event bus which decouples the logic that receives events from the logic that publishes them.
However, as far as I can see, this approach has no architectural benefits.
We do not use DynamoDB stream event batching to process multiple events at the same time, as it would delay events coming through.
And this system is not acting as a fan-out to subscribers, as the 'fan-out' of the event happens in the event processor lambda.

Wouldn't it be better to directly invoke the event processor lambda on each incoming event?
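The flow above can be sketched in TypeScript. This is a hypothetical illustration, not the library's actual code: the record shape mirrors the DynamoDB Streams event, and `getSubscribers` / `postToConnection` are illustrative stand-ins for the subscriber lookup and the API Gateway Management API call.

```typescript
// Minimal types mirroring the DynamoDB Streams record shape.
interface StreamRecord {
  eventName?: string;
  dynamodb?: {
    NewImage?: Record<string, { S?: string }>;
  };
}

interface StoredEvent {
  event: string;
  payload: unknown;
}

// Pull event payloads out of a stream batch (INSERT records only).
function extractEvents(records: StreamRecord[]): StoredEvent[] {
  return records
    .filter((r) => r.eventName === "INSERT")
    .map((r) => ({
      event: r.dynamodb?.NewImage?.event?.S ?? "",
      payload: JSON.parse(r.dynamodb?.NewImage?.payload?.S ?? "null"),
    }));
}

// The processor step: for each event, look up the subscribed
// connections and post the payload to each of them.
async function processStreamBatch(
  records: StreamRecord[],
  getSubscribers: (eventName: string) => Promise<string[]>,
  postToConnection: (connectionId: string, payload: unknown) => Promise<void>,
): Promise<void> {
  for (const ev of extractEvents(records)) {
    const connectionIds = await getSubscribers(ev.event);
    await Promise.all(connectionIds.map((id) => postToConnection(id, ev.payload)));
  }
}
```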

@michalkvasnicak
Owner

To be honest, this library started just as a proof of concept that GraphQL subscriptions can be done using the new API Gateway v2. Then I added abstractions, so it should be fairly easy to completely change the source of events, etc.

Back when I started this library, there wasn't an option to use, for example, SQS as a Lambda event source. So I used DynamoDB to guarantee that an event will be processed even if something fails during event processing.

Wouldn't it be better to directly invoke the event processor lambda on each incoming event?

Do you mean to send the event "manually" by invoking the lambda with it? Or do you mean to process the event directly in the same process?

@AlpacaGoesCrazy
Contributor Author

AlpacaGoesCrazy commented Jun 19, 2020

Do you mean to send the event "manually" by invoking the lambda with it? Or do you mean to process the event directly in the same process?

I mean invoking the lambda with this event. It would be better to separate the event processor logic from the mutations which publish said event.

Back in the time when I started with this library, there wasn't an option to use for example SQS as a Lambda source. So I used DynamoDB to guarantee that the event will be processed if something fails during event processing.

I am not sure that using DynamoDB to guarantee event processing actually does anything useful.
If the event processor times out (due to too many subscribers), the event will be processed again and will most probably time out once more, getting into an infinite loop of posting the same message.
If a connection which we are posting the event to is not active, that is handled by the event processor lambda.
The only useful case for this is if we somehow failed to access the subscriber list from the DynamoDB table.

@michalkvasnicak
Owner

I mean invoking the lambda with this event. It would be better to separate the event processor logic from the mutations which publish said event.

I have a few questions because I'm not sure I understand.

If you invoke such a function with an event, what'll happen if this function fails? How'd you retry the event processing?
How is the event processor logic tied to a mutation, except that you need to publish an event somehow?

@AlpacaGoesCrazy
Contributor Author

AlpacaGoesCrazy commented Jun 19, 2020

How is the event processor logic tied to a mutation, except that you need to publish an event somehow?

I think this is the only case

If you invoke such a function with an event, what'll happen if this function fails? How'd you retry the event processing?

Direct invocation is not suitable for retrying, which is why it is not the best option.

Here are the reasons I can think of which may cause the EventProcessor to fail:

  1. Received an invalid event. This probably should not be retried, as we cannot recover from it. Discard the event.
  2. Failed to retrieve the list of subscribers for this event. Should be retried.
  3. The subscriber list is too long and we timed out while processing it. A tricky one, as we are not tracking which subscribers we have already sent the event to, so we cannot retry with only those that failed. But I think the case with too many subscribers should be addressed with a different approach to the EventProcessor architecture.
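The three cases above amount to a retry policy. A minimal sketch of that classification (the error names are illustrative, not the library's actual error types):

```typescript
// Illustrative failure categories for the three cases above.
type FailureKind = "INVALID_EVENT" | "SUBSCRIBER_FETCH_FAILED" | "TIMEOUT";

function shouldRetry(kind: FailureKind): boolean {
  switch (kind) {
    case "INVALID_EVENT":
      // Case 1: unrecoverable, discard the event.
      return false;
    case "SUBSCRIBER_FETCH_FAILED":
      // Case 2: transient, safe to retry.
      return true;
    case "TIMEOUT":
      // Case 3: a blind retry risks an infinite loop of duplicate
      // sends, since delivered subscribers are not tracked; this needs
      // an architectural fix rather than a retry.
      return false;
  }
}
```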

@michalkvasnicak
Owner

michalkvasnicak commented Jun 20, 2020

I think the DynamoDB store should eventually be discarded and replaced, for example, by SQS, because with SQS we can at least control what we've successfully processed and what failed. What do you think?

In the DynamoDB approach we don't have a mechanism for these granular operations, because if the whole batch fails it's processed again.

The problem in both cases is that if you receive an event and then need to send it to multiple connections, it can fail in the middle. That causes the batch to be retried, because the message is not directly connected to a specific connection but to subscriptions. It'd be easier to have an event per subscriber; this way we can rely on queue functionality.
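The "event per subscriber" idea could look like this: expand one published event into one queue message per connection, grouped into batches of 10 (the documented maximum for SQS SendMessageBatch). A hedged sketch; the message shape is hypothetical:

```typescript
// Shape of one SQS SendMessageBatch entry (Id must be unique per batch).
interface QueueEntry {
  Id: string;
  MessageBody: string;
}

// Expand one event into per-subscriber messages, batched in groups of 10,
// which is the SQS SendMessageBatch entry limit.
function toPerSubscriberBatches(
  payload: unknown,
  connectionIds: string[],
): QueueEntry[][] {
  const entries = connectionIds.map((connectionId, i) => ({
    Id: String(i),
    MessageBody: JSON.stringify({ connectionId, payload }),
  }));
  const batches: QueueEntry[][] = [];
  for (let i = 0; i < entries.length; i += 10) {
    batches.push(entries.slice(i, i + 10));
  }
  return batches;
}
```

Because each message targets exactly one connection, a failed delivery can be retried by the queue without re-sending to connections that already succeeded.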

@AlpacaGoesCrazy
Contributor Author

The problem in both cases is that if you receive an event and then need to send it to multiple connections, it can fail in the middle. That causes the batch to be retried, because the message is not directly connected to a specific connection but to subscriptions. It'd be easier to have an event per subscriber; this way we can rely on queue functionality.

I think this approach would require a queue per subscriber, which might not be viable when you have lots of them.

@michalkvasnicak
Owner

A queue per subscriber, or an event per subscriber, is not really viable either, because you'd need to fetch all the subscriptions for a given event and publish the event to all of them. At the moment we have a really simple mechanism of subscription tracking which only tracks by event name, so it's not optimal. It'd be better to have more information stored on the subscription so we can easily fetch the subscriptions that are relevant.

For example, let's say you're developing a chat app with rooms. You subscribe to RECEIVE_MESSAGE, but it's too general: you want to subscribe to messages from a specific room. With the current implementation you can do this, but you need to keep the room in the name of the event. Instead, we could store the JSON of the variables used on the subscription, like RECEIVE_MESSAGE:{"roomId":"ADADASDAD"}.
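A minimal sketch of that key encoding (the function name is hypothetical). Sorting the variable keys makes the encoding deterministic, so the key built at publish time matches the key stored at subscribe time regardless of property order:

```typescript
// Build a subscription key like RECEIVE_MESSAGE:{"roomId":"ADADASDAD"}.
// Variable keys are sorted so the same variables always yield the same string.
function subscriptionKey(
  eventName: string,
  variables: Record<string, unknown> = {},
): string {
  const sorted = Object.keys(variables)
    .sort()
    .reduce(
      (acc, k) => ({ ...acc, [k]: variables[k] }),
      {} as Record<string, unknown>,
    );
  return Object.keys(sorted).length > 0
    ? `${eventName}:${JSON.stringify(sorted)}`
    : eventName;
}
```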

This way it's easier to target only the subscriptions that are relevant to the event, at least from the variables' point of view. Still, I don't like the idea of fetching all the subscriptions and creating an event for all the connections. And there are still possible edge cases; for example, there is no way to send the message to a freshly subscribed connection (one that subscribed during the fan-out of events to all the connections in the subscription).

So the problem is basically that the PubSub mechanism is not efficient, because we need to keep track of connections and subscriptions and then somehow fan out events. Do you have any idea how it could be solved in a Lambda environment? On a normal server you have active PubSub connections, so it's easier to implement ad hoc queues because you don't need to solve event sourcing. But in Lambda we still need to invoke a Lambda function, which is something that can be done automatically by DynamoDB, Kinesis, SQS, or SNS; the fan-out part is the problematic one. (I'm not an expert in AWS, maybe there really is a way to do this, but for the last year and a half I haven't worked with anything on AWS, so this library basically just evolves through its users.)

@AlpacaGoesCrazy
Contributor Author

For example, let's say you're developing a chat app with rooms. You subscribe to RECEIVE_MESSAGE, but it's too general: you want to subscribe to messages from a specific room. With the current implementation you can do this, but you need to keep the room in the name of the event. Instead, we could store the JSON of the variables used on the subscription, like RECEIVE_MESSAGE:{"roomId":"ADADASDAD"}.

I am not sure that the way you encode the subscription event name would make any difference, whether it is all in the name like RECEIVE_MESSAGE_ROOM_ADADASDAD or has some part of it in JSON.

This way it's easier to target only subscriptions that are relevant to the event at least from the variables point of view. Still I don't like the idea of fetching all the subscriptions and creating an event for all the connections.

I think with the current implementation we keep the list of subscribers which are relevant to a given event, and it works fine on that part.

And there are still possible edge cases for example there is no way to send the message to freshly subscribed connection (the one that subscribed during the fan out of events to all the connections in subscription).

Not sure if we can actually do anything in this case, but this seems okay to me.

So the problem is basically that the PubSub mechanism is not efficient, because we need to keep track of connections and subscriptions and then somehow fan out events. Do you have any idea how it could be solved in a Lambda environment? On a normal server you have active PubSub connections, so it's easier to implement ad hoc queues because you don't need to solve event sourcing. But in Lambda we still need to invoke a Lambda function, which is something that can be done automatically by DynamoDB, Kinesis, SQS, or SNS; the fan-out part is the problematic one. (I'm not an expert in AWS, maybe there really is a way to do this, but for the last year and a half I haven't worked with anything on AWS, so this library basically just evolves through its users.)

I think a possible solution to make these subscriber lists more manageable is to split the EventProcessor functionality:
The first function would get the subscriber list, split it into batches, and send them to a processing queue.
The second function would be invoked by this processing queue; it would send messages to clients, do error handling, and if necessary send the failed subscriber list back to the processing queue.

This way we would have one lambda to do the fan-out, a queue for the retry mechanism, and lambdas to handle the actual sending.
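The two-stage design above can be sketched as follows. The queue and delivery calls (`sendToQueue`, `postToConnection`) are injected stand-ins for SQS and API Gateway, so this is only a shape, not an implementation:

```typescript
// Stage 1: fan-out lambda. Split the subscriber list into batches and
// put each batch on the processing queue. Returns the number of batches.
async function fanOut(
  subscribers: string[],
  batchSize: number,
  sendToQueue: (batch: string[]) => Promise<void>,
): Promise<number> {
  let batches = 0;
  for (let i = 0; i < subscribers.length; i += batchSize) {
    await sendToQueue(subscribers.slice(i, i + batchSize));
    batches += 1;
  }
  return batches;
}

// Stage 2: sender lambda, invoked per batch. Delivers to each connection,
// collects failures, and requeues only the failed subscribers.
async function sendBatch(
  batch: string[],
  postToConnection: (connectionId: string) => Promise<void>,
  sendToQueue: (batch: string[]) => Promise<void>,
): Promise<void> {
  const failed: string[] = [];
  for (const id of batch) {
    try {
      await postToConnection(id);
    } catch {
      failed.push(id); // keep going instead of aborting the batch
    }
  }
  if (failed.length > 0) {
    await sendToQueue(failed); // retry only the failed subscribers
  }
}
```

Requeuing only the failures avoids the duplicate deliveries that a blind retry of the whole batch would cause.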

@michalkvasnicak
Owner

I am not sure that the way you encode the subscription event name would make any difference, whether it is all in the name like RECEIVE_MESSAGE_ROOM_ADADASDAD or has some part of it in JSON.

Yes, it doesn't make any difference, but the first one (in the name) basically pushes the responsibility onto you as a developer, while the second could be "automatic".

I think with the current implementation we keep the list of subscribers which are relevant to a given event, and it works fine on that part.

Yes, they are relevant to an event, so if we solve event targeting (mentioned above) then this one is solved too.

Not sure if we can actually do anything in this case, but this seems okay to me.

I'm not sure either.

I think a possible solution to make these subscriber lists more manageable is to split the EventProcessor functionality:
The first function would get the subscriber list, split it into batches, and send them to a processing queue.
The second function would be invoked by this processing queue; it would send messages to clients, do error handling, and if necessary send the failed subscriber list back to the processing queue.

Yes, I like the idea you proposed. The only thing is that it's not compatible, but that can be addressed, for example, by introducing a new package for this type of event processor. (I was thinking that maybe we should have multiple packages, each for a different source, for example Redis, DynamoDB, SQS, etc.)

@AlpacaGoesCrazy
Contributor Author

AlpacaGoesCrazy commented Jun 22, 2020

Yes, I like the idea you proposed. The only thing is that it's not compatible, but that can be addressed, for example, by introducing a new package for this type of event processor. (I was thinking that maybe we should have multiple packages, each for a different source, for example Redis, DynamoDB, SQS, etc.)

You mean not compatible with the serverless template? Anyway, the library is currently in an alpha version, so API changes are to be expected. And if you do not like that a developer is required to introduce a bunch of lambda functions in their serverless template, this could be addressed by putting all of our handlers (subscribersHandler, eventProcessor, or even webSocketHandler as well) in one lambda and managing the event sources.

And the idea of different packages is lovely, but I don't think that splitting into packages is a top priority right now.

@michalkvasnicak
Owner

You mean not compatible with the serverless template? Anyway, the library is currently in an alpha version, so API changes are to be expected. And if you do not like that a developer is required to introduce a bunch of lambda functions in their serverless template, this could be addressed by putting all of our handlers (subscribersHandler, eventProcessor, or even webSocketHandler as well) in one lambda and managing the event sources.

Yes, now that I'm thinking about it, it's not really a breaking change because, as you said, it only breaks how the infrastructure is deployed. In that case we can treat this as a new event processor and document how it needs to be deployed in order to work correctly.

So this gives me the idea that the only common pieces are the WebSocket and HTTP handlers. The rest is up to the event source you choose to use. So basically we'd just need to document each possible event source in its own "manual" and maybe provide an example serverless.yml file.
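For the current DynamoDB-stream event source, such an example might look roughly like this. Handler paths, table name, and `batchSize` are hypothetical, not taken from the library's docs:

```yaml
# Hypothetical serverless.yml fragment; names are illustrative.
functions:
  webSocketHandler:
    handler: src/handler.webSocketHandler
    events:
      - websocket:
          route: $connect
      - websocket:
          route: $disconnect
      - websocket:
          route: $default
  eventProcessor:
    handler: src/handler.eventProcessor
    events:
      - stream:
          type: dynamodb
          arn:
            Fn::GetAtt: [EventsTable, StreamArn]
          batchSize: 1
```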
