
[Discussion] Different event store implementations #58

Open
AlpacaGoesCrazy opened this issue Dec 26, 2019 · 9 comments
Labels
question Further information is requested

Comments

@AlpacaGoesCrazy
Contributor

Concerns about the DynamoDB event store implementation, such as the event publisher lambda being triggered for every event that occurs in the events table, made me think about different approaches.

I came up with two options: the first is SNS+SQS, and the second is ElastiCache with Redis Pub/Sub.
For both of these options there is a problem: how to communicate published events back to the publisher lambda.
For SQS there is an option to set a lambda trigger, but it seems that you cannot set it dynamically for newly created queues.
For Redis Pub/Sub there seems to be no option but to have a continuously running server that listens for Redis publish events.

My question is whether you know how to solve this problem, or whether there are more possible event store implementations to consider.
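To make the Redis Pub/Sub concern concrete, here is a minimal sketch of the continuously running listener it would require. The `subscriber` client is injected and assumed to expose `subscribe()`/`on()` (as the `redis` npm package does); the channel name and `processEvents` callback are illustrative.

```javascript
// Sketch of the always-on listener Redis Pub/Sub would need. The
// subscriber client is injected so the wiring stays testable; in a real
// deployment it would come from e.g. redis.createClient().
function startEventListener(subscriber, channel, processEvents) {
  subscriber.subscribe(channel);
  subscriber.on('message', (receivedChannel, payload) => {
    if (receivedChannel !== channel) {
      return;
    }
    // hand the published event to the same processor the
    // DynamoDB-streams implementation would otherwise trigger
    processEvents([JSON.parse(payload)]);
  });
}
```

Because Redis Pub/Sub pushes messages to open connections rather than invoking handlers, this loop has no serverless equivalent; it needs a process that stays up.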

@michalkvasnicak michalkvasnicak added the question Further information is requested label Dec 26, 2019
@michalkvasnicak
Owner

@AlpacaGoesCrazy I think @guerrerocarlos mentioned Kinesis as an event store a few months ago.

With DynamoDB streams the problem is low traffic, meaning there may be only 1 event in a batch. DynamoDB stream records can be batched, but then you'd need, for example, 100 or more events per second.

Another way is to replace DynamoDB with SQS, which is batched too (without SNS): https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

I think this is a limitation of going serverless: you are somewhat constrained in how you can approach event processing. For example, you can replace event processing with an EC2 instance; you just need to feed the event processor with a simulated event from the event source.

```js
// example of an SQS long-polling EC2 server

const AWS = require('aws-sdk');

const sqs = new AWS.SQS();

// event store should be able to run on lambda to register connections
// and subscriptions
const processEvents = createMyEventProcessor({
  connectionManager, // can be dynamodb connection manager
  schema,
  subscriptionManager, // can be dynamodb subscription manager
});

async function pollForever() {
  while (true) {
    const { Messages } = await sqs
      .receiveMessage({
        QueueUrl: process.env.EVENTS_QUEUE_URL, // your sqs events queue
        MaxNumberOfMessages: 10,
        WaitTimeSeconds: 20, // long polling
      })
      .promise();

    if (!Messages || Messages.length === 0) {
      continue;
    }

    // depends on the events that the event processor expects;
    // let's say the event processor accepts an array of events
    await processEvents(Messages.map((message) => JSON.parse(message.Body)));
  }
}

pollForever();
```

@AlpacaGoesCrazy
Contributor Author

Waiting for 100 events is not suitable for a real-time application; you cannot even estimate message delivery latency.
And as for using an EC2 instance to process events, that goes against the main idea of this package: using GraphQL subscriptions on lambdas with no continuously running instances.

@michalkvasnicak
Owner

I don't have any idea how to solve that using the event store, because we're limited to the event source streams that AWS Lambda actually supports.

SQS and DynamoDB streams are batched by default, so Lambda already works with batches, but the number of events in a batch depends on how many events were written to DynamoDB before AWS Lambda polls for changes. From the documentation:

> Lambda polls shards in your DynamoDB stream for records at a base rate of 4 times per second. When records are available, Lambda invokes your function and waits for the result. If processing succeeds, Lambda resumes polling until it receives more records.

You can set up the lambda for bigger throughput using partitions and shards. But I'm no AWS expert, so maybe there are other patterns that could be incorporated that are outside of this package's scope.
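For reference, here is what tuning the stream-to-lambda throughput could look like via Lambda's `CreateEventSourceMapping` API. The ARN and function name are placeholders; the batch size, batching window, and parallelization factor are illustrative values, not recommendations.

```javascript
// Hypothetical parameters for lambda.createEventSourceMapping(): a larger
// batch size with a short batching window, plus concurrent batches per
// shard via ParallelizationFactor (max 10).
const eventSourceMappingParams = {
  EventSourceArn: 'arn:aws:dynamodb:...:table/Events/stream/...', // your stream ARN
  FunctionName: 'event-processor', // your publisher lambda
  StartingPosition: 'LATEST',
  BatchSize: 100, // up to 100 stream records per invocation
  MaximumBatchingWindowInSeconds: 1, // flush partial batches quickly
  ParallelizationFactor: 10, // concurrent batches per shard
};
```

The batching window matters for the low-traffic case discussed above: without it, a sparse stream still invokes the lambda with near-single-event batches.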

@michalkvasnicak
Owner

For example, you can invoke your lambda manually using the API, so that's another way to implement an event source that is not directly supported by AWS.
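A sketch of that manual-invocation idea, assuming the event processor lambda accepts a batch of events in its payload. The `lambdaClient` is injected and assumed to expose `invoke()` (as `new AWS.Lambda()` does); the function name and payload shape are illustrative.

```javascript
// Feed the processor lambda from an event source Lambda does not support
// natively by invoking it directly with a simulated event batch.
function invokeEventProcessor(lambdaClient, functionName, events) {
  return lambdaClient
    .invoke({
      FunctionName: functionName,
      InvocationType: 'Event', // async, fire-and-forget
      Payload: JSON.stringify({ events }),
    })
    .promise();
}
```

Any process that can call the Lambda API (an EC2 poller, a container, another lambda) can then act as a custom event source.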

@eduard-malakhov

@AlpacaGoesCrazy, I've been thinking about the same issue lately and I picked up on your phrase:

> For both of these options there is a problem on how to communicate published events back to the publisher lambda.

Why do we need published events to be communicated back to the publisher? Can you elaborate on this?

@AlpacaGoesCrazy
Contributor Author

@eduard-malakhov
The architectural idea of this project is to communicate a subscription message from one source to multiple subscribers in a fan-out fashion.
The publisher lambda in the DynamoDB implementation is responsible for listening to incoming messages (in the DynamoDB events table) via DynamoDB streams and then sending them to each subscriber.
Thus it is important to communicate published events from the event source (the mutation where we call pubSub.publish) to the publisher.
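A rough sketch of the publish side of that flow: `pubSub.publish` effectively boils down to writing the event into the events table, whose stream then triggers the publisher lambda. The `ddb` client is assumed to be a DynamoDB `DocumentClient`; table and attribute names are illustrative, not this package's actual schema.

```javascript
// Illustrative publish path: write the event to the events table so the
// table's stream delivers it to the publisher lambda.
function publishToEventStore(ddb, tableName, event) {
  return ddb
    .put({
      TableName: tableName,
      Item: {
        id: event.id,
        event: JSON.stringify(event.payload), // serialized event body
        createdAt: Date.now(), // useful for TTL-based cleanup
      },
    })
    .promise();
}
```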

@eduard-malakhov

@AlpacaGoesCrazy, I see, but why is this not possible with SNS/SQS? The approach I had in mind was to publish events/messages to SNS/SQS from the mutation resolver and attach a lambda to that topic/queue to parse those events/messages and fan them out to subscribers. Basically, replace the creation of event items in DynamoDB with publishing to a topic/queue, and replace the subscription to a DynamoDB event stream with a subscription to a topic/queue. Am I missing any caveats?
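One way to read that SNS variant as code, assuming a single shared topic. The `sns` client is injected and assumed to expose `publish()` (as `new AWS.SNS()` does); the topic ARN and the `eventName` attribute are illustrative.

```javascript
// Mutation resolver publishes to one SNS topic instead of writing to
// DynamoDB; a lambda subscribed to the topic does the fan-out. A message
// attribute carries the event name so subscriptions can filter on it.
function publishEvent(sns, topicArn, event) {
  return sns
    .publish({
      TopicArn: topicArn,
      Message: JSON.stringify(event),
      MessageAttributes: {
        eventName: { DataType: 'String', StringValue: event.name },
      },
    })
    .promise();
}
```

SNS subscription filter policies can match on such attributes, which is one answer to routing per-topic events through a single shared topic.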

@AlpacaGoesCrazy
Contributor Author

@eduard-malakhov
I think my original idea was to dynamically create an SQS queue per subscriber to achieve a message delivery guarantee; I am not sure whether that would be possible with just one SQS queue for all topics/clients.

@eduard-malakhov

@AlpacaGoesCrazy, now I see your point, thanks for sharing. I haven't thought this through entirely, but on the surface I don't see any limitation preventing a single topic/queue from working, apart from throughput maybe. As a side note, we've been implementing a chat app based on Kafka recently, and we faced the same design decision: whether to allocate a topic per client or send all messages via a single topic.

Per our analysis, at least in our use case, a topic per client was a waste of resources because there were not enough messages delivered per client to justify the resources required to maintain a dedicated topic inside Kafka. On the other hand, one topic was a single point of failure and could get overloaded under intensive usage. In the end, we assigned multiple clients per topic and scaled the number of topics to adjust to changing demand.
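The "multiple clients per topic" compromise can be illustrated with a simple hash-based assignment, where scaling means changing the size of the topic pool. The hash function and topic naming here are illustrative, not the scheme the commenter's team used.

```javascript
// Map each client id onto a fixed pool of topics so that no single topic
// carries all traffic, yet topics are far fewer than clients.
function topicForClient(clientId, topicCount) {
  let hash = 0;
  for (const ch of clientId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit string hash
  }
  return `chat-messages-${hash % topicCount}`;
}
```

Note that changing `topicCount` remaps existing clients, so in practice such a scheme pairs with consistent hashing or a stored client-to-topic assignment.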
