
Feature: Trigger GraphQL execution immediately on published events (without writing to a store) #133

Open
andyrichardson opened this issue Jan 28, 2021 · 12 comments

Comments

@andyrichardson

andyrichardson commented Jan 28, 2021

About

Hey there, first off - thanks for the awesome lib!

I'm working with a team who are currently using a self-made implementation of serverless subscriptions and we'd really like to use this library instead.

One thing that is holding us back right now is the use of polling for events.

Current functionality

So if I'm not mistaken, when a new event is published, the following happens (rough sketch below):

  1. A publish event is triggered via pubsub.publish('SOME_EVENT')
  2. The event is written to some kind of persistence layer (e.g. MemoryEventStore)
  3. The persistence layer is then polled
  4. When the poll picks up new events, they are sent to an event handler (e.g. MemoryEventProcessor)
  5. The event is then handed to the resolvers and the subscription results are pushed to subscribers
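
To make sure we're talking about the same thing, here's roughly the wiring I have in mind, based on the README (option names may be slightly off):

```ts
import {
  DynamoDBConnectionManager,
  DynamoDBEventProcessor,
  DynamoDBEventStore,
  DynamoDBSubscriptionManager,
  PubSub,
  Server,
} from 'aws-lambda-graphql';

// Steps 1 + 2: publish() writes the event to the events table
const eventStore = new DynamoDBEventStore();
const pubSub = new PubSub({ eventStore });

const typeDefs = /* GraphQL */ `
  type Query { dummy: String! }
  type Mutation { broadcastMessage(message: String!): String! }
  type Subscription { messageBroadcast: String! }
`;

const resolvers = {
  Query: { dummy: () => 'dummy' },
  Mutation: {
    broadcastMessage: async (_: unknown, { message }: { message: string }) => {
      await pubSub.publish('NEW_MESSAGE', { message }); // step 1
      return message;
    },
  },
  Subscription: {
    messageBroadcast: {
      resolve: (rootValue: { message: string }) => rootValue.message,
      subscribe: pubSub.subscribe('NEW_MESSAGE'),
    },
  },
};

const subscriptionManager = new DynamoDBSubscriptionManager();
const connectionManager = new DynamoDBConnectionManager({
  subscriptions: subscriptionManager,
});

const server = new Server({
  connectionManager,
  subscriptionManager,
  eventProcessor: new DynamoDBEventProcessor(),
  typeDefs,
  resolvers,
});

// Steps 3-5: the DynamoDB stream on the events table invokes this handler,
// which runs the event processor, executes the subscriptions and pushes
// results to the connected clients
export const handleEvents = server.createEventHandler();
export const handleWebSocket = server.createWebSocketHandler();
export const handleHttp = server.createHttpHandler();
```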

Expected functionality

If we're working with push/event-based systems, I'm confused as to why events would need to be persisted and polled.

My expectation was that an event publish (1.) would immediately trigger some kind of event handler (4.) without the need for polling or persistence.

@cranberyxl

The file you linked to is a test fixture.

Events are picked up based on which type of managers you use. For example, if you're using DynamoDB, when something is published it's added to the table, and that triggers a call to the lambda letting it know there is a new item in the table: https://github.com/michalkvasnicak/aws-lambda-graphql/blob/master/docs/serverless.yml#L58-L65

@andyrichardson
Author

Thanks for the response!

The file you linked to is a test fixture

My bad! How does an event handler get called when an event is written to the event store in memory?

it's added to the table which triggers a call to the lambda

Totally, but it looks like this still involves polling under the hood.

Is there a particular reason we use a store for events as opposed to triggering an event handler immediately in the case of a push?

From what I can tell, this is what the project currently seems to be doing:

New event -> Write to event store -> Poll event store (e.g. DynamoDB stream) -> Trigger handler

But with a push based workflow, I don't understand the need to write events to a store of any kind if we can instead trigger the handler immediately.

New event -> Trigger handler

The implementation I'm currently using doesn't have the same abstractions as this project, but it dispatches to handlers immediately on a new event without the need for an event store (rough sketch below).
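
For context, our in-house version is conceptually just this (simplified sketch, obviously not this library's API):

```ts
// Simplified sketch of our in-house approach (not this library's API):
// publishing invokes the registered handlers directly, nothing is stored or polled.
type EventHandler = (payload: unknown) => Promise<void>;

const handlers = new Map<string, EventHandler[]>();

export function on(eventName: string, handler: EventHandler): void {
  handlers.set(eventName, [...(handlers.get(eventName) ?? []), handler]);
}

export async function publish(eventName: string, payload: unknown): Promise<void> {
  // New event -> Trigger handler
  await Promise.all((handlers.get(eventName) ?? []).map((handle) => handle(payload)));
}
```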

@cranberyxl

There is no polling. DynamoDB streams are serverless, see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html

A lambda is invoked when something is added to a DynamoDB table.

@andyrichardson
Author

That's the link I shared - see quote below

AWS Lambda polls the stream and invokes your Lambda function synchronously when it detects new stream records

@cranberyxl

That all happens in the AWS black box, this library doesn't write the code for it.

@andyrichardson
Author

I'll rename the issue because this is less about the polling and more about triggering events without needing to write to a store.

So ignoring the polling, I'm wondering: when an event is published, why do we write to a store, which then causes a read, which in turn calls the same lambda that triggered the write?

@andyrichardson andyrichardson changed the title Feature: Disable polling for events Feature: Trigger GraphQL execution immediately events (without writing to a store) Jan 29, 2021
@andyrichardson andyrichardson changed the title Feature: Trigger GraphQL execution immediately events (without writing to a store) Feature: Trigger GraphQL execution immediately on published events (without writing to a store) Jan 29, 2021
@michalkvasnicak
Owner

@andyrichardson in that case you need an event store that performs execution on the publish() call.

https://github.com/michalkvasnicak/aws-lambda-graphql/blob/master/packages/aws-lambda-graphql/src/MemoryEventStore.ts#L10

https://github.com/michalkvasnicak/aws-lambda-graphql/blob/master/packages/aws-lambda-graphql/src/DynamoDBEventStore.ts#L72

Both event stores just store the event; in your case you need to combine that with MemoryEventProcessor. So you need a new event store that contains the logic from MemoryEventProcessor and triggers that logic on the publish() call, so you can await the execution.
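
Something along these lines (just a sketch; the exact event shape and store interface in this package may differ):

```ts
import { PubSub } from 'aws-lambda-graphql';

// Just a sketch; the exact event shape and store interface may differ.
type SubscriptionEvent = { event: string; payload: any };

class ImmediateEventStore {
  constructor(
    // Hypothetical callback containing the logic copied from MemoryEventProcessor:
    // look up subscribers, execute their subscription documents, send the results.
    private process: (event: SubscriptionEvent) => Promise<void>,
  ) {}

  // PubSub calls this on publish(); instead of persisting the event and waiting
  // for a stream/poller, run the processing immediately so the publisher can
  // await the execution.
  publish = async (event: SubscriptionEvent): Promise<void> => {
    await this.process(event);
  };
}

const pubSub = new PubSub({
  eventStore: new ImmediateEventStore(async (event) => {
    // the MemoryEventProcessor logic goes here
  }),
});
```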

@andyrichardson
Author

Thanks for the response @michalkvasnicak 🙏

So funnily enough, I've been doing exactly that (rough wiring below):

  • Use MemoryEventStore + MemoryEventProcessor
  • Use DynamoDB variants of everything else
  • Call dispatch on pubsub
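
Roughly, the wiring looked like this (from memory; option names may not be exact):

```ts
import {
  DynamoDBConnectionManager,
  DynamoDBSubscriptionManager,
  MemoryEventProcessor,
  MemoryEventStore,
  PubSub,
  Server,
} from 'aws-lambda-graphql';

// Schema omitted here; same typeDefs/resolvers as in my first comment,
// with the mutation resolver calling publish/dispatch on pubSub.
declare const typeDefs: string;
declare const resolvers: any;

// Memory variants for events...
const eventStore = new MemoryEventStore();
const pubSub = new PubSub({ eventStore });

// ...DynamoDB variants for everything else.
const subscriptionManager = new DynamoDBSubscriptionManager();
const connectionManager = new DynamoDBConnectionManager({
  subscriptions: subscriptionManager,
});

const server = new Server({
  connectionManager,
  subscriptionManager,
  eventProcessor: new MemoryEventProcessor(),
  typeDefs,
  resolvers,
});
```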

I found that calling dispatch didn't have any effect and acted like a no-op.

There didn't seem to be any attempts to get subscribers from dynamodb following a dispatch.

Once the event is written to the MemoryEventStore, what is the sequence of cascading events that would lead to the memory event processor being called?

The lack of callbacks and push to the event store is what led me to suspect there was a need for polling 🤔

@andyrichardson
Author

So you need new event store, that contains the logic from memory event processor and triggers that logic on publish()

Sorry, I misread this - so the built-in memory event store is working as intended (no dispatch)?

I might be missing something but I'm curious, how come there is a pattern of writing events to a store as opposed to solely consuming events and forwarding them on to the event processor?

I can see why this might be useful for much smaller projects where all published events are exclusive to the service that is consuming them, but for the majority(?) of use cases, messages are likely to be dispatched from external services (AWS SNS/SQS, Kafka, etc.).

@michalkvasnicak
Owner

michalkvasnicak commented Feb 1, 2021

Memory* parts are not intended to be used in AWS, they're only used in local dev mode (so yes, they're working as intended). For your use case you need to write a new event store and copy the logic from MemoryEventProcessor into the publish method of that new event store.

I might be missing something but I'm curious, how come there is a pattern of writing events to a store as opposed to solely consuming events and forwarding them on to the event processor?

I'm not sure whether I understand your question. You can publish your messages from any source that is able to invoke your lambda event processor handler. For example, you can use AWS Kinesis, SQS, or SNS as the source of your events, or you can invoke your lambda directly. So, for example, you can have some external application that publishes events, and your event processor handles them and publishes them to subscribers.

If your question is mainly about why the event is first written to a store (for example DynamoDB) and then processed asynchronously from the DynamoDB stream, it's because you can have hundreds of subscribers for an event, and you don't want to send messages to all of them directly from the publishing lambda because that can cause it to time out.

@alaycock

alaycock commented Feb 1, 2021

I suspect that @michalkvasnicak has answered your question already, but it sounds like the question you're asking is: "Why bother using a datastore for storing and triggering events, when you could just immediately publish them instead?" And the answer is scalability.

  • As the app you are building grows, perhaps you'll want to publish events from other sources, not just your GraphQL lambda. That would mean triggering the events lambda manually from each source rather than simply writing a row to a DB.
  • As michalkvasnicak indicated, doing it all in a single lambda execution could cause timeouts, whereas spreading it out across multiple lambdas prevents you from running into this limitation.
  • If your lambda crashes halfway through execution and your in-memory store is erased, it's much harder to trace which messages were sent or unsent and re-deliver them.

@RyanHow

RyanHow commented Feb 2, 2021

Hi All!

I know this is diverging off topic a bit, but I'm interested in a similar setup where DynamoDB is only required to store the subscriptions, not the events.

What do you think about asynchronously invoking another lambda if you are worried about timeouts/reliability but still want an immediate response?

Then other systems could just do the same, rather than writing to a DB store (I figure calling a lambda is about the same complexity as writing a DynamoDB record).
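
Roughly what I mean, using the AWS SDK (just a sketch; the function name is made up):

```ts
import { Lambda } from 'aws-sdk';

const lambda = new Lambda();

// Sketch of the idea: async-invoke a worker lambda per published event
// instead of writing the event to a DynamoDB table.
export async function publishEvent(event: { event: string; payload: unknown }) {
  await lambda
    .invoke({
      FunctionName: 'graphql-event-processor', // hypothetical worker lambda
      InvocationType: 'Event', // async: returns as soon as the invocation is queued
      Payload: JSON.stringify(event),
    })
    .promise();
}
```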

Another potential issue (depending on the use case) may be out-of-order messages.
