
GraphQL Observability Workshop

This workshop is designed to demonstrate how to diagnose a poorly performing supergraph through the use of observability tooling.

This mock supergraph is a simple example of a blog-style API which exposes a list of posts and associated authors (users).

Prerequisites

To be able to run the workshop, you will need Docker (with Docker Compose) and Node.js (with npm) installed.

macOS & Linux (incl. WSL)

Once you have the prerequisites installed, you can run the included setup.sh script to fetch the additional dependencies.

What if I want to install them manually?

If you'd like to install these manually, you will need to (see the combined commands after the list):

  • Run docker compose pull to pull the associated Docker images
  • Run docker pull grafana/k6:0.46.0 to fetch the required k6 Docker image for load testing
  • Run npm install from the root of the folder to download all required dependencies
  • Download the Apollo Router, as described in the Apollo Router documentation
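Taken together, the manual setup is roughly the following sketch. The router download one-liner is the one from the Apollo Router documentation; treat it as an assumption and prefer the command currently in the docs if it has changed.

```sh
# Pull the Docker images for the observability tooling
docker compose pull

# Fetch the k6 image used for load testing
docker pull grafana/k6:0.46.0

# Install the Node.js dependencies from the root of the repo
npm install

# Download the Apollo Router binary into the current directory
# (one-liner from the Apollo Router docs; verify against the docs)
curl -sSL https://router.apollo.dev/download/nix/latest | sh
```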

Windows

Once you have the prerequisites installed, you can run the included setup.ps1 script to fetch the additional dependencies.

What if I want to install them manually?

If you'd like to install these manually, you will need to (see the combined commands after the list):

  • Run docker compose pull to pull the associated Docker images
  • Run docker pull grafana/k6:0.46.0 to fetch the required k6 Docker image for load testing
  • Run npm install from the root of the folder to download all required dependencies
  • Download the Apollo Router using PowerShell and extract it using tar
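Taken together, the manual setup is roughly the following sketch. The release URL, version, and asset name in the download step are hypothetical placeholders; check https://github.com/apollographql/router/releases for the current release.

```powershell
# Pull the Docker images for the observability tooling
docker compose pull

# Fetch the k6 image used for load testing
docker pull grafana/k6:0.46.0

# Install the Node.js dependencies from the root of the repo
npm install

# Download and extract the Apollo Router
# (version and asset name below are hypothetical; use the current release)
Invoke-WebRequest -Uri "https://github.com/apollographql/router/releases/download/v1.30.0/router-v1.30.0-x86_64-pc-windows-msvc.tar.gz" -OutFile router.tar.gz
tar -xf router.tar.gz
```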

Getting oriented

This project consists of a number of distinct services that together make up the federated application and its associated tooling.

The supergraph itself consists of the following (a sketch of the topology follows the list):

  • A mock REST API hosted on http://localhost:3030, with source code located in the datasource folder
  • Two subgraphs, both within the subgraphs folder:
    • users, which houses user-related data
    • posts, which controls data related to the blog posts
  • An Apollo Router fronting the two subgraphs on http://localhost:4000
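Putting those pieces together, the request path looks roughly like this (ports as listed above):

```
Client ──► Apollo Router (:4000) ──► users subgraph ──► mock REST API (:3030)
                                └──► posts subgraph ──► mock REST API (:3030)
```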

And for the observability tooling:

  • Grafana, hosted on http://localhost:3000, with prepopulated dashboards for k6 results and router metrics
  • Jaeger, hosted on http://localhost:16686, for viewing traces
  • k6, run via Docker, for load testing

Running the stack

Before running, there's one final step. You'll need to populate .env.sample with an Apollo key and graph ref, as noted during the presentation. Once you've filled it out, rename it to just .env. After that, run publish.sh to publish the schema to Apollo Studio.
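As a sketch, the finished .env would look something like the following. APOLLO_KEY and APOLLO_GRAPH_REF are the standard Apollo Router environment variables, but the exact names expected by this workshop's scripts are an assumption, so mirror whatever .env.sample contains; the values below are hypothetical placeholders.

```sh
# Values come from Apollo Studio (see the presentation)
APOLLO_KEY=service:my-graph:REDACTED   # hypothetical placeholder
APOLLO_GRAPH_REF=my-graph@current      # hypothetical graph ref
```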

macOS & Linux (incl. WSL)

To run the stack, run npm run dev from the root of this folder after running setup.sh and publish.sh (or after completing the manual steps noted in Prerequisites).

What does the script do?
  • Runs docker compose up -d to start the observability tooling
  • Starts the router with its config and the associated environment variables
  • Starts each subgraph via its own npm run dev command

Using the single command is preferable since it runs all of these in parallel for you.
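If you'd rather run the pieces by hand, the equivalent steps look roughly like the sketch below. The router config filename is an assumption; check the repo's npm scripts for the exact invocation.

```sh
# Start the observability tooling in the background
docker compose up -d

# Start the router with its config and the Apollo credentials from .env
# (config filename is an assumption)
./router --config router.yaml

# In separate terminals, start each subgraph
(cd subgraphs/users && npm run dev)
(cd subgraphs/posts && npm run dev)
```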

Windows

If running on Windows natively (meaning not through WSL), you will need to run npm run dev:windows after running setup.ps1 and publish.sh (or after completing the manual steps noted in Prerequisites).

What does the script do?
  • Runs docker compose up -d to start the observability tooling
  • Starts the router with its config and the associated environment variables, using the Windows-specific invocation
  • Starts each subgraph via its own npm run dev command

Using the single command is preferable since it runs all of these in parallel for you.

Tasks

Important

Please do not modify the datasource code. While it is apparent that there are artificial waits throughout it, this is a simple way to show how a downstream service introduces latency, and it isn't intended to be something a graph team would traditionally fix.

With that said, you can (and should!) look through the code to see if you can find the source of a few of these issues, including the datasource's.

Issues

We've included a number of issues within the code. While you're likely to find quite a few on your own, we want to outline some here as a starting point.

Please note that the resolution may not be something you can implement today; just identifying the issue is often sufficient for another development team to eventually address it.

  • One of the fields is causing an error, and the team isn't sure which.
  • The team managing the backing REST API reports that it is seeing too much traffic and would like us to optimize the number of requests from both subgraphs.
    • There are two places where you can optimize for this.
  • The User type takes a long time to resolve, and the team isn't sure why.

There are other areas worth investigating and improving, but these are just a few starting points. Note what you've done, and we'll discuss the three listed above during the corresponding section.

What should I do?

As mentioned above, this workshop consists of debugging and resolving issues within a poorly performing federated graph. When trying to resolve these issues, you can use the following flow (a sample request sketch follows the list):

  • Open up Grafana (hosted on http://localhost:3000/; the default username/password is admin/admin) and Jaeger (on http://localhost:16686) to see the current metrics and traces of the application
  • Use k6 to simulate load by running npm run loadtest in another console window for a short (30 second) load test. If you'd prefer a longer test, feel free to run npm run loadtest:long
  • Review Grafana, Jaeger, and the k6 results to identify problems
    • Grafana includes two prepopulated dashboards: one for k6 results (HTTP response times, test results such as the percentage of error responses, and throughput) and another with basic router metrics (error rates per operation, subgraph response times, and the like)
    • Traces can help determine where the most time is being spent in a given request, so reviewing them can yield concrete action items
  • Use Apollo Studio's Operations tab for further debugging to better visualize error rates and per-field execution times, and to see whether a specific field is slowing the entire operation
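If you want to poke at the router directly while watching traces come in, you can send it an operation by hand. A minimal sketch, assuming hypothetical field names based on the posts/users schema described above (check the actual schema in Studio or the subgraphs folder):

```sh
# Send a sample operation to the router (field names are assumptions)
curl http://localhost:4000/ \
  -H 'content-type: application/json' \
  -d '{"query": "query SamplePosts { posts { id author { name } } }"}'
```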

Once you've reviewed and identified items, adjust and re-test. Note which tasks you've addressed and how you identified them for use later.
