Sir learn a lot

A Lambda GraphQL API with a single table DynamoDB data store for an online learning platform. This project is a proof of concept only and therefore has no unit tests.

References
Prerequisites
Architecture
Environment setup
Local development
AWS commands
Table design
NoSQL single table design theory
DynamoDB tips and tricks
Known issues

References

This work is largely taken from and inspired by the following sources from Rick Houlihan and Alex DeBrie:

Prerequisites

Architecture

A single apollo-server-lambda Lambda hosts the GraphQL POST route /graphql (for more details see the npm package). A second Lambda listens to the DynamoDB stream in order to atomically update the user's aggregate XP score as progress is made in the various chapters, and pushes all event data to Elasticsearch.

The majority of the repository code (which is also the most complex code) is boilerplate and formulaic, so it could be extracted into its own DynamoDB single table library, but that task is out of scope for this POC. The rest of the code is simple types plus GraphQL query and mutation logic built with type-graphql, leaving GraphQL and DynamoDB to work their magic! Combined with Lambda, this solution represents a highly scalable, elastic, dynamic service. Note that there is also a GraphQL Express server running.
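The aggregation step of the stream Lambda can be sketched as follows. This is a minimal illustration, not the repository's code: the attribute names `studentId` and `xp` on progress items are assumptions, and the record type is a local stand-in for the real stream event shape. It collapses a batch of stream records into one XP delta per student, which can then be applied with an `ADD` update expression so that the increment is atomic.

```typescript
// Minimal local mirror of the parts of a DynamoDB stream record used here.
// Attribute names (`studentId`, `xp`) are assumptions for illustration.
type AttrMap = { [name: string]: { S?: string; N?: string } };

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: { NewImage?: AttrMap; OldImage?: AttrMap };
}

// Read the numeric xp attribute from an image, defaulting to 0 when absent.
const xpOf = (image?: AttrMap): number => Number(image?.xp?.N ?? 0);

// Collapse a batch of progress changes into one XP delta per student.
function xpDeltas(records: StreamRecord[]): Map<string, number> {
  const deltas = new Map<string, number>();
  for (const { dynamodb } of records) {
    const image = dynamodb.NewImage ?? dynamodb.OldImage;
    const studentId = image?.studentId?.S;
    if (!studentId) continue; // not an item that carries student XP
    const delta = xpOf(dynamodb.NewImage) - xpOf(dynamodb.OldImage);
    if (delta !== 0) {
      deltas.set(studentId, (deltas.get(studentId) ?? 0) + delta);
    }
  }
  return deltas;
}

// Each delta maps onto an atomic counter update on the student item.
function toUpdateExpression(delta: number) {
  return {
    UpdateExpression: "ADD xp :delta",
    ExpressionAttributeValues: { ":delta": { N: String(delta) } },
  };
}
```

Working with deltas rather than absolute values means retried or concurrent batches never overwrite each other's totals, although a retried batch would still need idempotency handling to avoid double counting.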

Environment setup

# use the correct version of node
nvm use

# install dependencies
yarn

# fire up localstack infrastructure (DynamoDB and table definition)
yarn local:up

# create the elasticsearch index
curl -X PUT http://localhost:9200/ddb-index

# seed the database with some random data
yarn seed

# purge all the data from the DynamoDB table
yarn purge

# tear down localstack docker container
yarn local:down

Local development

# start serverless offline lambdas
yarn dev

Lambda GraphQL server is running on http://localhost:9000/graphql
Express GraphQL server is running on http://localhost:9001/graphql

Explore the full database content:

query exploreDb {
  getTracks {
    id
    name
    courses {
      id
      name
      chapters {
        id
        content
      }
      enrollments {
        id
        progress {
          xp
          marker
        }
        student {
          id
          firstName
          lastName
          email
          xp
          preferences {
            track {
              id
              name
            }
          }
        }
      }
    }
  }
}
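Queries like the one above can also be sent from a script rather than a GraphQL client. The sketch below builds the request options for a standard GraphQL-over-HTTP POST; the helper name `graphqlRequest` is hypothetical, and the usage lines assume `yarn dev` is running and that the runtime provides a global `fetch` (for example Node 18+).

```typescript
// Build the fetch options for a GraphQL POST request.
function graphqlRequest(query: string, variables: Record<string, unknown> = {}) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  };
}

// A trimmed-down version of the exploreDb query.
const TRACKS_QUERY = `query { getTracks { id name } }`;

// Usage against the local Lambda endpoint:
// const res = await fetch("http://localhost:9000/graphql", graphqlRequest(TRACKS_QUERY));
// console.log(JSON.stringify(await res.json(), null, 2));
```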

AWS commands

# dynamodb streams
awslocal dynamodbstreams list-streams

# dynamodb
awslocal dynamodb list-tables
awslocal dynamodb scan --table-name sir-learn-a-lot
awslocal dynamodb scan --table-name sir-learn-a-lot --index-name gsi1
awslocal dynamodb scan --table-name sir-learn-a-lot --index-name gsi2

# sqs
awslocal sqs list-queues
awslocal sqs receive-message --queue-url "http://localhost:4566/queue/stream-dlq" --max-number-of-messages 10
awslocal sqs delete-message --queue-url "http://localhost:4566/queue/stream-dlq" --receipt-handle <handle>

Table design

ERD

Key prefixes

c# = course
s# = student
t# = track
e# = enrollment
p# = preference
h# = chapter
g# = progress
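
A small helper pair keeps prefix handling in one place. This is an illustrative sketch using the prefixes listed above; the function names `addPrefix` and `removePrefix` are hypothetical and may not match the repository.

```typescript
// Entity prefixes from the list above.
const PREFIX = {
  course: "c#",
  student: "s#",
  track: "t#",
  enrollment: "e#",
  preference: "p#",
  chapter: "h#",
  progress: "g#",
} as const;

type Entity = keyof typeof PREFIX;

// Add the entity prefix to a raw id; ids that already carry it are left alone.
const addPrefix = (entity: Entity, id: string): string =>
  id.startsWith(PREFIX[entity]) ? id : `${PREFIX[entity]}${id}`;

// Strip the prefix again before returning ids to API clients.
const removePrefix = (entity: Entity, key: string): string =>
  key.startsWith(PREFIX[entity]) ? key.slice(PREFIX[entity].length) : key;
```

Making both functions idempotent avoids double-prefixed keys such as `c#c#42` when an id passes through several layers.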

Indexes

Generated using NoSQL Workbench. A JSON format export is available here

Access patterns

get all tracks (gsi1)
get track by id
get course by id
get course by track id (gsi1)
get enrollment by id (gsi1)
get enrollments by course
get enrollments by student (gsi2)
get student by id
get student by email (gsi1)
get track preferences by student
get chapters by id (gsi1)
get chapter by id and version (gsi1)
get chapters by course
get progress by enrollment id (gsi1)
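
As an illustration of how one of these access patterns maps onto a DynamoDB Query, the sketch below builds the input for "get course by track id (gsi1)". The table and index names come from the AWS commands section; the attribute names `gsi1_pk` / `gsi1_sk` and the key layout (track key as index partition key, course key as index sort key) are assumptions and may differ from the actual table design.

```typescript
// Build Query input (DocumentClient-style, native values) for courses in a track.
// Attribute names and key layout are hypothetical.
function coursesByTrackQuery(trackId: string) {
  return {
    TableName: "sir-learn-a-lot",
    IndexName: "gsi1",
    KeyConditionExpression: "#pk = :pk AND begins_with(#sk, :sk)",
    ExpressionAttributeNames: { "#pk": "gsi1_pk", "#sk": "gsi1_sk" },
    ExpressionAttributeValues: { ":pk": `t#${trackId}`, ":sk": "c#" },
  };
}
```

The `begins_with` condition on the sort key is what lets a single index serve several entity types: only items whose sort key starts with the course prefix are returned.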

NoSQL single table design theory

The single table design patterns used in this project apply to all NoSQL databases and are not specific to DynamoDB. NoSQL is not a replacement for relational models; each has its place when used in the right situation.

When should we use NoSQL?

OLTP
well known data access patterns
need to scale horizontally, e.g. large global access patterns

When should we use SQL?

OLAP
ad hoc queries

Characteristics of a NoSQL database

de-normalized data does not make efficient use of storage, which is an acceptable trade-off because storage is one of the cheapest data center components
queries minimize and efficiently make use of CPU (no complex joins) because CPU is one of the most expensive data center components
not good at reshaping the data
not suitable for ad hoc queries
consistent, predictable performance at scale
scales horizontally
the data in a NoSQL database is still relational (otherwise we would probably just put it in S3!!!)

NoSQL anti patterns

We should not use a NoSQL database with multiple tables to model relational data:

inefficient
- provision multiple tables
- requires multiple reads and multiple writes (high latency)
- requires complex transactional code (available in DynamoDB) and testing to manage inserts / updates / deletes
requires manually stitching data together at application layer (emulating a relational database)
adds complexity for backups and disaster recovery
complicates local development
does not scale (multiple reads across tables = high latency)
fundamentally not what NoSQL databases were designed for (no joins, no referential integrity, no foreign keys)
just use a relational db!

NoSQL single table design pattern

serverless
cheap
fail fast
prove the application before you pay for it
streams + lambda act as database triggers allowing you to atomically write aggregate data back into the database in a process outside of the database

DynamoDB tips and tricks

always develop locally first using localstack or DynamoDB local. Avoid pain and long feedback loops with outcomes that are tricky to debug
watch the videos in the references section from Rick and Alex multiple times until the penny finally drops!
use NoSQL workbench for modelling data and indexes
don't store large documents
use indexes to replace joins
implement a layering of patterns and add additional indexes when necessary
avoid hot partitions
remember that GSIs also consume RCUs / WCUs and are potentially subject to throttling
ensure your indexes distribute the data evenly across the table space
use composite sort keys to model hierarchical relationships
use data pointers to implement number based versioning (see chapter repository implementation and design)
use global tables for world wide domination of your app!
use TTL to clear out stale data
use streams + Lambda to implement database triggers (use an SQS DLQ to manage failed processes, otherwise a failing record blocks the stream while it is retried indefinitely)
use LSIs to re-sort the data within a partition and allow querying across different attributes
only use DAX for read intensive applications
understand how partitions work
use SNS, SQS, Lambda and DynamoDB to develop highly performant, scalable, elastic, asynchronous, decoupled micro-services
use Elasticsearch with DynamoDB for full text searching
- useful Chrome extension for Elasticsearch - elasticvue
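
The "data pointers" versioning tip can be sketched as below. This is one common form of the pattern, shown against an in-memory map rather than DynamoDB, and is not necessarily how the chapter repository implements it: a `v0` item always holds the latest content plus a `latest` counter, while `v1`, `v2`, ... are immutable snapshots. All names here are hypothetical.

```typescript
interface VersionedItem {
  pk: string;
  sk: string; // "v0" points at the latest version; "v1", "v2", ... are history
  content: string;
  latest?: number; // only set on the v0 item
}

// In-memory stand-in for the table, keyed by `${pk}|${sk}`.
const table = new Map<string, VersionedItem>();

// Store a new version and return its version number.
function putVersion(pk: string, content: string): number {
  const current = table.get(`${pk}|v0`);
  const next = (current?.latest ?? 0) + 1;
  // Write the immutable snapshot, then move the v0 pointer.
  // Against DynamoDB these two writes would go in a single transaction.
  table.set(`${pk}|v${next}`, { pk, sk: `v${next}`, content });
  table.set(`${pk}|v0`, { pk, sk: "v0", content, latest: next });
  return next;
}

const getLatest = (pk: string) => table.get(`${pk}|v0`);
const getVersion = (pk: string, version: number) => table.get(`${pk}|v${version}`);
```

Reading the latest version is then always a single key lookup on `v0`, with no need to query and sort the full history.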

Known issues

serverless-offline-dynamodb-streams is not always stable and can crash unexpectedly with Unknown error

gary-alway/sir-learn-a-lot
