Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Feature: GraphQL schema snapshots for all data sources to solve undefined/empty data issues #3344

Closed
jsanchez034 opened this issue Dec 27, 2017 · 95 comments · Fixed by #16291
Labels
type: question or discussion Issue discussing or asking a question about Gatsby

Comments

@jsanchez034
Copy link

Description

Currently there are issues with GraphQL schemas produced from data sources where at the moment gatsby develop or gatsby build is executed the data shape is incomplete, parts are in an empty state or in a different type than they would be if the source data was filled out. Below are a few example issues..

It would be great if as the data shape evolves on the data source side, you could create test pages or pieces of content that are fully filled out, meaning no empty fields. Then on the Gatsby side you could run a cli command called something like gatsby snapshot-schemas which would fetch the current data sources, run the source data through there regular plugin data normalization paths, run the data through the existing infer GraphQL schema code and then finally at the end take the schemas generated and save them off to a folder in /src called schemas.

On subsequent builds Gatsby could skip over inferring of the GraphQL schemas when it sees schemas defined in /src/schemas. These schema snapshots could then committed into a sites repo and allow for data shape changes to require new snapshots instead of just data source changes. These schema snapshots open up many possibilities such as the validation of incoming data shape changes. Schema snapshot diffs could be shown in Gatsby CLI as well when gatsby snapshot-schemas is ran again once initial schemas have been saved.

I would love some feedback on the idea itself from the contributors of the various source plugins. If the idea passes the smell test, I would like some feedback on what format the snapshots should be saved to. Maybe the GraphQL schema language using something like gestalt-graphql would be nice.

@m4rrc0
Copy link
Contributor

m4rrc0 commented Dec 27, 2017

It seems like an interesting idea since it should work regardless of the source. It would definitely work for my usecase.
While I am pretty confident about simple fields, I am wondering how it would work for assets though.

@jsanchez034
Copy link
Author

Good question about assets. I think for assets most data sources represent them as a string with a URL or path to the asset. The schema snapshot would just represent the type as a string with the URL/path or an asset object which has a string property with the URL/path.

@jquense
Copy link
Contributor

jquense commented Dec 27, 2017

Seems like an interesting idea, though why not skip the extra indirection of a snapshot and and define the schema fully? Like if we're going go a head and make fully complete data example, it seems like not much more work to directly specify the gql schema while also being clearer?

@KyleAMathews
Copy link
Contributor

@jquense that's what I'm thinking. You could either run a command to "snapshot" the dynamically generated schema which would write out a file with the schema in normal graphql form or you could directly write the same form yourself.

@jsanchez034
Copy link
Author

Yeah I definitely like the idea of keeping it open to write your schema from scratch along side the "snapshot" feature. The benefits I see from the "snapshot" feature is the ability to ease yourself into learning the GraphQL schema syntax and reducing the amount of boilerplate code needed to get up and running with Gatsby and any source plugin.

I am currently looking into the internals of Gatsby's schema inferring logic in ..
https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/schema
And getting familiar with how GraphQL schemas are defined. I will try to create a proof on concept PR soon.

@jquense
Copy link
Contributor

jquense commented Dec 28, 2017

I totally understand easing users into a new thing, however I think plugins are a bit different. Authors are generally expected to understand gql already and users shouldn't generally have to mess with anything other than querying when consuming a source plugin.

That said tho the schema language for defining said schema is probably less complicated than querying, in terms of getting up to speed. I think it'd be time better spent allowing plugin authors to provide a .graphql file than working on snapshot ting a comprehensive data object, which may not even be enough to properly define the schema wholly.

I would like to mention to that manually defining the schema is already possible and supported via the gatsby api, albeit with graphql-js not the terser schema language.

@jsanchez034
Copy link
Author

jsanchez034 commented Dec 28, 2017 via email

@jquense
Copy link
Contributor

jquense commented Dec 28, 2017

So I think we are missing each other on where this should be solved :) in the WordPress example, I think rather than the plugin completely relying schema inference it should fully specify fields that may exist, e.g. all the nullable fields. Then a user won't run into the problem at all, because gatsby will have a complete schema for the datasource.

In cases where the plugin author can't possibly know what all the fields are I still think that, while more of a learning curve, it'll end up being more accurate and less error prone for the user to provide the rough schema vs fake data to be inferred as a schema. Yes you need to learn more up front (and we should provide good documentation for it) but you avoid users needing to understand the internal workings of how gatsby infers schemas. For example, if a user wants to specify a date field, it's (I think) simpler to learn to directly specify a date vs know to provide an iso compatible string that will be inferred as a date. Plus a lot of other features like connecting two data nodes, or specifying nullablility, is only really doable with a schema definition vs a data object

@m4rrc0
Copy link
Contributor

m4rrc0 commented Dec 28, 2017

What about 2 different plugins? One that allows us to easily specify the schema in a file and another one to generate that file from a specific source.
So the source might be different... For example a contentful space where every field is complete might be the source to generate the schema. Then the schema can be used on any contentful space that has the same architecture. And of course we might still tweak the schema manually if needed.
In any case it seems like plugin no1 is the first step. We might still decide when it is done what to do next.
@jquense is there somewhere an example of a manually defined schema? I didn't know it was possible.

@jsanchez034
Copy link
Author

jsanchez034 commented Dec 28, 2017 via email

@jquense
Copy link
Contributor

jquense commented Dec 28, 2017

Yeah I think you are right it doesn't all overwriting :/

I'm generally in favor of plugins to solve issues but I think this sort of thing may need to handled in the core api to strike the right usability notes (I could totally be wrong tho). In my head the ideal api would be just to point gatsby at a .graphql file and it does the rest, but that may not as nice as I imagine. I think it's all probably worth exploring in plugins tho if possible before possibly moving into core... I of course defer to what Kyle thinks since he's the boss :)

@jsanchez034
Copy link
Author

jsanchez034 commented Dec 28, 2017 via email

@pieh
Copy link
Contributor

pieh commented Jan 6, 2018

@jsanchez034 did you look more into it? I will propably try to work on this next week and I'm interested if you did some research and can share your findings

@jsanchez034
Copy link
Author

jsanchez034 commented Jan 6, 2018 via email

@jsanchez034
Copy link
Author

jsanchez034 commented Jan 6, 2018

Hey @pieh, I started looking into how we can inject whole new schemas into the schema that Gatsby infers from the source data. So I figured a plugin could be created to add schemas to Gatsby. Below are my notes for this plugin...

Schema writer plugin

What should the plugin do?

  • Take a schema and resolver and the resulting schema should be stitched into main Gatsby schema
    • Looks in src/schema for folders, each folder has a schema and resolver file
    • Plugin will take all files and run them through makeExecutableSchema of the graphql-tool npm package
    • Create a new Gatsby Node API function called something like mergeSchema that gets called before this line in the Schema creation file which uses graphql-tool Schema stitching | GraphQL Tools to stitch Gatsby produced schema with manually created schema from src/schema
  • Should set some dummy sample data for the new schema using createNode

Those are my rough notes on a schema writer plugin. The other plugin that @MarcCoet mentioned would be one where you could point the plugin to endpoints on a domain where data is set up to be in a complete state then schemas would be generated and saved off to src/schema.

Note: Schemas would be defined using GraphQL type language

@pieh
Copy link
Contributor

pieh commented Jan 7, 2018

Unfortunately I don't think graphql-tools will work here as it was meant to merge full schemas (that is both definitions and data/resolvers) and adding links between those schemas.

From documentation of mergeSchemas function (https://www.apollographql.com/docs/graphql-tools/schema-stitching.html#mergeSchemas) about passed schemas to merge:

schemas is an array of either GraphQLSchema objects or strings. For strings, only extend type declarations will be used. Passing strings is useful to add fields to existing types to link schemas together, as described in the example above.

We would rather want to use string option, as GraphQLSchema require to define at least 1 query (or it throw an error), but extending type doesn't allow to overwrite field type. Error is thrown when trying to overwrite existing field (conflict resolution function isn't called so nothing we can do about it unfortunately):

Field "frontmatter.title" already exists in the schema. It cannot also be defined in this type extension

As for second option (using GraphQLSchema) I "hacked" it to work by adding dummy query, but initial problem is I can't use types from "main" schema - we can't simply do this:

type frontmatter {
  featuredImage: File
  title: String
}

because File isn't defined in this additional schema. We could probably append builtin types so it would work but I feel like trying to "fool" this tool to do stuff it's not designed to do (at least at this point in time) is not the way to go.

I'll will try to evaluate 2 other approaches:

  • search for or make a tool designed to modify already existing schema (that would be more generic graphql tool)
  • or force user/plugin defined types on fields when creating schema (that would gatsby specific)

@pieh
Copy link
Contributor

pieh commented Jan 23, 2018

I've made some progress. Decided to go with gatsby specific approach for now (adding support for "forced types" in https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/schema ) as I found it easier to work with than playing with graphql schema objects. I'll try to clean up my code a bit so it would be more readable and create pull request this week to discuss it further there.

Just little sneak peak of results (imo quite promising):
I've used https://github.com/gatsbyjs/gatsby-starter-blog and used frontmatter in posts to emulate some problems that can cause build errors caused by queries that don't match schema:

// post #1
---
title: Hello World
date: "2015-05-01T22:12:03.284Z"
featuredImage: "sasalty_egg.jpg" // badly formatted value that wouldn't allow to make this File type 
test: "some string" // incompatible value type with other posts (in other posts test field is object) - it would be skipped in schema
---

// post #2
---
title: My Second Post!
date: "2015-05-06T23:46:37.121Z"
test:
  floatField: 5.5
  intField: 99
  arrayOfStringsField:
    - "it"
    - "works"
---

// post #3
---
title: New Beginnings
date: "2015-05-28T22:40:32.169Z"
test:
  stringField: "string"
  boolField: true
---

Used type definitions:

type MarkdownRemark {
  frontmatter: frontmatter_forced_type
}

type frontmatter_forced_type {
  featuredImage: File
  title: ImageSharp
  test: TotallyCustomType
}

type TotallyCustomType {
  stringField: String
  floatField: Float
  intField: Int
  boolField: Boolean
  arrayOfStringsField: [String]
}

Note: those don't have to be full type definitions (f.e. in MarkdownRemark definition I just force type of frontmatter field to not rely on automatic type naming)

Here's how frontmatter looks before using my changes (no test field, "bad" type of featuredImage field):
before

And after:
after

@jsanchez034
Copy link
Author

Hi @pieh! Thanks for taking this on, your solution looks very promising! Sorry I couldn't contribute more to this issue in the past couple of weeks, I haven't had the time to dive into this. Let me know if there is anything I can help with.

@pieh
Copy link
Contributor

pieh commented Jan 24, 2018

If anyone is interested you can view my code changes master...pieh:schema_wip (after deleting some dead code and refactoring some stuff I had yesterday it's much less code than I thought :) ).

Don't have time to put PR for comments with good description and website repository used for testing, so figured I would just share my code for now. If anyone want to play with it - place gatsby-schema.gql file with schema definition in root directory of your project (along gatsby-node.js, etc.).

@m4rrc0
Copy link
Contributor

m4rrc0 commented Feb 25, 2018

Any chance this could make its way into a PR?
A lot of people would be glad to have this I think.

@pieh
Copy link
Contributor

pieh commented Feb 25, 2018

@MarcCoet more I dived into this code more I realized that my approach in that branch can be used only as proof of concept. I plan on working on this to have it done proper way but this a lot bigger task than I originally anticipated (as I was learning more about how currently schema creation is done and what features it offers I didn't earlier know about as I was coding stuff)

So currently schema creation looks like this:
current

Implementation is doubled for each source of types. My proof of concept (branch I linked) adds just another source and now we have it tripled:
poc

It would be unmaintainable to just add this in current form.

What I want to do is so this looks something like that:
proposed
But this require some major refactoring/rework as current schema creation is designed with inferring types from data as only source of data. After that, adding another source would be trivial with knowledge I gained during development of my proof of concept.

I want to create RFC issue for this (I created flowcharts earlier this week :) ). Just need to get some spare time to actually commit to discussion this would start.

@HZSamir
Copy link

HZSamir commented Feb 26, 2018

@MarcCoet I personally have moved all my projects temporarily to react-static, as this feature is essentially a deal-breaker for me.
But I plan to transition back to Gatsby the second this is fixed.

@gatsbot
Copy link

gatsbot bot commented May 13, 2019

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community!

@gatsbot gatsbot bot closed this as completed May 13, 2019
@silviopaganini
Copy link

am I the only one still having this problem?

@KyleAMathews KyleAMathews reopened this Jun 24, 2019
@KyleAMathews
Copy link
Contributor

Sorry we're working towards this. I'll set this as not stale so the bot stops closing it.

@moreguppy
Copy link

Im running into an issue when using the new schema customisation with gatsby-source-contentful.
Defining a type where one of the fields points to a reference array always returns null
Ie:

type ContentfulPage implements Node {
   editorTitle: String
   title: String
   slug: String
   sections: [ContentfulSection] // im always null 
}

@sami616 I found this approach worked:

// In gatsby-node.js
exports.sourceNodes = ({ actions }) => {
  const { createTypes } = actions
  const typeDefs = `
    type ContentfulPage implements Node {
      sections: ContentfulSection
    }
    type ContentfulSection implements Node {
      title: String
    }
  `
  createTypes(typeDefs)
}

It seems for reference fields, you have to define the shape of the model you are referencing as well

@silviopaganini I found this is a pretty good solution in the interim

@silviopaganini
Copy link

Yeah! trying this now, seems to work, but it's not scalable... I'm working on a huge website, if I have to create those for all non-required fields is not great, but works for now

@miraclemaker
Copy link

miraclemaker commented Jun 25, 2019

@moreguppy I'm trying this to fix a similar issue with gatsby-source-wordpress, but I get the following message when I add this code to gatsby-node.js:

Error: Schema must contain uniquely named types but contains multiple types named "wordpress__PAGEAcf".

My code:

exports.sourceNodes = ({ actions }) => {
  const { createTypes } = actions
  const typeDefs = `
    type wordpress__PAGEAcf implements Node {
      enter_content: String
    }
  `
  createTypes(typeDefs)
}

@miraclemaker
Copy link

miraclemaker commented Jun 25, 2019

Nevermind, for anyone experiencing a similar issue, this is the correct format ti fix it:

exports.sourceNodes = ({ actions }) => {
  const { createTypes } = actions
  const typeDefs = `
    type wordpress__PAGE implements Node {
      acf: wordpress__PAGEAcf
    }
    type wordpress__PAGEAcf implements Node {
      enter_content: String
    }
  `
  createTypes(typeDefs)
}

@gatsbot
Copy link

gatsbot bot commented Jul 13, 2019

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community!

@gatsbot gatsbot bot closed this as completed Jul 13, 2019
@kalinchernev kalinchernev added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Jul 29, 2019
@kalinchernev
Copy link
Contributor

Re-opening and adding a not stale tag as of #3344 (comment)

@stefanprobst
Copy link
Contributor

I'd be super grateful if someone could give the stuff in #16291 some real-world testing. Thanks!

@stefanprobst
Copy link
Contributor

We have a open PR for schema lock-down in #16291. Would be cool if those still interested in seeing this land could share some feedback. Thanks!

@m4rrc0
Copy link
Contributor

m4rrc0 commented Sep 15, 2019

Hey @stefanprobst . Awesome work!
Sorry for the late reply. This has been stalling for so long that I got used to the workarounds and it was not a major problem anymore to me. It is so neat though that this is moving forward! :)
I gave it a quick try and it worked seamlessly! That is rare enough to salute. ;D
I am working on a big update on a project and I will have some more 'real world' testing in the coming weeks. I'll comment further in the PR.
Thanks a lot for the work!

@m4rrc0
Copy link
Contributor

m4rrc0 commented Nov 1, 2019

@stefanprobst any idea about #19210

@wu-lee
Copy link

wu-lee commented Dec 8, 2021

Hello - I'm also having trouble with this still.

I've attempted to fix it as documented here:

https://www.gatsbyjs.com/docs/reference/config-files/gatsby-node/#createSchemaCustomization

And:

https://www.gatsbyjs.org/docs/schema-customization/#creating-type-definitions

However, first I wanted to ask, why is sourceNode is being used in the post above instead of createSchemaCustomization as in the docs? Does it make any difference?

Nevermind, for anyone experiencing a similar issue, this is the correct format ti fix it:

exports.sourceNodes = ({ actions }) => {
  const { createTypes } = actions
  const typeDefs = `
    type wordpress__PAGE implements Node {
      acf: wordpress__PAGEAcf
    }
    type wordpress__PAGEAcf implements Node {
      enter_content: String
    }
  `
  createTypes(typeDefs)
}

In my case I seem to have extra problems. Although I explicitly define fields which can be optional, in order to avoid errors if Gatsby does not find any examples and therefore can't infer their existence, File fields seem to be a problem, as I've not discovered how to define these correctly.

Simply specifying the type as (for example) hero_image: File does not work reliably, as Gatsby doesn't seem to be doing the obvious thing and defining file fields like this when the field refers to an existing file. There's a childmageSharp field the templates need, which seems to function ok when the hero_image field is inferred by Gatsby, but not when I add an explicit type File, when I get warnings that "You can't use childImageSharp together with undefined.undefined — use publicURL instead", and the images disappear.

I'm using a headless CMS to allow my users to enter content, so I can't control what they do exactly, and failures crop up unpredictably depending on what they've created, are not in the least friendly to users or developers.

I find myself caught between these cases. I am encountering the following:

  • errors complaining that File fields are objects, (and so need at least one subfield in the type spec)
  • errors due to the fields being interpreted as strings which cannot have subfields (which I think happens when the field refers to a non-existent file)...
  • or simply errors because the field is missing...

What happens when inference is used all seems very dependent on the first content file Gatsby sees, which isn't something I know how to fix, the order seems non-deterministic. An apparent fix can later turn out to be working only by lucky ordering.

I am currently using Gatsby 2.23.12, not the most recent version as this project is over a year old and still intermittently encountering problems. Upgrading seems to be another can of breaking-change worms I don't yet want to open.

wu-lee pushed a commit to TransitionbyDesign/homemaker that referenced this issue Dec 8, 2021
I'm being hopeful here, I think...  has failed in the past, see
comments in the file.

Posted in a related bug here:

gatsbyjs/gatsby#3344 (comment)
@fgroenendijk
Copy link

fgroenendijk commented Jun 15, 2023

My team had a problem inferring complex types like dynamic zones and some fields that could be empty.
Then I found out about the command printTypeDefinitions.
When you first populate all the fields once, then let printTypeDefinitions create a typeDefs file.
After this you can copy and paste the problem cases like for example wordpress__PAGE in the typedef.

Example code:

import * as fs from 'fs';
exports.sourceNodes = ({ actions }) => {
	if (fs.existsSync('./typeDefs.txt')) {
		fs.rmSync('./typeDefs.txt');
	}
	actions.printTypeDefinitions({ path: './typeDefs.txt' });

	const { createTypes } = actions;
	const typeDefs = '';
	createTypes(typeDefs);
}

Used with gatsby 5.7.0.

@tractorcow
Copy link

tractorcow commented Mar 18, 2024

@fgroenendijk this is the solution we used. It could be better, but it's setup to refresh the schema during development, and rely on this cached schema on deployed environments. It still relies on our non-master branch having all content fields filled, but it won't break production.

// Only rebuild definitions on non-master branches, and not on codebuild
const rebuildDefinitions = process.env.GATSBY_CONTENTFUL_ENVIRONMENT !== 'master' && !process.env.CODEBUILD_BUILD_ARN
const definitionPath = 'src/type-definitions.gql'
const schemaPath = path.resolve(__dirname, definitionPath)

export const onPreInit: GatsbyNode['onPreInit'] = async () => {
  // On non-production environments clear outdated type definitions to force a resync
  if (rebuildDefinitions && fs.existsSync(schemaPath)) {
    console.log(`Deleting types from ${definitionPath}`)
    fs.unlinkSync(schemaPath)
  }
}

export const createSchemaCustomization: GatsbyNode['createSchemaCustomization'] = async ({ actions }) => {
  const { createTypes, printTypeDefinitions } = actions

  // ON UAT this will save type definitions to a file
  if (rebuildDefinitions) {
    // On non-master branch we persist types
    console.log(`Saving types to ${definitionPath}`)
    printTypeDefinitions({ path: definitionPath, withFieldTypes: true })
  } else {
    // Otherwise we restore types
    console.log(`Restoring types from ${definitionPath}`)
    const typeDefs = fs.readFileSync(schemaPath, 'utf8')
    createTypes(typeDefs)
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: question or discussion Issue discussing or asking a question about Gatsby
Projects
None yet
Development

Successfully merging a pull request may close this issue.