
RFC 172 - GraphQL for GOV.UK #172

Open · wants to merge 5 commits into main

Conversation

@richardTowers (Contributor) commented May 13, 2024

RFC 172 - GraphQL for GOV.UK

Deadline for comments: 31st May 2024

Please leave comments inline in the diff, as these can be threaded and resolved. General comments that might merit a reply should be made as comments on the heading on line 1.

Rendered version


@richardTowers changed the title from "Initial commit - pasted from google docs" to "GraphQL for GOV.UK" on May 13, 2024
@richardTowers changed the title from "GraphQL for GOV.UK" to "RFC 172 - GraphQL for GOV.UK" on May 13, 2024
@richardTowers marked this pull request as ready for review on May 15, 2024 at 11:27
@sengi (Contributor) left a comment

I'm excited about this!



Still, even with dataloader batching queries and individual queries making good use of indexes, we're asking the database to do at least ~10x more work than the current content-store database has to do, and that will have some performance or infrastructure cost implications. We will need to carefully consider how this solution performs at scale to evaluate these implications.
Contributor

Our database instances are very underutilised right now, so I expect any cost implications to be below the noise floor, so to speak.

It's also important to set this valid concern in the wider context of enabling the rationalisation of GOV.UK's database architecture, which has the potential to reduce infrastructure costs and improve performance.

In other words, if the proposal is well executed and the mitigations you suggest are implemented, perceived performance from the end-user perspective is likely to improve while somewhat reducing running costs in the medium to long term.

Contributor Author

👍🏻 content-store (which I think is the most directly comparable DB to what we'd be trying to do here) hovers around 10-15% CPU, and it's only a db.m6g.2xlarge.

publishing-api, on the other hand, is a db.m6g.4xlarge, but hovers around 1% CPU utilisation most of the time, with occasional big spikes.

Initially, I expect we'll easily fit within the overhead available on publishing-api's 4xlarge instance. It will be interesting to see what the performance is like during load spikes though (in particular, during big link expansion / republishing events which may acquire locks on tables we need to query).

If we need any extra database infrastructure at all for the graphql API (which we may well not), I imagine we'd be talking about a 4xlarge read replica, which would be ~$17k / year reserved. The long term hope would be that this would allow us to retire several other databases (content-store, draft-content-store, search-api, router), so it would eventually pay for itself. Even if it didn't, that's a small infrastructure cost relative to the lost engineering productivity which I believe the current architecture costs us.


Currently, frontend applications only need to make a single request to content store, and they can then switch on the document type in the response to determine which view to render.

The value of the GraphQL approach is that the frontend can specify the fields it needs for the document type it's rendering. For this to work, the frontend needs to know which document type a particular page has before it can make the query to get the page data.
Contributor

Could this be mitigated by some kind of "standardised fields" option? I can't remember if GraphQL has a way of expressing this. I'm envisioning something like "let me know what type of content this is, and also return the usual fields for this sort of content". I guess a little bit like a prepared statement in SQL. Then the usual fields are the ones required to render that type of page, and we only need to do one call. It does mean a reduction in flexibility (if we want to add additional fields and keep the single call, we have to manually tell the API that new fields are needed), but as an option, if we really need the speed of a single call, it's something.

Contributor Author

Yes, I think you can achieve this with interfaces / unions and fragments. So you could have one giant query like:

query ($basePath: String!) {
  edition(basePath: $basePath) {
    publicLastUpdatedAt
    # ... other fields that are present on all editions

    ... on Guide {
      bodyHtml
      # ...
    }
    ... on Manual {
      parts
      # ...
    }
    ... on MinistersIndex {
      cabinetMinisters {
        # ... fields for each minister
      }
      # ...
    }
    # ... ~180 other content types
  }
}

This would be a big request, but we could store it as a persisted query.

@DaveLee-GDS left a comment

It's a great concept, and after various conversations with Richard and others about it, I'm convinced it will be a great enabler for the use cases documented, plus other emergent use cases yet to be determined.
There are clearly challenges within, but I feel the benefits far outweigh the effort and investment to get it up and running. Top work!

@dgm34 (Member) commented May 23, 2024

Thanks @richardTowers! I'm very much in support of this proposal, but I wouldn't be me if I didn't have a few comments:

  • This RFC - probably by design - doesn't talk about what it would take to achieve this. At some point we'll have to work through the effort required to implement the various parts of this proposal, and get further buy-in from other Publishing stakeholders.
  • I feel quite strongly that the GraphQL API should not be public, at least until we understand a whole lot more about how it performs and impacts our systems. I note that you also think this is inadvisable and I'm reassured by that.
  • It would be interesting to see some 'worst case' render times. I wonder if we can do this using some traffic replay magic, to see how render times vary under real load (subject to trying to correct for things like different environments, etc)
  • I'd really like to get the new Lead TA for Publishing involved in this when he joins. Perhaps he could even own / drive it, working with you of course, in a similar way to how Al took responsibility for the Mongo to PostgreSQL migrations.

Thanks again for this great write up ❤️

@kevindew (Member) left a comment

This sounds really cool - a really well explained proposal with lots of cool ideas - I hope we have success with it. There do seem to be some thorny areas, particularly performance (though I do wonder whether we could use a key value store to reduce the need to hit PostgreSQL for all the bits).

My biggest concern with this is, given that this will take quite a long time to implement and migrate everything, how we minimise the risk of ending up stuck with a partial implementation if things don't pan out like we hope. We've seen this many times before on GOV.UK, and I wonder what insurance steps we could take to de-risk it.


In many cases, edition links are preferable, because they allow users to draft adding / removing / reordering links on a specific edition. With link set links, you can't change the links on a draft without immediately changing the links on the published edition.

The docs suggest that link set links "are typically used for taxonomy based links", but in practice they're used much more widely, including for link types like "documents" and "ordered_roles" which feel like they ought to be edition links. There are around 5 million link set links on GOV.UK, compared to only around 1 million edition links to published editions.
Member

This is sadly a bit of a failing of an "if we build it they will come" approach: edition links solved a few problems, but they were added as API functionality rather than as part of a concerted effort to move consumers over to them - and thus all the workarounds linger on. Probably an interesting lesson for the long tail of adopting this idea.

Contributor Author

Market research? If you build it, they will come. That's my market research.

Yeah. If we'd fully implemented edition links and then done the leg work to migrate everything over, we'd be in a better situation. I think a big part of the reason they're not fully implemented / adopted is the general sense of fear, uncertainty, and doubt around the link expansion code. Resolving that is the great hope of this RFC. Of course, there might also have been a degree of us not following through on technical improvement work, but I can't resolve that issue with an RFC.

I appreciate the concern that GraphQL could also be an idea that only gets half adopted. Maybe it would be healthy for us to plan to fully revert the proof of concept, once we've learned what we're going to learn - I definitely don't think we want to end up with some once hopeful PoC code lingering around used in 5% of places for the next 5 years.

I don't want to grow the scope of this RFC into a full "how are we going to adopt this everywhere" plan, because I think that's jumping the gun. If at the end of the PoC we're all convinced it's a good idea and something we want to pursue, then we should have the discussion about how to make sure it's fully adopted. Or, alternatively, we write up what we learned and continue with the current architecture.


A big drawback of using edition links in the current architecture is that they're only expanded at the first level of depth - [We don't support nested edition links][nested-edition-links-not-supported]. This means that if you want a link to appear in a set of expanded links two or more levels deep, you have to use a link set link, and accept the fact that you can't draft links. There's [a ticket in the backlog to address this][nested-edition-links-trello], but because it touches link expansion, it's likely to be tricky to implement.

A GraphQL API which expanded links on demand could follow both sets of links. Because it avoids the complexity of doing link expansion ahead of time, this is significantly simpler to implement. It would allow us to use edition links more widely, and resolve many situations where users can't draft links without affecting the live site.
Member

Would be amazing to get both the link types to have functional parity 🤞


### Initial proof of concept

As an initial proof of concept, we will develop a simple GraphQL API using [GraphQL Ruby][graphql-ruby] inside publishing-api. This API will be disabled by a feature flag in production, but in other environments it will run in the same process as publishing-api and query the primary publishing-api database.
Member

Would we be able to use a modular namespace for this code so it's relatively clear what the code impact is for this?

From a pessimist's angle, one of the risks of this idea, particularly in an exploratory stage, is that something unexpected happens and the idea gets caught in a half-implemented state with nothing using it. It'd be good to implement it with an easy exit in mind.

Contributor Author

Yeah. At the moment my pre-proof-of-concept-proof-of-concept branch keeps almost all the code in app/graphql, so it could be cleanly deleted (so long as nothing ends up using it in production).

The only other thing we'd need to add outside of this directory is the new controller.

I'll add to the RFC that we should keep as much of the code as possible in app/graphql


Authentication will use a signon bearer token, as we currently do with publishing-api itself. All users of the API will have full read access - there is no need for Authorization until we introduce draft content.

To allow efficient loading of large numbers of editions, we'll use the [Dataloader][graphql-ruby-dataloader] pattern, which batches queries together to reduce the total number of queries needed.
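
As a rough illustration of that pattern (the source class name and scopes here are assumptions, not code from the RFC), a graphql-ruby Dataloader source collects the base paths requested while one GraphQL query resolves, and satisfies the whole batch with a single SQL query:

class EditionByBasePathSource < GraphQL::Dataloader::Source
  def fetch(base_paths)
    # One query for the whole batch, instead of one query per edition.
    editions = Edition.live.where(base_path: base_paths).index_by(&:base_path)
    # Dataloader expects results in the same order as the requested keys.
    base_paths.map { |path| editions[path] }
  end
end

# In a field resolver:
# dataloader.with(EditionByBasePathSource).load(base_path)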
Member

If that doesn't work out you might want to try (or combine) with the relatively new async queries feature from ActiveRecord: https://www.shakacode.com/blog/rails-7-1-active-record-api-for-general-async-queries/
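
For illustration, a minimal sketch of what that API offers (assuming Rails 7.1 and publishing-api's Edition model; not part of the proposal):

editions = Edition.live.where(document_type: "guide").load_async # runs on a background thread
total    = Edition.live.async_count                              # returns a promise

editions.to_a # blocks here only if the query hasn't finished yet
total.value   # blocks until the count is available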

Contributor Author

Ooh, I hadn't heard about that. That'll be useful in Whitehall too 👀


The GraphQL API proposed in this RFC doesn't depend on link expansion or dependency resolution because it computes links at request time instead of publish time. It doesn't need a central list of link expansion rules, because the frontend can specify the links it needs expanding in the GraphQL query.

Although the concepts of link expansion and dependency resolution will continue to be required until we're able to remove content-store (which is a very long term goal), the GraphQL API could allow us to do more with links because developers can add features without necessarily having to worry about link expansion rules (or link expansion / dependency resolution generally).
Member

An additional problem you've not flagged here is that link expansion and content store writing make publishing content an asynchronous, eventually consistent task, whereas it would ideally be atomic.

This challenge makes it difficult for a publishing application that communicates with the Publishing API to communicate state to a client. For example, when you publish a new piece of content, we can't reliably tell a user that it is now published, as the state the Publishing API communicates isn't true until link expansion occurs.

Contributor Author

Oh gosh, I didn't know this. I'll add this to the RFC too.


- Aggregations of content, for example [listing governments](#query-to-list-governments) or [listing documents related to a topical event](#query-to-list-documents-related-to-a-topical-event)

The implementation of this API would avoid many of the issues which make GOV.UK's current [publishing-api][publishing-api] / [content-store][content-store] / [search-api-v1][search-api-v1] architecture so complicated. In particular, it would avoid precomputing link expansion at the point when documents are published, and instead collect the links relevant to the client at query time. Similarly, with appropriate indexes in place, the API would cover many (or possibly all) of [the use cases we have for search-api-v1][search-api-v1-use-cases] and its associated Elasticsearch database.
Member

Is this assuming any relevancy based use cases are covered by Search API v2?

Contributor Author

It's handwaving a bit I think 😅 - could do with being more explicit. I think:

  1. Use cases with no query, ordered by newest (e.g. latest pages in a topical event) are clear cut - should be graphql
  2. Use cases with no query, ordered by popularity (e.g. /search/services landing page) are only doable if we move popularity statistics into publishing-api, because if we need to do select ... order by ... limit ... then the database needs to have the field we're ordering on. Personally I'd go for that, but probably a separate discussion.
  3. Use cases with a query, where we're happy with keyword search (rather than semantic search), are only doable if we have some kind of free text search capability in a DB that the GraphQL API has access to - see the sketch after this list (this could still be publishing-api with Postgres free text search, or it could be an OpenSearch database or something if we don't mind maintaining another DB)
  4. Use cases that require semantic search / relevance based ordering should be Search API v2
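
A hedged sketch of option 3 using Postgres full-text search from inside publishing-api (the column names are assumptions; a real implementation would want a precomputed tsvector column with a GIN index rather than computing to_tsvector per row):

Edition.live
       .where("to_tsvector('english', title) @@ plainto_tsquery('english', ?)", "vat rates")
       .order(public_updated_at: :desc)
       .limit(20)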


7. Whether pagination is required for nested resources, or whether we can only use pagination for top level queries

8. Whether we need to adjust caching behaviour to lessen the impact of performance degradations (e.g. by using [stale-while-revalidate][stale-while-revalidate])
Member

A few other things that'll probably need thinking about are:

Message queue consumers

I imagine there might be an opportunity to send out much smaller notifications and then expect consumers to query Publishing API for more data, rather than the Content Store-like object that is currently broadcast (and is presumably mostly superfluous).

However, there does seem to be a problematic area: at the moment, by having expanded links in a message queue notification, a consumer can make logic decisions based off data from the content item and the content in links. For example, in GOV.UK Chat we don't index content in history mode, which involves combining a property of the piece of content (political) with a property of a link (government, with a current boolean). Currently it's quite simple to know when this has changed, as there is a message that the links of that piece of content have changed; whereas if there wasn't a replacement, a consumer application may need to store the link graph itself to know when to act on a linked piece of content changing.

Rendering/handling of Govspeak

Currently, as part of the work to downstream content, there is the rendering of Govspeak for content that uses this feature, which can be quite a slow task for a large piece of content (I have vague, potentially inaccurate, memories of this reaching seconds for a very large document). I imagine switching from a Ruby implementation to a C implementation of markdown rendering could speed this up drastically - but that carries a lot of migration pain.

The other aspect of Govspeak is a rather quirky Publishing API feature that Content Store conceals: fields intended for HTML or Govspeak have an array type, containing objects with a content type. For example, a mainstream Answer sends a body field of:

"body": [
  { "content_type": "text/govspeak", "content" => "## Hello" }
]

Which gets broadcast to the message queue and content store as:

"body": [
  { "content_type": "text/govspeak", "content" => "## Hello" },
  { "content_type": "text/html", "content" => "<h2>Hello</h2>" }
]

And then, at runtime, content store mutates it into:

"body": "<h2>Hello</h2>"

I imagine we have to make a decision as to how this type of field would be queried and what data would be served (we could perhaps push markdown rendering down to clients).

Schemas and data validity

I assume we need to keep parity between the schemas and the GraphQL object definitions. Is this going to be achievable? I'm guessing in early cases we could even use the current jsonnet schema files to generate the GraphQL object definitions.

One of the long running Publishing API problems with schemas has been that they're enforced only at the time of insert, and then, if a schema changes later, the data in the Publishing API no longer conforms to it. Would GraphQL be able to cope with this relatively gracefully (i.e. ignore superfluous fields and return nil for missing ones), or return errors (which I suppose would enforce validity)? I guess if it did try to be graceful, it'd mean almost every field would have to be flagged as being potentially nil.

Contributor Author

Message queue consumers

Ah excellent, thanks for the example. I was aware that the eventual plan to remove dependency resolution was likely to run into some problems like this, but I couldn't think of a clear example. I was most worried about the search indexes (which I guess GOV.UK Chat is one of).

Without dependency resolution happening, it's going to be tricky to keep any of our denormalised views of data synchronised with information in linked documents. This could be a big problem with content reuse too: for example, if there's a page that references the VAT rate like "The VAT rate is {{current_vat_rate}}", and it ends up in search as "The VAT rate is 20%", and then later current_vat_rate changes, we need to know to update every document in search that references that value.

We'll need to think about how to architect this, because queries to the GraphQL API could potentially follow links that aren't in the expansion rules, so it won't be robust to do dependency resolution like we currently do.

It's tempting to think about GraphQL Subscriptions for this, but I'm not sure they're really the right solution - it feels like the assumption is that clients will be showing a specific page, and they'll use a subscription to do live updates to that page. Whereas we'd want to subscribe to lots of things.

I'll definitely add this to the RFC as something to consider before a wider rollout. We'll need an answer to this question, or really the whole idea is holed below the waterline in my view.

Contributor Author

Rendering/handling of Govspeak

This one I was aware of. Currently my PoC handles this at request time by checking for text/html, then falling back to text/govspeak and rendering it on the fly.
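
Roughly what that request-time fallback looks like as a sketch (the helper name is mine, based on the description above):

# Prefer a pre-rendered text/html part, fall back to rendering
# text/govspeak on the fly; plain strings are served as-is.
def body_html(body)
  return body if body.is_a?(String)

  html = body.find { |part| part["content_type"] == "text/html" }
  return html["content"] if html

  govspeak = body.find { |part| part["content_type"] == "text/govspeak" }
  Govspeak::Document.new(govspeak["content"]).to_html if govspeak
end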

I've definitely seen the conversion of govspeak to HTML be slow - for example, on the ministers index page, publishing-api spends about 8 seconds converting govspeak when it's presenting the content to content store. But that's rendering lots of little documents (which the frontend never needs).

I guess for massive govspeak documents, there's a good chance this might be problematically slow. I'll check some.

Contributor Author

Rendering/handling of Govspeak (continued)

For realistically massive govspeak documents, we're talking about ~500ms. The biggest bit of govspeak on GOV.UK at the moment is on https://www.gov.uk/guidance/immigration-rules/immigration-rules-appendix-skilled-occupations. This is pretty complex too - tables with lists inside cells etc.

This takes ~500ms to render on my mac:

edition = Edition.live.find_by(base_path: '/guidance/immigration-rules/immigration-rules-appendix-skilled-occupations')
big_govspeak = edition.details.dig(:body, 0, :content)
Benchmark.bm do |x|
  20.times do
    x.report { Govspeak::Document.new(big_govspeak).to_html; nil }
  end
end
       user     system      total        real
   0.543176   0.011939   0.555115 (  0.559625)
   0.529987   0.012697   0.542684 (  0.544910)
   0.516697   0.011469   0.528166 (  0.529885)
   0.524782   0.012387   0.537169 (  0.539990)
   0.513405   0.011126   0.524531 (  0.526543)
   0.511204   0.012277   0.523481 (  0.525216)
   0.513562   0.011301   0.524863 (  0.526484)
   0.504459   0.009990   0.514449 (  0.515474)
   0.519732   0.011064   0.530796 (  0.532878)
   0.498527   0.010224   0.508751 (  0.509766)
   0.515784   0.012898   0.528682 (  0.530133)
   0.503691   0.010312   0.514003 (  0.515056)
   0.516301   0.010775   0.527076 (  0.528383)
   0.522541   0.010417   0.532958 (  0.534817)
   0.528775   0.011795   0.540570 (  0.542422)
   0.529228   0.013451   0.542679 (  0.544661)
   0.518665   0.010731   0.529396 (  0.531591)
   0.513944   0.011284   0.525228 (  0.526737)
   0.523058   0.011911   0.534969 (  0.536822)
   0.517797   0.011215   0.529012 (  0.530628)

In this particular case, publishing-api has the HTML version as well, so we wouldn't have to do this rendering (my SQL fu isn't strong enough to find documents that have govspeak but not html).

Pages with very long bits of govspeak which aren't pre-rendered to HTML are going to be another worst-case performance situation that we'll have to keep an eye on though.
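
(On the SQL question above: one possible way to find editions with govspeak but no pre-rendered HTML, assuming details is a jsonb column - jsonb containment matches partial objects inside arrays:)

Edition.live.where(<<~SQL)
  details -> 'body' @> '[{"content_type": "text/govspeak"}]'
  AND NOT details -> 'body' @> '[{"content_type": "text/html"}]'
SQL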

Contributor Author

I did a bit of investigation into how often the body is an array with text/govspeak or text/html vs. a string.

+---------+-----------------+-------------+------+
|body_type|has_govspeak_body|has_html_body|count |
+---------+-----------------+-------------+------+
|string   |false            |false        |474986|
|array    |true             |false        |205506|
|null     |false            |false        |117316|
|array    |true             |true         |2863  |
|array    |false            |true         |1     |
+---------+-----------------+-------------+------+

... so there are >200k pages where we'd need to do this conversion (out of a total of ~800k). There aren't that many pages (<3k) where publishing-api has both types available, but there are lots of pages where body is just a string (which will be HTML by the looks of it).

Contributor Author

Schemas and data validity

I assume we need to keep parity between the schemas and the GraphQL object definitions. Is this going to be achievable? I'm guessing in early cases we could even use the current jsonnet schema files to generate the GraphQL object definitions.

Yes, that was my thought too. In the short term we'd keep the current JSON schemas, and the GraphQL API would have to at least mirror those (maybe with some slight transformations, e.g. always returning HTML from body instead of exposing the govspeak / html choice to clients). A lot of the GraphQL code could be generated from the schemas, as you say.
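
As a sketch of that generation idea (the schema path, scalar mapping, and nullability policy here are all assumptions), every field is generated as nullable, which also speaks to the data validity question below:

require "json"

SCALARS = {
  "string"  => GraphQL::Types::String,
  "integer" => GraphQL::Types::Int,
  "boolean" => GraphQL::Types::Boolean,
}.freeze

def build_object_type(name, json_schema)
  Class.new(GraphQL::Schema::Object) do
    graphql_name name
    json_schema.fetch("properties", {}).each do |field_name, spec|
      # null: true throughout, since stored editions may predate the current schema.
      field field_name.to_sym, SCALARS.fetch(spec["type"], GraphQL::Types::String), null: true
    end
  end
end

answer_schema = JSON.parse(File.read("dist/formats/answer/publisher_v2/schema.json"))
AnswerType = build_object_type("Answer", answer_schema)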

One of the long running Publishing API problems with schemas has been that they're enforced only at the time of insert, and then, if a schema changes later, the data in the Publishing API no longer conforms to it. Would GraphQL be able to cope with this relatively gracefully (i.e. ignore superfluous fields and return nil for missing ones), or return errors (which I suppose would enforce validity)? I guess if it did try to be graceful, it'd mean almost every field would have to be flagged as being potentially nil.

I'm not planning on changing the way that content is stored, so we'll still have this problem. I'd probably start with making the GraphQL API quite permissive of bad data (i.e. if it finds something in publishing-api's database that's not valid, it should still do its best to render a response, maybe with some errors reported using GraphQL's errors field).

If we wanted to really make sure that things in the database are valid against the schemas, we should probably do that as some kind of batch job / test / database constraint.

I wonder how much of a problem this has been in practice?


### Frontend applications will need to make two API calls instead of one

Currently, frontend applications only need to make a single request to content store, and they can then switch on the document type in the response to determine which view to render.
Member

With a bit of squinting, you could describe our current architecture as already having this, given we have the call to Router to first look up the content destination and then have the application look up the content.

So we could, in some ways, handle the first query as an aspect of routing, which can also determine which code should be used to render an item. At the moment we have frontend, which loads full content items as a means of routing; government-frontend, which has a single action for every content item; and collections, which does the routing based on URL structure. If we do further frontend app consolidation, we'll want to standardise on an approach.

Contributor Author

At one stage I had a whole section in this RFC about how we could have router make the first query, and then pass the schema / document_type through to the frontend as headers. But then I decided it was too much detail / growing the scope too much.

I totally agree that the first query feels like part of routing, and the second one part of rendering. If we're able to consolidate the frontends on a similar timescale to the GraphQL rollout, it would be great to just do this in a Rails routing constraint. Even if we don't do that, we could explore doing a "which doc type, schema and rendering app should I use for this page" query in router, and then passing these bits through to the frontend. At that point, router could probably be replaced by an off-the-shelf / open source routing component with a custom plugin to do the GraphQL query ✨
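
A rough sketch of the routing-constraint flavour of this (DocumentTypeLookup is a hypothetical client for that first, lightweight query; route and controller names are illustrative):

class DocumentTypeConstraint
  def initialize(*document_types)
    @document_types = document_types
  end

  def matches?(request)
    # First query: fetch just the document_type for this path.
    @document_types.include?(DocumentTypeLookup.document_type(request.path))
  end
end

# config/routes.rb
get "*path", to: "content_items#show",
    constraints: DocumentTypeConstraint.new("guide", "answer")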

Comment on lines +326 to +336
- Content Store

- Draft Content Store (instead serving draft frontend requests from the GraphQL API, with appropriate authorization in place)

- Search API V1

- Router / Router API

- Link Expansion

- Dependency Resolution
Member

Would be glorious to see all these things go 🙏

status_last_reviewed: 2024-05-13
---

# GraphQL for GOV.UK
Contributor Author

@dgm34 - responding to your comments as an inline comment, so it can be threaded 🧵

  • This RFC - probably by design - doesn't talk about what it would take to achieve this. At some point we'll have to work through the effort required to implement the various parts of this proposal, and get further buy-in from other Publishing stakeholders.

Yeah, that's mostly by design. I think the initial proof of concept work can probably be justified and a place found for it in the publishing-platform team's roadmap. Building a production ready solution which will work at GOV.UK scale is going to require more thought, and I don't have a good feel for how much work that's actually going to be (hopefully the PoC should resolve that though). Also as Kevin points out, building the API is one thing, actually adopting it in all of the frontend code is probably the bigger bit of work (and if we don't complete that, we leave ourselves in a worse position than when we started).

  • I feel quite strongly that the GraphQL API should not be public, at least until we understand a whole lot more about how it performs and impacts our systems. I note that you also think this is inadvisable and I'm reassured by that.

Yeah, I'll add that to the RFC - I think allowing anyone external to have access to the API is at the very least a problem for after the PoC.

  • It would be interesting to see some 'worst case' render times. I wonder if we can do this using some traffic replay magic, to see how render times vary under real load (subject to trying to correct for things like different environments, etc)

I'll add some examples of pages which I expect to be worst cases to the RFC, and we can evaluate them once the PoC is in integration. The possibly pathological cases I'm aware of are: (1) pages with a lot of links, like the ministers index page; (2) pages with a lot of govspeak which needs transforming into HTML; (3) pages which conduct several searches (e.g. topical events, which list recently published content tagged to the event); and (4) simple smart answers (which might not be slow to query, but are certainly complex to express).

  • I'd really like to get the new Lead TA for Publishing involved in this when he joins. Perhaps he could even own / drive it, working with you of course, in a similar way to how Al took responsibility for the Mongo to PostgreSQL migrations.

Absolutely - it would be great to have a TA to work through this with.

Thanks again for this great write up ❤️

🙇🏻 you're very welcome

@nacnudus

I'm unfortunately unable to comment inline (as a GOV.UK exile).

  1. This is a great read. Thank you for putting the work into it. It would be wonderful to make the structured data more accessible.
  2. The benefits are arguably about custom, on-demand link expansion; GraphQL itself is of secondary importance. I'm sure GraphQL is the right choice, but it isn't the only one.
  3. Could Govspeak be rendered once when it's received by the Publishing API, rather than every time it is queried? I vaguely recall that this was once on a roadmap. Another benefit of doing it only in the Publishing API, rather than upstream or downstream, would be that exactly one version of Govspeak would be used.
  4. I have an awful feeling that when body is a string, it's sometimes govspeak rather than HTML. I think GovGraph takes the cautious approach of running everything through govspeak rendering, just in case.
  5. Public access to a GraphQL API would spare departments from having to scrape the Content API, which I keep hearing about them doing (or considering doing).
  6. Querying inside the details column of the Publishing API is very expensive, because that's where the body text is, but it's also where a lot of other useful data is. Separating those things might help with performance.
