Re-add GitHubBridge to support self-hosting with separate content repos #4384

Status: Open (wants to merge 4 commits into main)

@coreyaus (Contributor) commented Dec 1, 2023

TinaCMS GitHub Bridge

The GitHubBridge is intended to support self-hosting with separate content repositories (e.g. one central code repository that is able to build multiple sites).

This will also help provide a path to better support reindexing for self-hosted projects (i.e. keeping the data layer updated when changes are pushed directly via git, rather than made through the Tina UI).
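One possible building block for that reindexing path, which is not part of this PR, is working out which files changed from a GitHub `push` webhook payload. Each entry in the payload's `commits` array lists `added`, `removed` and `modified` paths. The helper `changedPaths` below is a hypothetical sketch. It assumes the commits are listed oldest-first, so later commits win. It returns the files to re-index and the files to drop from the data layer.

```typescript
// Subset of the fields GitHub includes for each commit in a `push` webhook payload.
interface PushCommit {
  added: string[]
  removed: string[]
  modified: string[]
}

// Hypothetical helper: replay commits in order (assumed oldest-first) and work
// out which paths to (re)index and which to remove from the data layer.
function changedPaths(
  commits: PushCommit[],
  include: (path: string) => boolean = () => true
): { upsert: string[]; remove: string[] } {
  const upsert = new Set<string>()
  const remove = new Set<string>()
  for (const commit of commits) {
    for (const path of [...commit.added, ...commit.modified]) {
      if (!include(path)) continue
      upsert.add(path)
      remove.delete(path) // a re-added file should no longer be deleted
    }
    for (const path of commit.removed) {
      if (!include(path)) continue
      remove.add(path)
      upsert.delete(path) // a later removal cancels an earlier add/modify
    }
  }
  return { upsert: Array.from(upsert).sort(), remove: Array.from(remove).sort() }
}
```

Filtering with `include` (for example `(p) => p.startsWith('content/')`) keeps non-content files such as a README out of the index.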

Background

Additional notes

  • One important caveat is that this approach could lead to GitHub rate-limiting errors on certain projects, particularly larger sites with lots of pages/files.
  • The ability to pass `tinaFilesConfig` is intended to support multi-tenant projects where the Tina folder lives in the code repository, so the content repositories do not need to contain any Tina folders or files at all. In this case the GitHubBridge defers to the FilesystemBridge to retrieve the auto-generated Tina files, so those files do not need to be committed or tracked in the content repositories. This is particularly useful when multiple content repositories/sites are built from a central code repository.
  • Regarding the implementation, I've added the new GitHubBridge to the `tinacms-gitprovider-github` package, given it already has `@octokit/rest` installed, and to avoid a proliferation of separate GitHub-related packages in the tinacms monorepo. Let me know if you'd prefer a separate package just for the GitHubBridge and I can restructure these updates.
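For context, here is a minimal sketch of the core step in any bridge that reads files through `@octokit/rest`: turning a `repos.getContent` response into file text. It assumes the usual single-file response shape (`type: 'file'`, base64 `content`). The `FileContentResponse` type and the `decodeFileContent` helper are hypothetical names, not code from this PR. Each `getContent` call is one REST request, which is why the rate-limit caveat above matters for large sites.

```typescript
// Minimal shape of a single-file response from Octokit's repos.getContent.
// Directory responses are arrays; symlinks and submodules have other types.
interface FileContentResponse {
  type: string
  path: string
  encoding?: string
  content?: string
}

// Hypothetical helper: decode the base64 payload GitHub returns for a file.
function decodeFileContent(data: FileContentResponse | FileContentResponse[]): string {
  if (Array.isArray(data)) {
    throw new Error('Expected a file but got a directory listing')
  }
  if (data.type !== 'file' || data.content === undefined) {
    throw new Error(`Expected a file at ${data.path}, got type "${data.type}"`)
  }
  // Large files come back without inline content (encoding "none").
  if (data.encoding !== 'base64') {
    throw new Error(`Unsupported encoding "${data.encoding}" for ${data.path}`)
  }
  // GitHub wraps the base64 text with newlines; Buffer ignores whitespace when decoding.
  return Buffer.from(data.content, 'base64').toString('utf8')
}

// Usage (not executed here):
// const { data } = await octokit.rest.repos.getContent({ owner, repo, path, ref })
// const text = decodeFileContent(data as FileContentResponse)
```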

Documentation

See the new sections added to the latest version of the README for the tinacms-gitprovider-github package here: https://github.com/tinacms/tinacms/blob/2ddccfa4969c183c15a2647bff3d4dd57aa7b141/packages/tinacms-gitprovider-github/README.md

- The ability to pass `systemFiles` to the `GithubBridge` is intended to allow multi-tenant projects where the content repos do not need to contain any Tina folders or files at all. The code repository can pass the necessary Tina system files to the GitHub bridge at build time, so no Tina files need to be committed or tracked in the content repositories. This is particularly useful when multiple content repositories/sites are built from a central code repository.
@coreyaus requested a review from a team as a code owner on December 1, 2023 13:01

changeset-bot commented Dec 1, 2023

🦋 Changeset detected

Latest commit: 2ddccfa

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages
| Name | Type |
| --- | --- |
| tinacms-gitprovider-github | Minor |
| @tinacms/graphql | Minor |
| @tinacms/cli | Minor |
| @tinacms/self-hosted-starter | Patch |
| @tinacms/datalayer | Patch |
| @tinacms/search | Patch |
| starter-basic-iframe | Patch |
| starter-empty | Patch |
| e2e-next | Patch |
| kitchen-sink-starter | Patch |
| @tinacms/starter | Patch |
| tinacms-authjs | Patch |
| tinacms | Patch |
| next-tinacms-cloudinary | Patch |
| next-tinacms-dos | Patch |
| next-tinacms-s3 | Patch |
| tinacms-clerk | Patch |
| @tinacms/app | Patch |
| @tinacms/vercel-previews | Patch |


This updates the GitHubBridge to extend the FilesystemBridge and optionally fall back to the latter when fetching Tina files. This provides a path to support self-hosting with separate content repos without those repos needing to track all the Tina files from the code repository. Instead, we can opt to use the Tina files from the local filesystem in the code repository.
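The fallback described above can be sketched with two small classes. This is a hypothetical, self-contained illustration, not the PR's code. `LocalBridge` stands in for Tina's `FilesystemBridge` and is backed by an in-memory map so it runs anywhere. `RemoteContentBridge` stands in for the `GitHubBridge`, and its `tinaFilesPrefix` option loosely mirrors the role of `tinaFilesConfig`. Only `get` is shown; a real bridge has more methods.

```typescript
// Minimal read-only slice of a bridge: resolve a repo-relative path to file text.
interface ReadBridge {
  get(filepath: string): Promise<string>
}

// Stand-in for FilesystemBridge: serves files that live alongside the code.
class LocalBridge implements ReadBridge {
  constructor(private readonly files: Map<string, string>) {}

  async get(filepath: string): Promise<string> {
    const content = this.files.get(filepath)
    if (content === undefined) throw new Error(`Local file not found: ${filepath}`)
    return content
  }
}

// Stand-in for GitHubBridge: content comes from a remote repo, but Tina's
// generated files (anything under tinaFilesPrefix) come from the local copy.
class RemoteContentBridge extends LocalBridge {
  constructor(
    localFiles: Map<string, string>,
    private readonly fetchRemote: (filepath: string) => Promise<string>,
    private readonly tinaFilesPrefix = 'tina/__generated__/'
  ) {
    super(localFiles)
  }

  async get(filepath: string): Promise<string> {
    if (filepath.startsWith(this.tinaFilesPrefix)) {
      return super.get(filepath) // fall back to the code repo's local files
    }
    return this.fetchRemote(filepath) // e.g. one GitHub API request per call
  }
}
```

Routing both kinds of read through one `get` means the data layer does not need to know which repository a file lives in.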
@vrish88 commented Dec 31, 2023

FWIW I've pulled the GitHubBridge into my own project because I'm trying to self-host with a separate content repo. Here are some notes:

  • With the implementation from the PR, I had to manually trigger an index with the following:

    ```js
    // database.js file
    import schema from './__generated__/_schema.json'
    // ...
    if (!isLocal) {
      // Kick off a full re-index when not running in local mode
      database
        ._indexAllContent(levelAdapter, schema)
        .then((r) => console.log('DONE INDEXING', r))
        .catch((err) => console.error('INDEXING FAILED', err))
    }
    ```
  • I've got 6k+ Markdown files in my content repo, so I hit the GitHub rate limit pretty quickly. It seems (and correct me if I'm wrong) that the bridge is only used to retrieve the current state of each file. Is there an alternative so that we could read from the file system instead? Locally, I'm able to use the localContentPath config property to configure where my repo is. This might be naive, but would it be sufficient to pull down the content git repo where the code is deployed, and read from that?

  • I love the idea of keeping the content repo free of tina artifacts! (no offense :))
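To put the rate-limit issue in numbers: GitHub's REST API allows 5,000 requests per hour for requests authenticated with a personal access token. The sketch below is a back-of-the-envelope estimate. It assumes, hypothetically, that a full re-index costs one request per content file plus a few extra requests. It is not based on measurements of the PR's bridge, and `hourlyWindowsNeeded` is a made-up name.

```typescript
// Rough estimate of how many hourly rate-limit windows a full re-index needs,
// assuming one REST request per file plus a fixed number of extra requests
// (e.g. listing the repository tree). All inputs are assumptions.
function hourlyWindowsNeeded(
  fileCount: number,
  extraRequests = 0,
  requestsPerHour = 5000
): number {
  const totalRequests = fileCount + extraRequests
  return Math.ceil(totalRequests / requestsPerHour)
}

// Around 6,000 Markdown files already exceeds a single hour's budget.
console.log(hourlyWindowsNeeded(6000)) // 2
console.log(hourlyWindowsNeeded(1200, 10)) // 1
```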

@coreyaus (Contributor, Author) commented

Yeah, all good points. I'll add my 2 cents on each one, but I definitely don't have all the right answers myself either! Incidentally, I'm not sure whether the Tina team are likely to consider including this PR in the core repo (and definitely fine either way). I partly opened it simply to share the code and continue the discussion. Happily, it's easy to copy GitHubBridge.ts into your own project and install the dependencies, so I'm glad you've found it a useful starting point.

  1. On re-indexing:

    • A) In the end, my use case means I only need to re-index the content via the full tinacms build process whenever changes are made to my code repository.
      • I presume you also see the re-indexing process run when you execute tinacms build, but you also need to run that same re-indexing at other times? I'd be curious to hear either way, as I'd be surprised if tinacms build doesn't trigger the indexing for you.
    • B) Your code looks like the right approach to trigger a re-index to me. Initially I did some experimenting and wrote some code for an API route to specifically re-index Tina system files, as well as running _indexAllContent.
      • I'm actually not using this code in my project in the end but I'll paste a version of it below in case it's useful as a reference for uploading a local file (e.g. a Tina system file) to the data layer.
      • Logan's video above also shows an example of indexing specific files by passing their file paths to an API route and calling the database function to index the content for that specific file path.
  2. GitHub rate limiting:

    • I haven't turned my mind to this much yet but I definitely think your proposed solution could work fine if you only have a couple of content sites to deal with (we need to support multiple separate content repos/sites, so at some point I'll explore options to avoid copying all the content files).
      • The GitHubBridge inherits from the FilesystemBridge and these lines fetch the Tina system files from the local file system so you could do something similar for other files if you copy them all to the location where your code is deployed as you say.
      • You could explore whether Git submodules could offer a solution for you (i.e. where you have a separate content repo and it's a git submodule within your code repo), or whether you can use GitHub Actions or API routes to trigger scripts that copy all content file updates to your code repository. That said, I'd probs suggest experimenting with a custom git provider (see below)
    • You could try creating a custom git provider that inherits from Tina's GitHubProvider and provides its own onPut and onDelete functions:
      • Your custom functions could simply:
        • upload the file to a storage solution like S3 (or delete it in the case of onDelete)
        • call super.onPut and super.onDelete to commit the file updates to GitHub like normal
      • You could then create an S3Bridge (or whatever, depending on your storage solution of choice) and use that rather than the GitHubBridge to fetch all the latest content files when they're needed for re-indexing (rather than hitting the GitHub API and running into rate-limiting issues).
      • You could still have the GitHubBridge in case things ever fall out of sync between the GitHub content repo and your file storage (e.g. if you make a bunch of changes to the content repo locally and push them to GitHub rather than using the Tina editing UI), in which case you could probably use that to run a re-indexing process from the GitHub content repo (making sure to update your file storage as well as the data layer).
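The custom git provider idea can be sketched as follows. This is a hypothetical, self-contained illustration. `BaseGitProvider` stands in for Tina's `GitHubProvider`, which commits through the GitHub API, and `ObjectStore` stands in for S3 or similar storage; neither is a real API. The ordering follows the steps above: write to storage first, then call `super` so the commit to GitHub still happens as normal.

```typescript
// Stand-in for a storage backend such as S3 (hypothetical interface).
interface ObjectStore {
  put(key: string, value: string): Promise<void>
  delete(key: string): Promise<void>
}

// Stand-in for Tina's GitHubProvider; here it just records what it would commit.
class BaseGitProvider {
  readonly commits: string[] = []

  async onPut(key: string, value: string): Promise<void> {
    this.commits.push(`put ${key} (${value.length} chars)`)
  }

  async onDelete(key: string): Promise<void> {
    this.commits.push(`delete ${key}`)
  }
}

// Mirrors every change into the object store, then commits as usual.
class MirroringGitProvider extends BaseGitProvider {
  constructor(private readonly store: ObjectStore) {
    super()
  }

  async onPut(key: string, value: string): Promise<void> {
    await this.store.put(key, value)
    await super.onPut(key, value)
  }

  async onDelete(key: string): Promise<void> {
    await this.store.delete(key)
    await super.onDelete(key)
  }
}
```

A matching read-side bridge could then serve content from the store, with the GitHub-backed bridge kept for occasional full re-syncs as described above.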

I hope that brain dump is helpful. Let us know how you go and any issues or creative solutions you stumble on! 👍

PS: Tina team - let us know if you would consider merging this PR (no worries if not), or whether you think there's a section within the documentation at https://tina.io/docs where notes on this should be added. It'd be great to have a version of the GitHubBridge code hosted somewhere publicly for community feedback and improvements - I think it (or some alternative) is a key component for unlocking self-hosting with separate content repositories so definitely happy to follow your lead on the best way to have this code visible somewhere for easy collaboration 😄

API route to re-index all Tina system files and content files

Note this is written for an API route using the new Next.js App Router, in a file at app/api/reindex/route.ts:

```typescript
// app/api/reindex/route.ts
import { NextResponse, type NextRequest } from 'next/server'
import database from '@tina/database'
import schema from '@tina/__generated__/_schema.json'
import graphql from '@tina/__generated__/_graphql.json'
import lookup from '@tina/__generated__/_lookup.json'

// These mirror internal constants in @tinacms/graphql's data layer and may
// change between versions.
const INDEX_KEY_FIELD_SEPARATOR = '\x1D'
const CONTENT_ROOT_PREFIX = '~'
const SUBLEVEL_OPTIONS = {
  separator: INDEX_KEY_FIELD_SEPARATOR,
  valueEncoding: 'json',
}

// export const revalidate = 0

// Example usage
// fetch('http://localhost:3000/api/reindex', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify({
//     token: process.env.API_ROUTE_SECRET,
//   }),
// }).then(console.log)

export async function POST(request: NextRequest) {
  // Refuse to run if no secret is configured; otherwise a request without a
  // token would match an undefined secret.
  const secret = process.env.API_ROUTE_SECRET
  const reqBody = await request.json().catch(() => null)
  if (!secret || reqBody?.token !== secret) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
  }

  try {
    // NOTE: 'tinacms' is used below as it's the default contentNamespace,
    // unless you pass a custom namespace when initialising your database.
    const contentLevel =
      database.contentLevel ??
      database.rootLevel
        .sublevel('_content', SUBLEVEL_OPTIONS)
        .sublevel('tinacms', SUBLEVEL_OPTIONS)

    const contentRootLevel = contentLevel.sublevel(
      CONTENT_ROOT_PREFIX,
      SUBLEVEL_OPTIONS
    )

    // Upload all Tina system files to the data layer
    await contentRootLevel.put('tina/__generated__/_graphql.json', graphql as any)
    await contentRootLevel.put('tina/__generated__/_schema.json', schema as any)
    await contentRootLevel.put('tina/__generated__/_lookup.json', lookup as any)

    // Then index all the content
    // (this might not re-index the Tina files, hence the code above)
    await database._indexAllContent(contentLevel as any, schema as any)

    return NextResponse.json({ success: true }, { status: 200 })
  } catch (error) {
    // Error objects serialize to {} in JSON, so return the message instead
    const message = error instanceof Error ? error.message : String(error)
    return NextResponse.json({ error: message }, { status: 500 })
  }
}
```
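For the "index specific files" variant mentioned above (passing file paths to an API route), the route needs to validate the request before touching the database. Here is a hedged sketch of that validation step only. It assumes a JSON body shaped like `{ token, paths }`, which is a hypothetical shape not taken from the original, and `parseReindexRequest` is a made-up name. It compares the token with Node's `crypto.timingSafeEqual` and rejects paths that try to escape the content root. The validated list would then go to whichever database method you use for per-path indexing.

```typescript
import { timingSafeEqual } from 'node:crypto'

type ReindexRequest =
  | { ok: true; paths: string[] }
  | { ok: false; status: 400 | 401; error: string }

// Constant-time token comparison; an unset secret never authorizes a request.
function tokenMatches(token: unknown, secret: string | undefined): boolean {
  if (!secret || typeof token !== 'string') return false
  const a = Buffer.from(token)
  const b = Buffer.from(secret)
  return a.length === b.length && timingSafeEqual(a, b)
}

// Hypothetical validator for a body like { token: string, paths: string[] }.
function parseReindexRequest(body: unknown, secret: string | undefined): ReindexRequest {
  const { token, paths } = (body ?? {}) as { token?: unknown; paths?: unknown }
  if (!tokenMatches(token, secret)) {
    return { ok: false, status: 401, error: 'Unauthorized' }
  }
  if (!Array.isArray(paths) || !paths.every((p) => typeof p === 'string')) {
    return { ok: false, status: 400, error: '`paths` must be an array of strings' }
  }
  // Reject absolute paths and parent-directory segments.
  const unsafe = paths.find((p: string) => p.startsWith('/') || p.split('/').includes('..'))
  if (unsafe !== undefined) {
    return { ok: false, status: 400, error: `Unsafe path: ${unsafe}` }
  }
  return { ok: true, paths }
}
```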
