Split monorepo up into several repos #1724

Closed
1 of 3 tasks
aslakhellesoy opened this issue Sep 2, 2021 · 26 comments
Labels
🔧 build Related to build / release process

Comments

@aslakhellesoy
Contributor

aslakhellesoy commented Sep 2, 2021

Is your feature request related to a problem? Please describe.

There are several problems with the cucumber/common monorepo:

  • Building all the 50+ packages in the repo takes ~1h (both locally and in CI)
  • The build system is complex, brittle and hard to maintain
  • Newcomers find the size of the repo and complexity of the build process intimidating and hard to navigate

Describe the solution you'd like

I want to split cucumber/common up into multiple repos:

  • Polyglot repos (with java, javascript, ruby etc subdirectories)
    • cucumber/cucumber-expressions
    • cucumber/tag-expressions
    • cucumber/gherkin
    • etc.
  • Single implementation repos
    • cucumber/language-service
    • cucumber/language-server
    • cucumber/vscode
    • cucumber/monaco
    • cucumber/react
    • etc.

The current build system has complex functionality that we'd have to replace:

  • Consistent build configs
    • Extract to cucumber/eslint, cucumber/tsconfig etc. git repos.
  • Release scripts (Update version numbers in descriptors/changelogs, package and publish)
    • Extract to bash scripts in a cucumber/release git repo and use from Docker when doing a release
  • Update version numbers in dependent libraries after a release
    • Rely on WhiteSource Renovate to do this

Also see discussion in Slack

Describe alternatives you've considered

We could probably push ahead with #1720 and make the current monorepo serial build run in a few seconds (by leveraging a cache in the cloud), but the build process would still be complex and brittle. Newcomers would still be intimidated by the size and complexity of this huge repo.

Additional context

In 2015 the Cucumber implementations had diverged and behaved inconsistently. Each release made them more inconsistent. To mitigate this we decided to bring all the Gherkin implementations into one repository, using a shared acceptance test suite.

This worked well, so we continued with the same approach for new libraries such as Cucumber Expressions and Tag Expressions - in the same repo.

Building and in particular releasing libraries in 10 or so languages is complicated, so we built an "orchestration" build system with Make that makes the build process consistent across the increasing number of libraries.

Fast forward six years, and we have a build system with fangs, tentacles and warts. The build system wasn't designed with parallelism in mind, which is why it takes 1h.

TODO

  • cucumber-expressions
  • tag-expressions
  • ...
@aslakhellesoy aslakhellesoy added the 🔧 build Related to build / release process label Sep 2, 2021
@mpkorstanje
Contributor

I'll move datatable over to cucumber-jvm.

@aurelien-reeves
Contributor

Maybe we could do that in conjunction with #1614?

@aslakhellesoy
Contributor Author

While it is possible to bring commit history when moving from one repo to another, I suggest we don't do it because it's tedious to do. People who need to look at the history will find it in this repo.

@jamietanna
Contributor

We can retain the history for the new repos, for just the subtree that's being imported. I'd much prefer, as a consumer of Cucumber, to be able to view the history in the new repo, rather than jumping around.

I've done it before using Option 3 in https://stackoverflow.com/a/30386041 and it works nicely.

@aurelien-reeves
Contributor

Thanks for the info @jamietanna!

If we have the possibility to keep the history, that would be great indeed!

@aslakhellesoy
Contributor Author

Ok, I'm sure we can do that :-)

@aslakhellesoy
Contributor Author

I'm proposing we start by creating the following new repositories:

This will hollow out about half of the monorepo. Eventually I would like to have everything moved out so we can retire the monorepo, but let's start with this.

@aurelien-reeves
Contributor

aurelien-reeves commented Sep 9, 2021

cucumber-expressions/test-data may be tricky to move.
Do we have a plan for shared test-data, CCK, and related?

Besides that, looks good 👍

@aslakhellesoy
Contributor Author

cucumber-expressions/test-data may be tricky to move

Why? I'm proposing we move it along with all the implementations so the directory structure will be like this:

.
├── go
├── java
├── javascript
├── ruby
└── testdata

@aurelien-reeves
Contributor

Oh, testdata here are not synced from another package 😅

Sorry for that. So yes, looks good 👍

@aslakhellesoy
Contributor Author

Do we have a plan for shared test-data, CCK, and related?

We have two kinds:

  • Test data that is not used outside a module, such as cucumber-expressions/testdata
    • This will just be moved along with the source code
  • Test data that is used outside the module where it lives, such as gherkin/testdata and compatibility-kit/javascript/features/**/.ndjson
    • When these move to the new cucumber/compatibility-kit and cucumber/gherkin repositories, other modules that need them could access the files via native language packages (maven, npm, gem etc., assuming we bundle the files inside). Alternatively, the builds could download them from GitHub with a git submodule or by fetching a tarball.
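
The tarball option could look roughly like the sketch below. It is only an illustration: the function name extract_testdata, the archive URL in the usage comment, and the assumption that the archive has a single top-level directory containing testdata/ are mine, not something settled in this thread.

```shell
#!/bin/sh
# Hypothetical helper: unpack only the testdata/ directory of a repository
# tarball into a destination directory.
# Assumes the archive has one top-level directory (as GitHub archives do)
# with a testdata/ directory directly beneath it.
extract_testdata() {
  tarball="$1"
  dest="$2"
  tmp="$(mktemp -d)"
  # Drop the top-level "<repo>-<ref>/" directory while unpacking.
  tar -xzf "$tarball" -C "$tmp" --strip-components=1
  mkdir -p "$dest"
  # Copy the contents of testdata/ (not the directory itself) into dest.
  cp -R "$tmp/testdata/." "$dest/"
  rm -rf "$tmp"
}

# Example usage (URL is an assumption about where the repo will live):
#   curl -sSL https://github.com/cucumber/gherkin/archive/refs/heads/main.tar.gz -o gherkin.tar.gz
#   extract_testdata gherkin.tar.gz ./testdata
```

Unpacking into a temporary directory and copying avoids tar's pattern-matching flags, which differ between GNU tar and BSD tar.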

@mpkorstanje
Contributor

@aslakhellesoy I'll move datatable to cucumber-jvm.

@mpkorstanje
Contributor

git remote add common git@github.com:cucumber/common.git
git fetch common
git checkout -b merge-datatable common/main 
git filter-branch --subdirectory-filter datatable
git merge origin/main --allow-unrelated-histories 
git push 

Looks like this worked for me. But notice the big disclaimer. Probably good to follow it.

@aurelien-reeves
Contributor

git remote add common git@github.com:cucumber/common.git
git fetch common
git checkout -b merge-datatable common/main 
git filter-branch --subdirectory-filter datatable
git merge origin/main --allow-unrelated-histories 
git push 

Looks like this worked for me. But notice the big disclaimer. Probably good to follow it.

Which disclaimer?

@aslakhellesoy
Contributor Author

@mattwynne and I experimented a bit today, trying to create a new (local) repo for cucumber-expressions. We used this:

brew install git-filter-repo
mkdir cucumber-expressions
cd cucumber-expressions
git init
git remote add common git@github.com:cucumber/common.git
git fetch common
git checkout -b tmp-migrate common/main
git filter-repo --subdirectory-filter cucumber-expressions --force
git branch -m main

We also talked about work to do after that:

Cleanup

  • Move all cucumber-expressions/vX.Y.Z tags to vX.Y.Z
  • Move all cucumber-expressions-vX.Y.Z tags to vX.Y.Z
  • Move all cucumber-expressions/go/vX.Y.Z tags to go/vX.Y.Z
  • Delete all tags that don't match /^v\d/ or /^go\/v\d/
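
The tag cleanup above could be scripted along these lines. This is a sketch under assumptions, not the script that was eventually used: the function names new_tag_name and migrate_tags are hypothetical, and the new tags are created as lightweight tags pointing at the tagged commit, which drops any annotation on the original tag.

```shell
#!/bin/sh
# Hypothetical helper: print the new name for a tag, or print nothing if the
# tag should simply be deleted. $1 is the module name, $2 is the tag.
new_tag_name() {
  name="$1"
  tag="$2"
  case "$tag" in
    v[0-9]*|go/v[0-9]*)  echo "$tag" ;;                    # already in the target form
    "$name"/go/v[0-9]*)  echo "go/${tag#"$name"/go/}" ;;   # <name>/go/vX.Y.Z -> go/vX.Y.Z
    "$name"/v[0-9]*)     echo "${tag#"$name"/}" ;;         # <name>/vX.Y.Z    -> vX.Y.Z
    "$name"-v[0-9]*)     echo "${tag#"$name"-}" ;;         # <name>-vX.Y.Z    -> vX.Y.Z
  esac
}

# Hypothetical helper: apply the renames in the current git repository.
# The tag list is read once up front so newly created tags are not revisited.
migrate_tags() {
  name="$1"
  for tag in $(git tag); do
    new="$(new_tag_name "$name" "$tag")"
    [ "$new" = "$tag" ] && continue             # keep tags already in the target form
    if [ -n "$new" ]; then
      git tag -f "$new" "$tag^{commit}"         # lightweight tag; overwrites a same-named tag
    fi
    git tag -d "$tag"                           # remove the old or unrelated tag
  done
}

# Example usage, run inside the filtered repository:
#   migrate_tags cucumber-expressions
```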

Not doing

  • Fix issue/pr references in CHANGELOG.md

Elsewhere

  • Delete directory in monorepo
  • Send PRs to all known go repos depending on this module

...to be continued...

@aslakhellesoy
Contributor Author

aslakhellesoy commented Sep 19, 2021

I wrote a gist based on the experiments @mattwynne and I did a couple of days ago: https://gist.github.com/aslakhellesoy/3cb73d9b69c28b497710b78baf0d3ec5

It seems to work well creating a new cucumber-expressions polyglot repo:

curl -s https://gist.githubusercontent.com/aslakhellesoy/3cb73d9b69c28b497710b78baf0d3ec5/raw/8ff5651126ae6eb7ae5240bcac39ec01744a6cc5/make-polyglot-repo.sh |
  bash /dev/stdin cucumber-expressions

Any suggestions/feedback before we push this as a new repo and remove cucumber-expressions from cucumber/common?

@aurelien-reeves
Contributor

As far as I can tell, it looks good 👌

@aslakhellesoy
Contributor Author

aslakhellesoy commented Sep 20, 2021

I have created https://github.com/cucumber/cucumber-expressions

Here are some more notes on what needs to be done to finish the work (we can reuse this checklist for other moves)

Push new repo

After creating a local repo:

  • Run git diff and correct unwanted changes to CHANGELOG.md
  • Create a new repo on GitHub
    • Check the Renovate button
  • Push the local repo
    • git push --tags

Configure Renovate

Configure Repo

Set up CI

  • Add GitHub Action
    • Use existing Makefile
    • Or try setting up an Earthfile instead

Cleanup

  • Archive all the read-only mirrors
  • Send a PR to cucumber/common where the moved repo is removed:
    • Delete the directory
    • Update root Makefile
    • Update CircleCI config

Migrate documentation

Make a release

This will require some more work since the release scripts are not migrated over from the common repo, and we need to rethink how it's done. It should be simpler! /cc @mattwynne

@aurelien-reeves
Contributor

For the CI, I did not use any Makefiles. I wrote the GitHub Actions workflows directly.

That seems fine for cucumber-expressions as all the tests are easily executed from commands like npm test, bundle exec rspec, go test ./... and mvn test.

We are also currently working on a release process through a GitHub Actions workflow.

So maybe the Makefile could be greatly simplified, so that it is only used to run the Docker container and possibly some global clean tasks?
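
As an illustration of how little orchestration is needed once each language uses its conventional test command, here is a minimal sketch of a local "run everything" script. The function names are hypothetical, and the directory names assume the polyglot layout (go, java, javascript, ruby) proposed earlier in this thread.

```shell
#!/bin/sh
# Map a language subdirectory to the test command mentioned in this thread.
# Returns non-zero for directories without a known command.
test_command_for() {
  case "$1" in
    javascript) echo "npm test" ;;
    ruby)       echo "bundle exec rspec" ;;
    go)         echo "go test ./..." ;;
    java)       echo "mvn test" ;;
    *)          return 1 ;;
  esac
}

# Run the tests of every language directory that exists under the given root,
# stopping at the first failure.
run_all_tests() {
  root="${1:-.}"
  for dir in javascript ruby go java; do
    [ -d "$root/$dir" ] || continue
    cmd="$(test_command_for "$dir")"
    echo "==> $dir: $cmd"
    # $cmd is intentionally unquoted so it splits into command and arguments.
    (cd "$root/$dir" && $cmd) || return 1
  done
}

# Example usage from the repository root:
#   run_all_tests .
```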

@aurelien-reeves
Contributor

On Debian Linux, we had to tweak the script (https://gist.github.com/aslakhellesoy/3cb73d9b69c28b497710b78baf0d3ec5) a little bit:

# Delete all other tags
- git tag | grep --invert-match -E '^go/v\d|^v\d' | \
+ git tag | grep --invert-match -E '^go/v[0-9]|^v[0-9]' | \
  xargs -n1 git tag -d
# Modify CHANGELOG.md links. Remove the '' if not on MacOS. 
- sed -i '' "s|${name}-||g" CHANGELOG.md
- sed -i '' "s|${name}/||g" CHANGELOG.md
- sed -i '' "s|https://github.com/cucumber/common/compare|https://github.com/cucumber/${name}/compare|" CHANGELOG.md
- sed -i '' "s|https://github.com/cucumber/cucumber/compare|https://github.com/cucumber/${name}/compare|" CHANGELOG.md
+ sed -i "s|${name}-||g" CHANGELOG.md
+ sed -i "s|${name}/||g" CHANGELOG.md
+ sed -i "s|https://github.com/cucumber/common/compare|https://github.com/cucumber/${name}/compare|" CHANGELOG.md
+ sed -i "s|https://github.com/cucumber/cucumber/compare|https://github.com/cucumber/${name}/compare|" CHANGELOG.md

We also had to make sure to use git version >= 2.22
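
One way to avoid maintaining two variants of the script is to hide the platform differences behind small helpers. The sketch below is an assumption about how that could be done, not what the gist does: sed_inplace and non_release_tags are hypothetical names, and the GNU check relies on GNU sed accepting --version while BSD/macOS sed rejects it.

```shell
#!/bin/sh
# Hypothetical wrapper: in-place sed that works with both GNU and BSD/macOS sed.
sed_inplace() {
  if sed --version >/dev/null 2>&1; then
    sed -i "$@"        # GNU sed: -i takes no separate backup-suffix argument
  else
    sed -i '' "$@"     # BSD/macOS sed: -i requires a (possibly empty) suffix
  fi
}

# Hypothetical filter: read tag names on stdin and print those that are not
# release tags. [0-9] is used instead of \d, which grep -E does not support
# portably.
non_release_tags() {
  grep -v -E '^go/v[0-9]|^v[0-9]'
}

# Example usage:
#   sed_inplace "s|${name}-||g" CHANGELOG.md
#   git tag | non_release_tags | xargs -n1 git tag -d
```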

@mattwynne
Member

Create-meta is done!

@mattwynne
Member

@aslakhellesoy any thoughts on which package to tackle next? @aurelien-reeves and I discussed this a bit today on voice, but we don't have a clear plan as yet.

It feels like the CCK, gherkin and messages are the big ones that remain.

@aslakhellesoy
Contributor Author

aslakhellesoy commented Nov 15, 2021

I propose gherkin, then messages, then cck.

When we move out Gherkin we should get rid of make too, which means replacing the make based tests with unit testing tool tests. These tests will be much faster (no executable to launch for each doc), and also easier for contributors to run (they’ll use the conventional testing tool).

The cucumber-expressions and create-meta repos already use this technique, have a look at that. The gherkin/elixir tests already use this approach.

@davidjgoss
Contributor

davidjgoss commented Nov 18, 2021

Have we discussed how to deal with formatters in this split yet?

I was thinking about splitting out html-formatter. I think we've previously discussed the idea of having formatters (or at least terminal-focused formatters) together, but I think the html one is definitely an outlier in that one implementation (javascript) is depended on by the others which is not the normal pattern, so it should perhaps be its own repo.

It does have a dependency on @cucumber/react but I think the API surface used by the formatter doesn't change very often so that should be okay. What do we think? We can also look at switching from webpack to esbuild while we're at it :)

@mattwynne
Member

@davidjgoss yeah I hadn't thought about it too hard yet, but I agree the html formatter is definitely something that could do with standing alone. I wouldn't be averse to moving the @cucumber/react module along with it if that will make it easier to change.

@mattwynne
Member

Let's use this project to track progress from now on.

Projects
Status: Implemented
Cucumber Open
Implemented
Development

No branches or pull requests

6 participants