
Implement git submodules for specs #182

Open: wants to merge 14 commits from 181-git-submodules into add-github-workflow
Conversation

@anatoly-scherbakov (Contributor) commented Oct 18, 2023

Alleviate the step of manually downloading specifications for tests to use them. Instead, put specifications into a dedicated directory using Git submodules mechanism.

@anatoly-scherbakov (Contributor, Author)

closes #181

@BigBlueHat (Contributor)

@anatoly-scherbakov let's give this a try. Can you also expand the README in this PR to explain what the pre-testing steps would now include (e.g. git submodule init etc.)?

It would be great to simplify testing.

Thanks for exploring this idea!
🎩

@anatoly-scherbakov marked this pull request as ready for review on November 12, 2023 16:37
@anatoly-scherbakov (Contributor, Author)

@BigBlueHat @davidlehn My apologies for the delay; I think I have something to show here now.

@BigBlueHat (Contributor)

Hey @anatoly-scherbakov, sorry for the delay here. I continue to feel a bit squeamish about using git submodules. It's not a pattern we typically use in our other implementations, and it's a rarely used (and oft-derided...) git feature. I'm also concerned about the overhead of keeping the SHAs pinned accurately.

However, I did find that the jsonld.js implementation has a nice approach to this same problem using npm run commands: https://github.com/digitalbazaar/jsonld.js/blob/main/package.json#L96-L101

My thought is that we could do something similar here by providing a simple Python script that would run the git clone --depth 1 commands, and ideally also provide environment variables to use (as the JS implementation does) in case someone already has the test suites on hand but in different locations (as we often do because we work on those also).
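A minimal sketch of such a script (the suite list, file layout, and environment-variable convention here are illustrative assumptions, not something settled in this PR):

```python
"""Hypothetical fetch_test_suites.py; suite names and env vars are assumptions."""
import os
import subprocess
from pathlib import Path

# Illustrative suite list; the real set of repositories may differ.
SUITES = {
    "json-ld-api": "https://github.com/w3c/json-ld-api.git",
}


def suite_dir(name: str, base: Path = Path("test-suites")) -> Path:
    """Resolve a suite location, honoring an env override like JSON_LD_API_TESTS."""
    override = os.environ.get(name.replace("-", "_").upper() + "_TESTS")
    return Path(override) if override else base / name


def fetch(name: str, url: str) -> None:
    """Shallow-clone a suite only if it is not already present (mirrors the npm script)."""
    target = suite_dir(name)
    if not target.exists():
        subprocess.run(["git", "clone", "--depth", "1", url, str(target)], check=True)
```

Calling fetch() for each entry in SUITES would then replicate the jsonld.js npm-script behavior, while the env override covers the "suites already on hand elsewhere" case.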

Would you be up for taking a crack at that?

@anatoly-scherbakov (Contributor, Author)

@BigBlueHat thank you for your review. I understand the concern, but I disagree in certain regards; let me try to elaborate.

Let's analyze the approach taken at jsonld.js.

    "fetch-json-ld-org-test-suite": "if [ ! -e test-suites/json-ld.org ]; then git clone --depth 1 https://github.com/json-ld/json-ld.org.git test-suites/json-ld.org; fi",

What happens if the JSON-LD test suite is updated?

  1. A developer already has a downloaded version of JSON-LD test suite locally.
    • The developer is not notified about the change
    • If they realize that the change has occurred, to refresh the spec version, they have to manually remove the spec directory
    • Different developers will inevitably debug the implementation against different versions of the test suite, which leads to issues potentially complicated to debug
    • It is hard to track which exact version of the test suite one has, and Git doesn't help with that
  2. A developer submits their changes to jsonld-js to CI, and CI fails
    • That's because the spec has been updated and CI downloaded the latest version
    • First of all, the developer needs to realize that this has happened
    • Then, they have to download the latest spec and debug what has changed,
    • Which wasn't in the original scope of their pull request.

I feel these cases can be very annoying and might hurt developer productivity.

Updating the spec version is not tied to an explicit human decision, which goes against normal dependency-management practice in many ecosystems.

For instance, with Python's Poetry, you have to run poetry update to update your dependencies; otherwise, their versions are explicitly pinned in poetry.lock. The same applies to cargo (the Rust dependency manager), to npm (with its package-lock.json), and others.

Git submodules essentially provide the same level of control over the version of dependencies. It is easy to provide a make command which will upgrade dependencies, say

make upgrade-submodules

I'll add it in a moment in another commit. This will upgrade each submodule to its latest version; afterwards, the developer can run tests, make sure everything is 🟢, and submit to CI.
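To illustrate the pinning behavior the comment above relies on, here is a throwaway local demo (assuming a reasonably recent git; all paths and repo names are made up for the example):

```shell
set -e
# Throwaway repos to demonstrate how a superproject pins a submodule commit.
spec=$(mktemp -d) main=$(mktemp -d)

# A stand-in "specification" repository with one commit.
git -C "$spec" init -q -b main
git -C "$spec" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "spec v1"

# The superproject records the spec's current commit as a submodule.
git -C "$main" init -q -b main
git -C "$main" -c protocol.file.allow=always \
    submodule add -q "$spec" specifications/spec
git -C "$main" -c user.email=demo@example.com -c user.name=demo \
    commit -q -m "Pin spec submodule"

# The pinned commit sha is recorded (and versioned) in the superproject:
git -C "$main" submodule status

# After the spec gains a new commit, `git submodule update --remote`
# (what an upgrade-submodules make target would run) moves the pin forward,
# and the superproject sees the submodule as modified until re-committed:
git -C "$spec" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "spec v2"
git -C "$main" -c protocol.file.allow=always \
    submodule update -q --remote --init --recursive specifications/spec
git -C "$main" status --short
```

The point of the demo: the version bump only lands when someone commits the moved gitlink, which is exactly the explicit human decision described above.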

As for the popularity of git submodules: a quick search on GitHub returns a few popular projects relying on them: pytorch, ghc, draw.io, mono, ceph, obs-studio, micropython, qemu, and more.

Let me know what you think!

@anatoly-scherbakov (Contributor, Author)

@BigBlueHat

P.S. To address the concern about testing pyld against a test suite located elsewhere, this can be done by:

python tests/runtests.py ~/projects/json-ld-api

…as documented in README.rst. I don't think git submodules would be an obstacle for local development in this fashion.

@BigBlueHat (Contributor)

@anatoly-scherbakov coming back to this again--sorry for the wait.

The experience of using the submodules was fine--not much different than the experience of using the approach we took in jsonld.js, so I'm fine with it.

@davidlehn unless you're diametrically opposed to this, I'd like to get it merged sooner than later.

@BigBlueHat (Contributor) left a review:
Small tweak which isn't a blocker--and could be done later--so approving.

@anatoly-scherbakov (Contributor, Author)

Thanks for the review @BigBlueHat! I will write up an issue about the Makefile shortly, but I do not believe I have the rights to merge this PR.

This might also be related to my being unable to create a PR in this repository that has another PR as its base.

@@ -0,0 +1,11 @@
[submodule "specifications/json-ld-api"]
Member

These are the specs, but since their sole purpose here is for testing, could this be named test-suites/? That's what the scripts in jsonld.js use.

Contributor (Author)

Thanks for reviewing!

I can rename those. A few things concern me a bit though:

  • Each repository we're checking out contains not just tests but the whole specification, which might be confusing and violate the principle of least surprise;
  • We will likely add more tests to the tests directory which will not be using the specifications, and the existence of both tests and test-suites will be confusing;
  • At some point, we might want to use the specifications for other purposes than tests, for instance, to aid documentation generation, or something.

What do you think?

Contributor

I prefer the specifications/ folder name, as I do hope/expect we'll get more unit tests added to tests/ at some point, and test-suites/ would certainly be confusing.

@@ -0,0 +1,2 @@
upgrade-submodules:
git submodule update --remote --init --recursive
Member

How do you remove all the checked out dirs without leaving git thinking there are unstaged changes? Put another way, how do you undo this command?

Contributor (Author)

If you want to undo the version upgrade of the submodules, you can do:

git reset --hard

If you've modified the submodules locally, after having them upgraded (for debugging purposes perhaps) then you do:

git submodule foreach git reset --hard

]


def default_test_targets() -> List[Path]:
Member

How about changing this so it works something like the jsonld.js test runner: check for the local test suite dir, then for the sibling dir if it isn't found. That allows for alternate workflows.
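A sketch of that fallback (the SUITES list and directory names here are assumptions; the real runtests.py may differ):

```python
from pathlib import Path
from typing import List

# Illustrative suite names, not necessarily what pyld uses.
SUITES = ["json-ld-api", "json-ld-framing"]


def default_test_targets(repo_root: Path) -> List[Path]:
    """Prefer the checked-out submodule; fall back to a sibling checkout."""
    targets = []
    for name in SUITES:
        local = repo_root / "specifications" / name
        sibling = repo_root.parent / name
        # An un-initialized submodule leaves an empty directory behind,
        # so also require the local dir to be non-empty.
        if local.is_dir() and any(local.iterdir()):
            targets.append(local)
        elif sibling.is_dir():
            targets.append(sibling)
    return targets
```

This keeps the submodule workflow as the default while still honoring the shared-sibling-checkout workflow mentioned later in this thread.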

Member

It didn't consider missing test suites an error before. I'm not sure what should happen in that case. It never seemed to be a problem in practice.

Contributor (Author)

Do you believe there is a use case for it? Probably the main goal of turning the specifications into submodules is to automate the connection to them, standardizing it across developers' clones of the pyld repo.

Having the specifications cloned into the root directory of the repo, for instance, would require .gitignore rules (otherwise it's way too easy to commit them, which is likely not what one wants).

We can later add an error message to this function. If the default test locations are used and the tests are missing, then probably something is wrong; for instance, the developer forgot to check out the submodules. Their tests will be incorrectly green locally, and then, after a push, the failing CI will be an unpleasant surprise.

@davidlehn (Member)

If this PR is going to go all-in, then consider updating docs to handle what #157 was doing to address #156.

@davidlehn (Member)

  • I don't necessarily disagree with the reasoning to use submodules. As a counterpoint, pyld and jsonld.js are over a decade old and have gotten by without using that feature. It's a matter of tradeoffs between using different commands and dealing with different problems. I'm not sure what the best approach is. Certainly better docs and tooling would help with whatever approach is used.
  • Adding this feature forces the maintainers to learn to use it. Help us (me) out with a few more snippets of docs perhaps in the README and/or CONTRIBUTING. Maybe even more script/makefile targets to run. Specifically, what do we (I) need to type to keep the test suites synced up all the time.
  • I have concerns about forgetting to update the test suite repo refs and not noticing that newly added tests are failing. I can imagine complex processes to address this, but what is the theory here? I guess the idea is to trade off possibly running with updated tests unintentionally against having outdated tests unintentionally.
  • I put in some comments, partly directed towards making the former workflow possible with minimal hassle. I found that worked well for me. Especially when I was working on js and python at the same time and explicitly wanted to use shared sibling test suites. That's a particular use case for sure, but a real one.
  • I suppose I'm willing to try this change if pressed. To me it seems either method is fine for any potential contributors. There is almost no input here, so I'm not sure others feel strongly at all.

@davidlehn (Member)

@anatoly-scherbakov Could you also change the commit message with the fancy "+", and the one with ellipses, arrow, and math symbol, to use simple ASCII? It's nicer to use chars people can easily type. Feel free to argue. :-)

@anatoly-scherbakov force-pushed the 181-git-submodules branch 2 times, most recently from 5798f0b to 440a251, on February 6, 2024 13:36
@anatoly-scherbakov (Contributor, Author)

@davidlehn thanks again for the review!

I have applied a few requested changes.

Help us (me) out with a few more snippets of docs perhaps in the README and/or CONTRIBUTING. Maybe even more script/makefile targets to run.

Yes! I feel that less documentation and more automation is for the better :)

I have concerns about forgetting to update the test suite repo refs and not noticing newly added tests are failing.

Indeed, this is the other side of the coin. Most dependency management tools (poetry, npm, ...) seem to be okay with that, staying on the safe side to make builds predictable.

Tools like Dependabot, which post notifications about out-of-date dependencies and even create PRs automatically, are often leveraged. I think I can easily imagine a rather simple scheduled GitHub actions workflow that will update the specs submodules every week, run the tests, and create a PR with that upgrade. Do you think we should create an issue for that?

I put in some comments, partly directed towards making the former workflow possible with minimal hassle. I found that worked well for me. Especially when I was working on js and python at the same time and explicitly wanted to use shared sibling test suites. That's a particular use case for sure, but a real one.

I see. The runtests.py script allows specifying the path to the specification. Say you have a development version of a spec somewhere on your local disk; you can easily run the tests against that version using this command.

Could you also change the commit message with the fancy "+" and one with ellipses, arrow, and math symbol to use simple ASCII? Nicer to use chars people can easily type. Feel free to argue. :-)

No argument here: I just like those fancy Unicode characters, especially putting them into commit messages 🙂, and have set up my Linux to easily type them, but this is purely a matter of taste. Removed ✅

@BigBlueHat modified the milestones: v2.0.4, v2.0.5 on Feb 6, 2024
@BigBlueHat (Contributor)

@davidlehn the key thing I'm beginning to appreciate about the submodule approach is that it provides git-stored evidence of which tests were run--based on the submodule SHAs pinned in the repository. We don't have that now if developers typically just grab the latest revision--or their own fork.

It should also provide a means to send updates to the submodule repos--though the porcelain for that feels more like plumbing, so some more commands (as you note) would be helpful.

@BigBlueHat (Contributor)

@anatoly-scherbakov can you take rdf-canon out and go back to the old tests? We should do the rdf-canon change in a separate PR...and I believe it'll need more code integration so as not to be completely skipped.

@anatoly-scherbakov (Contributor, Author)

@BigBlueHat the previous address was https://github.com/json-ld/normalization/tests. Right now, https://github.com/json-ld/normalization redirects to https://github.com/w3c-ccg/rdf-dataset-canonicalization. Is that the link we should use?

@anatoly-scherbakov changed the base branch from master to add-github-workflow on February 6, 2024 17:56
@anatoly-scherbakov (Contributor, Author)

I've rebased this branch on top of #185.

@BigBlueHat (Contributor)

@BigBlueHat the previous address was https://github.com/json-ld/normalization/tests. Right now, https://github.com/json-ld/normalization redirects to https://github.com/w3c-ccg/rdf-dataset-canonicalization. Is that the link we should use?

This should be the correct location of the old tests suites:
https://github.com/w3c-ccg/rdf-dataset-canonicalization/tree/main/tests

So, it still needs an updated URL in this (or another) PR. The rdf-canon ones, though, introduce a whole new host of concerns.

Sorry I didn't see that earlier!

@anatoly-scherbakov (Contributor, Author)

  • Added rdf-dataset-canonicalization spec submodule,
  • Reconfigured the tests to use it and to ignore rdf-canon submodule,
  • Added make test command.

Should we replace rdf-canon references everywhere in the docs as well, given that we're going to get them back anyway later?

@BigBlueHat (Contributor)

  • Added rdf-dataset-canonicalization spec submodule,
  • Reconfigured the tests to use it and to ignore rdf-canon submodule,
  • Added make test command.

Should we replace rdf-canon references everywhere in the docs as well, given that we're going to get them back anyway later?

Yeah...I think we need to add rdf-canon stuff as a separate PR. Otherwise, we're pulling all this code and content down, and then not using it at all...which would be very confusing.
