Add commands for automatically modifying configs #12020

polm · 2022-12-23T10:08:28Z

Description

This continues work started in explosion/projects#147, which provides features for automatically manipulating pipelines and configs. The functions included are:

- merge: combine components from two pipelines and handle listeners
- use_transformer: use transformer as feature source
- use_tok2vec: use CNN tok2vec as feature source
- resume: make a version of a config for resuming training

Currently these are all grouped under a new `spacy configure` command. That may not be the best place for them; in particular, `merge` may belong elsewhere, since it outputs a pipeline rather than a config.
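Because `merge` combines components from two pipelines, component names can collide (both pipelines may have a `tok2vec` or an `ner`). The commit history mentions a `_make_unique_pipe_names` helper but not how it behaves, so the following is only a hypothetical sketch of one renaming strategy: append a numeric suffix until the name is free. `make_unique_pipe_names` here is an illustrative name, not the PR's code.

```python
from typing import Dict, List


def make_unique_pipe_names(base: List[str], added: List[str]) -> Dict[str, str]:
    """Map each name in `added` to a name that does not collide with `base`.

    Hypothetical helper for illustration: a colliding name gets a numeric
    suffix (e.g. "ner" -> "ner_2"); this is not the PR's implementation.
    """
    taken = set(base)
    renames: Dict[str, str] = {}
    for name in added:
        new_name = name
        counter = 2
        # Keep bumping the suffix until the candidate name is unused.
        while new_name in taken:
            new_name = f"{name}_{counter}"
            counter += 1
        taken.add(new_name)
        renames[name] = new_name
    return renames


print(make_unique_pipe_names(["tok2vec", "tagger", "ner"], ["tok2vec", "ner", "textcat"]))
```

Renaming a tok2vec also means any listener whose `upstream` pointed at the old name has to be updated, or the listener has to be replaced with its own copy of the tok2vec, which is what spaCy's `Language.replace_listeners` does.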

The current state of the PR is that the commands run, but there's only one small test, and docs haven't been written yet. Docs can be started but will depend somewhat on how the naming issues work out.

Types of change

enhancement

Checklist

I confirm that I have the right to submit this contribution under the project's MIT license.
I ran the tests, and all new and existing tests passed.
My changes don't require a change to the documentation, or if they do, I've added all required information.


polm · 2023-01-13T05:32:08Z

This might require more adjustment - for example, maybe merge should be split out into a separate command - but it is substantially complete and ready for review. Any feedback on how to make the design clearer would be welcome.

rmitsch

Thanks for this, I think this functionality can be super handy! I just did a superficial pass now and will continue reviewing at a later point.

spacy/cli/configure.py

website/docs/api/cli.mdx

rmitsch · 2023-02-08T15:08:42Z

> Currently these are all grouped under a new `spacy configure` command. That may not be the best place for them; in particular, `merge` may belong elsewhere, since it outputs a pipeline rather than a config.

I agree - having `merge` under `spacy configure` would be confusing. I'd prefer it as a separate top-level command.

polm · 2023-02-09T06:18:13Z

Thanks for the feedback, I moved merge to a separate top-level command.

rmitsch

Apart from the command naming and some smaller stuff, this looks good, but I'd definitely get feedback from a third party before moving forward.

spacy/cli/configure.py

spacy/cli/merge.py

website/docs/api/cli.mdx


svlandeg

There are a lot of features in this PR, and ideally we would have split this up into separate PRs to make reviewing easier and the commits in the history more atomic. I guess we can keep it as is for now, but we'll want to review this carefully as there's a lot going on :-)

adrianeboyd · 2023-08-03T06:42:15Z

spacy/cli/configure.py

```diff
+            "pooling": {"@layers": "reduce_mean.v1"},
+        }
+        nlp.config["components"][listener]["model"]["tok2vec"] = listener_config
+
```


This would also need to update the [training] block.
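A sketch of what updating the [training] block for `use_transformer` could look like, operating on a plain-dict config rather than a loaded `nlp` object. The function name is hypothetical, and the specific values (padded batching, gradient accumulation of 3, `warmup_linear.v1` starting at 5e-5) are assumptions modelled on spaCy's quickstart transformer configs; compare them with `spacy init config` output for your version before relying on them.

```python
import copy
from typing import Any, Dict


def with_transformer_training(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with transformer-style [training] settings.

    Sketch only: values are assumed (modelled on quickstart transformer
    configs); other [training] keys are left untouched.
    """
    new_config = copy.deepcopy(config)
    training = new_config.setdefault("training", {})
    # Assumed quickstart value: accumulate gradients over 3 small batches.
    training["accumulate_gradient"] = 3
    # Batch by padded size (longest doc x number of docs), which tracks the
    # memory a padded transformer batch needs.
    training["batcher"] = {
        "@batchers": "spacy.batch_by_padded.v1",
        "discard_oversize": True,
        "size": 2000,
        "buffer": 256,
        "get_length": None,
    }
    optimizer = training.setdefault("optimizer", {"@optimizers": "Adam.v1"})
    # Replace the fixed CNN learning rate with warmup + linear decay, sized to
    # max_steps (falling back to 20000 when it is 0/unset, i.e. unlimited).
    optimizer["learn_rate"] = {
        "@schedules": "warmup_linear.v1",
        "warmup_steps": 250,
        "total_steps": training.get("max_steps") or 20000,
        "initial_rate": 5e-5,
    }
    return new_config


cfg = {"training": {"max_steps": 10000, "optimizer": {"@optimizers": "Adam.v1", "learn_rate": 0.001}}}
print(with_transformer_training(cfg)["training"]["optimizer"]["learn_rate"])
```

Tying `total_steps` to `max_steps` keeps the decay from ending before (or long after) training stops.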

adrianeboyd · 2023-08-03T06:48:18Z

spacy/cli/configure.py

```diff
+            "upstream": "tok2vec",
+        }
+        nlp.config["components"][listener]["model"]["tok2vec"] = listener_config
+
```


This may also need to update the [training] block. (I know that tok2vec->transformer doesn't work. I'm not 100% sure it doesn't work the other way around, but probably the tok2vec defaults are better.)
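For `use_tok2vec`, the same idea runs in the other direction: point each listener at the CNN tok2vec and reset the [training] block to CNN-style defaults, which the comment above suggests are the safer choice. This is a hypothetical sketch on a plain-dict config; the function name, its `width` parameter, and the default values (word-based compounding batches, a fixed learn rate of 0.001) are assumptions modelled on spaCy's quickstart CNN configs.

```python
import copy
from typing import Any, Dict, List


def with_tok2vec_listeners(
    config: Dict[str, Any], listeners: List[str], width: int, upstream: str = "tok2vec"
) -> Dict[str, Any]:
    """Point `listeners` at a CNN tok2vec and reset [training] to CNN defaults.

    Hypothetical helper; default values are assumptions modelled on
    quickstart CNN configs.
    """
    new_config = copy.deepcopy(config)
    for name in listeners:
        # Tok2VecListener config: "upstream" as in the diff above; "width" is
        # assumed and must match the upstream tok2vec's output width.
        new_config["components"][name]["model"]["tok2vec"] = {
            "@architectures": "spacy.Tok2VecListener.v1",
            "width": width,
            "upstream": upstream,
        }
    training = new_config.setdefault("training", {})
    training["accumulate_gradient"] = 1
    # Batch by word count with a compounding batch size instead of padded size.
    training["batcher"] = {
        "@batchers": "spacy.batch_by_words.v1",
        "discard_oversize": False,
        "tolerance": 0.2,
        "get_length": None,
        "size": {
            "@schedules": "compounding.v1",
            "start": 100,
            "stop": 1000,
            "compound": 1.001,
            "t": 0.0,
        },
    }
    optimizer = training.setdefault("optimizer", {"@optimizers": "Adam.v1"})
    # A fixed learning rate replaces any transformer warmup schedule.
    optimizer["learn_rate"] = 0.001
    return new_config
```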

adrianeboyd · 2023-08-03T06:49:48Z

spacy/cli/configure.py

```diff
+TOK2VEC_ARCHS = [
+    ("spacy", "Tok2Vec"),
+    ("spacy", "HashEmbedCNN"),
+    ("spacy-transformers", "TransformerModel"),
+]
+# These are the listeners.
+LISTENER_ARCHS = [
+    ("spacy", "Tok2VecListener"),
+    ("spacy-transformers", "TransformerListener"),
+]
```


I wonder if there's a more general way to determine these lists?
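One possible generalization for the listener list: parse registered architecture names (`namespace.Name.vN`) and treat any architecture whose name ends in `Listener` as a listener, instead of enumerating them. This is a hypothetical heuristic that relies on a naming convention, not on anything the registry guarantees; `parse_arch` and `find_listeners` are illustrative names.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Registered names look like "spacy.Tok2VecListener.v1" or
# "spacy-transformers.TransformerListener.v1": namespace, name, version.
ARCH_RE = re.compile(r"^(?P<namespace>[\w-]+)\.(?P<name>\w+)\.v(?P<version>\d+)$")


def parse_arch(arch: str) -> Optional[Tuple[str, str, int]]:
    """Split an architecture string into (namespace, name, version), or None."""
    match = ARCH_RE.match(arch)
    if match is None:
        return None
    return match["namespace"], match["name"], int(match["version"])


def find_listeners(config: Dict[str, Any]) -> List[str]:
    """Return components whose model.tok2vec architecture name ends in "Listener".

    Heuristic only: sourced components (no "model" block) are skipped.
    """
    found = []
    for comp_name, comp in config.get("components", {}).items():
        tok2vec = comp.get("model", {}).get("tok2vec")
        if not isinstance(tok2vec, dict):
            continue
        parsed = parse_arch(tok2vec.get("@architectures", ""))
        if parsed is not None and parsed[1].endswith("Listener"):
            found.append(comp_name)
    return found
```

With a loaded pipeline, the `listening_components` property of a `Tok2Vec` component reports the actual listeners, at the cost of needing the pipeline in memory. Identifying feature-source architectures generically is harder, since nothing in names like `Tok2Vec`, `HashEmbedCNN`, or `TransformerModel` marks them as sources.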

polm added the enhancement, feat / pipeline, feat / cli, and feat / config labels Dec 23, 2022

polm mentioned this pull request Dec 23, 2022

Add a script for customizing pipelines explosion/projects#147

Closed

polm added 8 commits December 23, 2022 19:31

Fix import

ab2773e

Fix types

f3a928c

Add use_transformer test

836fd87

Add HashEmbedCNN to list of tok2vec architectures

dab7894

TODO REVERT Try turning off batching

be95ef5

Maybe this will fix the CI issue?

Revert "TODO REVERT Try turning off batching"

a749d2d

This reverts commit be95ef5.

Test use_tok2vec, not use_transformer

10bbb01

Adding the transformer component requires spacy-transformers, which isn't present in the normal test env.

Add test for merging pipelines

2791f0b

polm closed this Dec 28, 2022

polm reopened this Dec 28, 2022

polm added 5 commits January 11, 2023 16:06

Add docs for configure command

f2bbab4

This also changes the `output_file` arg to match other commands.

Use new-style tags in docs

10adbcb

Merge branch 'master' into feature/config-manipulators

d7192c4

Fix new-style header

4bad296

Cleanup

3fe723c

This removes one old print statement and some old TODOs. Some TODOs are left as future work.

polm marked this pull request as ready for review January 13, 2023 05:31

rmitsch reviewed Jan 30, 2023


polm added 2 commits January 31, 2023 12:46

Add types to _deep_get

9d3e3e6

Add types to everything

9d0ae24

Move merge to independent command

03a0c2b

rmitsch reviewed Feb 9, 2023


polm and others added 8 commits February 10, 2023 14:17

Apply suggestions from code review

a76fd0d

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

Code reorganization

c0a3e9a

Add links to docs in docstrings

2524d06

rename to _make_unique_pipe_names

b9537ec

Update from code review

4279c73

Change back to short names

81276f2

Merge branch 'master' into feature/config-manipulators

5019d76

Fix function name in tests

e668c2c

svlandeg reviewed Feb 23, 2023


adrianeboyd reviewed Aug 3, 2023


svlandeg changed the base branch from master to main January 29, 2024 09:46
