This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Add adapter #1545

Open · wants to merge 24 commits into master

Conversation

@xinyual (Contributor) commented on Mar 31, 2021

Description

Add adapter and bias fine-tuning methods to the fine-tuning script.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is backward incompatible, explain why it must be made.
  • Note any interesting edge cases here

cc @dmlc/gluon-nlp-team

@xinyual requested a review from a team as a code owner on March 31, 2021 12:30
backbone = Model.from_cfg(cfg)
# Load local backbone parameters if backbone_path provided.
# Otherwise, download backbone parameters from gluon zoo.

backbone_params_path = backbone_path if backbone_path else download_params_path
if checkpoint_path is None:
-    backbone.load_parameters(backbone_params_path, ignore_extra=True,
+    backbone.load_parameters(backbone_params_path, ignore_extra=True, allow_missing=True,
@leezu (Contributor) commented on Apr 2, 2021

Would the following be safer?

Suggested change:
-    backbone.load_parameters(backbone_params_path, ignore_extra=True, allow_missing=True,
+    backbone.load_parameters(backbone_params_path, ignore_extra=True, allow_missing=(method == 'adapter'),

@@ -28,6 +28,8 @@
import numpy as _np
from typing import Union, Optional, List, Dict
from .op import relative_position_bucket
#from .attention_cell import MultiHeadAttentionCell
Contributor commented:

This would be a circular import, as attention_cell also imports layers.

from .layers import SinusoidalPositionalEmbedding,\
BucketPositionalEmbedding,\
LearnedPositionalEmbedding

To solve this, two options are to either move SinusoidalPositionalEmbedding, BucketPositionalEmbedding, and LearnedPositionalEmbedding out of layers.py into a new file and change the import in attention_cell, or to move AdapterModule into a new file (a sketch of the second option is below). You can also come up with other solutions.
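
For illustration, a minimal sketch of the second option, assuming a new file such as src/gluonnlp/adapter.py. The file name, the use of get_activation from layers.py, and the adapter body are assumptions made for this sketch, not decisions taken in this PR:

# src/gluonnlp/adapter.py  (hypothetical new file)
from mxnet.gluon import nn

# Safe to import here: layers.py does not import this module, so the existing
# attention_cell.py -> layers.py dependency stays one-way and no cycle forms.
# (Presumably needed by the attention-based adapter in this PR.)
from .attention_cell import MultiHeadAttentionCell
from .layers import get_activation


class AdapterModule(nn.HybridBlock):
    """Bottleneck adapter block, relocated out of layers.py."""

    def __init__(self, units, bottleneck_units, activation='relu'):
        super().__init__()
        self.down_proj = nn.Dense(bottleneck_units, in_units=units, flatten=False)
        self.activate = get_activation(activation)
        self.up_proj = nn.Dense(units, in_units=bottleneck_units, flatten=False)

    def forward(self, data):
        out = self.down_proj(data)
        out = self.activate(out)
        out = self.up_proj(out)
        return out + data

Either option keeps the import graph acyclic; the choice is mainly about which names other modules already import from layers.py.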

Comment on lines 97 to 98
parser.add_argument('--method', type=str, default='full', choices=['full', 'bias', 'subbias', 'adapter'],
help='different finetune method')
Contributor commented:

Would you like to edit the README file to include results for (at least some of) the different choices (and references to the papers)?
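
For context on what the choices do, a rough sketch of how the --method flag could select trainable parameters via Gluon's grad_req mechanism. The parameter-name patterns ('bias', 'adapter') and the omission of 'subbias' are illustrative assumptions, not taken from this diff:

def set_trainable_params(backbone, method):
    """Freeze the backbone, then re-enable the subset implied by --method."""
    for name, param in backbone.collect_params().items():
        if method == 'full':
            param.grad_req = 'write'        # ordinary full fine-tuning
        elif method == 'bias':
            # BitFit-style fine-tuning: only bias terms are updated.
            param.grad_req = 'write' if name.endswith('bias') else 'null'
        elif method == 'adapter':
            # Only the newly inserted adapter parameters are updated
            # (assumes adapter parameter names contain 'adapter').
            param.grad_req = 'write' if 'adapter' in name else 'null'
        # 'subbias' is omitted here because its exact parameter subset is not
        # visible in this part of the diff.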

@codecov (bot) commented on Apr 12, 2021

Codecov Report

Merging #1545 (43327de) into master (1326258) will decrease coverage by 0.71%.
The diff coverage is 78.94%.

❗ Current head 43327de differs from the pull request's most recent head e25bcb3. Consider uploading reports for commit e25bcb3 to get more accurate results.
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1545      +/-   ##
==========================================
- Coverage   82.20%   81.48%   -0.72%     
==========================================
  Files          68       68              
  Lines        8540     8432     -108     
==========================================
- Hits         7020     6871     -149     
- Misses       1520     1561      +41     
Impacted Files Coverage Δ
src/gluonnlp/layers.py 86.72% <ø> (ø)
src/gluonnlp/models/transformer.py 97.89% <70.00%> (-0.64%) ⬇️
src/gluonnlp/models/bert.py 94.42% <88.88%> (-0.39%) ⬇️
conftest.py 76.31% <0.00%> (-9.94%) ⬇️
src/gluonnlp/data/loading.py 76.55% <0.00%> (-7.39%) ⬇️
src/gluonnlp/data/filtering.py 78.26% <0.00%> (-4.35%) ⬇️
src/gluonnlp/utils/lazy_imports.py 58.42% <0.00%> (-2.25%) ⬇️
src/gluonnlp/data/tokenizers/yttm.py 81.73% <0.00%> (-1.02%) ⬇️
src/gluonnlp/torch/models/transformer.py 27.23% <0.00%> (-0.92%) ⬇️
src/gluonnlp/data/tokenizers/spacy.py 65.33% <0.00%> (-0.91%) ⬇️
... and 24 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ff0519...e25bcb3. Read the comment docs.

@@ -626,39 +623,33 @@ def layout(self):
def forward(self, inputs, token_types, valid_length,
masked_positions):
"""Getting the scores of the masked positions.

Contributor commented:

This blank line is required.

Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description. The summary line may be used by automatic indexing tools; it is important that it fits on one line and is separated from the rest of the docstring by a blank line. The summary line may be on the same line as the opening quotes or on the next line. The entire docstring is indented the same as the quotes at its first line (see example below).

https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings
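
A minimal example of the layout PEP 257 asks for, using the summary line from the diff (the body wording is placeholder text for illustration):

def forward(self, inputs, token_types, valid_length, masked_positions):
    """Getting the scores of the masked positions.

    The blank line above separates the one-line summary from the rest of
    the docstring, which can then elaborate on parameters and returns.
    """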

out = self.down_proj(data)
out = self.activate(out)
out = self.up_proj(out)
return out + residual
@leezu (Contributor) commented on Apr 29, 2021

You may not need a separate argument "residual" here. The residual connection described in the paper refers to doing return out + data, where data is the original input before down projection, activation function and up projection.

[figure: adapter architecture, from the paper below]
http://proceedings.mlr.press/v97/houlsby19a/houlsby19a.pdf
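
A minimal sketch of the suggested change, assuming the attribute names from the quoted code; with the residual taken from the adapter's own input, the separate residual argument disappears:

def forward(self, data):
    out = self.down_proj(data)
    out = self.activate(out)
    out = self.up_proj(out)
    return out + data   # residual = the adapter's own input, as in Houlsby et al.

Call sites then simplify to out = self.adapter_layer_ffn(out), which also addresses the next comment about passing out twice.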

out = self.ffn_2(out)
out = self.dropout_layer(out)
if self._use_adapter and 'location_1' in self._adapter_config:
    out = self.adapter_layer_ffn(out, residual)
Contributor commented:

Based on your implementation of BasicAdapter, you'd need to call this layer as self.adapter_layer_ffn(out, out).

def forward(self, query, key, value):
    # query: (batch_size, length, units)
    # key:   (batch_size, length, num_adapters, units)

@xinyual (Author) commented on May 26, 2021

Hi Xingjian @sxjscience, could you please check these lines, which use einsum? I show the original implementation in a comment. For comparison, the purpose of these lines is similar to https://github.com/Adapter-Hub/adapter-transformers/blob/0fe1c19f601b7785273e173d30a9392e407823d1/src/transformers/adapters/modeling.py#L211, lines 211 to 223.

Member commented:

It looks good to me. One improvement is that there is no need to transpose anymore. You can rely on einsum to fuse these operations in a single op.
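
For illustration, a hedged sketch of the fully fused form. The function and variable names are made up for this example; the shapes follow the shape comments in the quoted forward, and the softmax normalization is an assumption rather than the PR's actual code:

from mxnet import np, npx
npx.set_np()


def fuse_adapter_attention(query, key, value):
    """Combine the outputs of num_adapters adapters without any explicit transpose.

    query: (batch_size, length, units)
    key/value: (batch_size, length, num_adapters, units)
    """
    # scores[b, l, n] = sum_u query[b, l, u] * key[b, l, n, u]
    scores = np.einsum('blu,blnu->bln', query, key)
    attn = npx.softmax(scores, axis=-1)                  # attention over adapters
    # out[b, l, u] = sum_n attn[b, l, n] * value[b, l, n, u]
    return np.einsum('bln,blnu->blu', attn, value)


q = np.ones((2, 8, 16))        # (batch_size, length, units)
k = np.ones((2, 8, 4, 16))     # (batch_size, length, num_adapters, units)
v = np.ones((2, 8, 4, 16))
print(fuse_adapter_attention(q, k, v).shape)             # -> (2, 8, 16)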

@xinyual (Author) replied:

Ok. Thanks!
