This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Add adapter #1545

Open · wants to merge 24 commits into master

Conversation

@xinyual (Contributor) commented on Mar 31, 2021

Description

Add adapter and bias fine-tuning methods to the fine-tuning script.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is backward incompatible, explain why it must be made.
  • Note any interesting edge cases here

cc @dmlc/gluon-nlp-team

@xinyual requested a review from a team as a code owner on March 31, 2021 12:30
backbone = Model.from_cfg(cfg)
# Load local backbone parameters if backbone_path provided.
# Otherwise, download backbone parameters from gluon zoo.

backbone_params_path = backbone_path if backbone_path else download_params_path
if checkpoint_path is None:
-    backbone.load_parameters(backbone_params_path, ignore_extra=True,
+    backbone.load_parameters(backbone_params_path, ignore_extra=True, allow_missing=True,
@leezu (Contributor) commented on Apr 2, 2021

Would the following be safer?

Suggested change:
-    backbone.load_parameters(backbone_params_path, ignore_extra=True, allow_missing=True,
+    backbone.load_parameters(backbone_params_path, ignore_extra=True, allow_missing=(method == 'adapter'),

@@ -28,6 +28,8 @@
import numpy as _np
from typing import Union, Optional, List, Dict
from .op import relative_position_bucket
#from .attention_cell import MultiHeadAttentionCell
Contributor commented:

This would be a circular import, as attention_cell also imports layers.

from .layers import SinusoidalPositionalEmbedding,\
BucketPositionalEmbedding,\
LearnedPositionalEmbedding

To solve this, two options are to either move SinusoidalPositionalEmbedding, BucketPositionalEmbedding, and LearnedPositionalEmbedding out of layers.py into a new file and change the import in attention_cell, or to move AdapterModule into a new file (a sketch of the second option is below). You can also come up with other solutions.
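
For illustration, a minimal sketch of the second option, assuming a new file such as src/gluonnlp/adapter.py. The file name, the use of get_activation from layers.py, and the adapter body are assumptions made for this sketch, not decisions taken in this PR:

# src/gluonnlp/adapter.py  (hypothetical new file)
from mxnet.gluon import nn

# Safe to import here: layers.py does not import this module, so the existing
# attention_cell.py -> layers.py dependency stays one-way and no cycle forms.
# (Presumably needed by the attention-based adapter in this PR.)
from .attention_cell import MultiHeadAttentionCell
from .layers import get_activation


class AdapterModule(nn.HybridBlock):
    """Bottleneck adapter block, relocated out of layers.py."""

    def __init__(self, units, bottleneck_units, activation='relu'):
        super().__init__()
        self.down_proj = nn.Dense(bottleneck_units, in_units=units, flatten=False)
        self.activate = get_activation(activation)
        self.up_proj = nn.Dense(units, in_units=bottleneck_units, flatten=False)

    def forward(self, data):
        out = self.down_proj(data)
        out = self.activate(out)
        out = self.up_proj(out)
        return out + data

Either option keeps the import graph acyclic; the choice is mainly about which names other modules already import from layers.py.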

Comment on lines 97 to 98
parser.add_argument('--method', type=str, default='full', choices=['full', 'bias', 'subbias', 'adapter'],
help='different finetune method')
Contributor commented:

Would you like to edit the README file to include results for (at least some of) the different choices (and references to the papers)?
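
For context on what the choices do, a rough sketch of how the --method flag could select trainable parameters via Gluon's grad_req mechanism. The parameter-name patterns ('bias', 'adapter') and the omission of 'subbias' are illustrative assumptions, not taken from this diff:

def set_trainable_params(backbone, method):
    """Freeze the backbone, then re-enable the subset implied by --method."""
    for name, param in backbone.collect_params().items():
        if method == 'full':
            param.grad_req = 'write'        # ordinary full fine-tuning
        elif method == 'bias':
            # BitFit-style fine-tuning: only bias terms are updated.
            param.grad_req = 'write' if name.endswith('bias') else 'null'
        elif method == 'adapter':
            # Only the newly inserted adapter parameters are updated
            # (assumes adapter parameter names contain 'adapter').
            param.grad_req = 'write' if 'adapter' in name else 'null'
        # 'subbias' is omitted here because its exact parameter subset is not
        # visible in this part of the diff.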

@codecov (bot) commented on Apr 12, 2021

Codecov Report

Merging #1545 (43327de) into master (1326258) will decrease coverage by 0.71%.
The diff coverage is 78.94%.

❗ Current head 43327de differs from the pull request's most recent head e25bcb3. Consider uploading reports for commit e25bcb3 to get more accurate results.
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1545      +/-   ##
==========================================
- Coverage   82.20%   81.48%   -0.72%     
==========================================
  Files          68       68              
  Lines        8540     8432     -108     
==========================================
- Hits         7020     6871     -149     
- Misses       1520     1561      +41     
Impacted Files Coverage Δ
src/gluonnlp/layers.py 86.72% <ø> (ø)
src/gluonnlp/models/transformer.py 97.89% <70.00%> (-0.64%) ⬇️
src/gluonnlp/models/bert.py 94.42% <88.88%> (-0.39%) ⬇️
conftest.py 76.31% <0.00%> (-9.94%) ⬇️
src/gluonnlp/data/loading.py 76.55% <0.00%> (-7.39%) ⬇️
src/gluonnlp/data/filtering.py 78.26% <0.00%> (-4.35%) ⬇️
src/gluonnlp/utils/lazy_imports.py 58.42% <0.00%> (-2.25%) ⬇️
src/gluonnlp/data/tokenizers/yttm.py 81.73% <0.00%> (-1.02%) ⬇️
src/gluonnlp/torch/models/transformer.py 27.23% <0.00%> (-0.92%) ⬇️
src/gluonnlp/data/tokenizers/spacy.py 65.33% <0.00%> (-0.91%) ⬇️
... and 24 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ff0519...e25bcb3. Read the comment docs.

@@ -626,39 +623,33 @@ def layout(self):
def forward(self, inputs, token_types, valid_length,
masked_positions):
"""Getting the scores of the masked positions.

Contributor commented:

This blank line is required.

Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description. The summary line may be used by automatic indexing tools; it is important that it fits on one line and is separated from the rest of the docstring by a blank line. The summary line may be on the same line as the opening quotes or on the next line. The entire docstring is indented the same as the quotes at its first line (see example below).

https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings
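
A minimal example of the layout PEP 257 asks for, using the summary line from the diff (the body wording is placeholder text for illustration):

def forward(self, inputs, token_types, valid_length, masked_positions):
    """Getting the scores of the masked positions.

    The blank line above separates the one-line summary from the rest of
    the docstring, which can then elaborate on parameters and returns.
    """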

out = self.down_proj(data)
out = self.activate(out)
out = self.up_proj(out)
return out + residual
@leezu (Contributor) commented on Apr 29, 2021

You may not need a separate argument "residual" here. The residual connection described in the paper refers to doing return out + data, where data is the original input before down projection, activation function and up projection.

[figure: adapter architecture, from the paper below]
http://proceedings.mlr.press/v97/houlsby19a/houlsby19a.pdf
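
A minimal sketch of the suggested change, assuming the attribute names from the quoted code; with the residual taken from the adapter's own input, the separate residual argument disappears:

def forward(self, data):
    out = self.down_proj(data)
    out = self.activate(out)
    out = self.up_proj(out)
    return out + data   # residual = the adapter's own input, as in Houlsby et al.

Call sites then simplify to out = self.adapter_layer_ffn(out), which also addresses the next comment about passing out twice.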

out = self.ffn_2(out)
out = self.dropout_layer(out)
if self._use_adapter and 'location_1' in self._adapter_config:
    out = self.adapter_layer_ffn(out, residual)
Contributor commented:

Based on your implementation of BasicAdapter, you'd need to call this layer as self.adapter_layer_ffn(out, out).

def forward(self, query, key, value):
    # query: (batch_size, length, units)
    # key:   (batch_size, length, num_adapters, units)

@xinyual (Author) commented on May 26, 2021

Hi Xingjian @sxjscience, could you please check these lines, which use einsum? I show the original implementation in a comment. For comparison, the purpose of these lines is similar to https://github.com/Adapter-Hub/adapter-transformers/blob/0fe1c19f601b7785273e173d30a9392e407823d1/src/transformers/adapters/modeling.py#L211, lines 211 to 223.

Member commented:

It looks good to me. One improvement is that there is no need to transpose anymore. You can rely on einsum to fuse these operations in a single op.
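
For illustration, a hedged sketch of the fully fused form. The function and variable names are made up for this example; the shapes follow the shape comments in the quoted forward, and the softmax normalization is an assumption rather than the PR's actual code:

from mxnet import np, npx
npx.set_np()


def fuse_adapter_attention(query, key, value):
    """Combine the outputs of num_adapters adapters without any explicit transpose.

    query: (batch_size, length, units)
    key/value: (batch_size, length, num_adapters, units)
    """
    # scores[b, l, n] = sum_u query[b, l, u] * key[b, l, n, u]
    scores = np.einsum('blu,blnu->bln', query, key)
    attn = npx.softmax(scores, axis=-1)                  # attention over adapters
    # out[b, l, u] = sum_n attn[b, l, n] * value[b, l, n, u]
    return np.einsum('bln,blnu->blu', attn, value)


q = np.ones((2, 8, 16))        # (batch_size, length, units)
k = np.ones((2, 8, 4, 16))     # (batch_size, length, num_adapters, units)
v = np.ones((2, 8, 4, 16))
print(fuse_adapter_attention(q, k, v).shape)             # -> (2, 8, 16)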

@xinyual (Author) replied:

Ok. Thanks!
