
Transfer learning

Idea:

  • Tasks should represent knowledge.
  • We should be able to combine that knowledge across tasks and do better.

Why for NLP?:

  • All tasks operate over language data.
  • Language has intrinsic representations.
  • And structural similarities: similar tasks share characteristics.
  • Tasks can inform each other, e.g. syntax and semantics.
  • Annotated data is scarce.
  • Transfer learning works very well for many supervised NLP tasks (e.g. classification, information extraction).

Types of transfer learning:

  • Transductive transfer learning: same task, labeled data only in source domain.
  • Inductive transfer learning: different tasks, labeled data in target domain.

We focus on sequential transfer learning:

  • Learn on one task / dataset, then transfer to another task (see the sketch after this list).
  • "You shall know a word by the company it keeps" (Firth).
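
A minimal sketch of the sequential recipe, assuming a PyTorch setup: pretrain an encoder on a source task, then reuse it (frozen here, or fine-tuned) under a new head for the target task. The toy Encoder, head sizes, and dummy batch are illustrative assumptions, not models from the course.

```python
# Minimal sketch of sequential transfer learning (assumed PyTorch setup):
# 1) pretrain an encoder on a source task, 2) reuse it on a target task.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy sentence encoder: embed tokens and average them."""
    def __init__(self, vocab_size=10_000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):               # (batch, seq_len)
        return self.emb(token_ids).mean(dim=1)  # (batch, dim)

# Step 1: pretrain encoder + source head on the source task (training loop not shown).
encoder = Encoder()
source_head = nn.Linear(128, 5)                 # e.g. 5-way source labels

# Step 2: transfer — keep the pretrained encoder, attach a new target head.
target_head = nn.Linear(128, 2)                 # e.g. binary sentiment
for p in encoder.parameters():
    p.requires_grad = False                     # feature extraction; unfreeze to fine-tune

optimizer = torch.optim.Adam(target_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One target-task training step on a dummy batch.
tokens = torch.randint(0, 10_000, (4, 12))      # 4 sentences, 12 tokens each
labels = torch.tensor([0, 1, 1, 0])
optimizer.zero_grad()
logits = target_head(encoder(tokens))
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```

Unfreezing the encoder parameters turns the same recipe into full fine-tuning instead of feature extraction.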

Supervised pretraining:

  • Very common in vision, less common in NLP.
  • Machine translation is a typical example: large parallel corpora provide the supervision.

Target tasks:

  • Sentence or document classification (e.g. sentiment).
  • Sentence pair classification.
  • Structured prediction (e.g. parsing).
  • Generation (e.g. dialogue, summarization, translation).

Word-in-context vectors:

  • The same word gets a different representation depending on its context: cats in "We have two cats" vs. "It's raining cats and dogs" (see the sketch below).
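
A minimal sketch of that effect, assuming the Hugging Face transformers library and a bert-base-uncased checkpoint; the vector_for helper is ours, not part of the notes.

```python
# Sketch: per-token contextual vectors from a pretrained encoder.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(word, sentence):
    """Return the contextual vector of `word` inside `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]                 # (seq_len, hidden)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

v1 = vector_for("cats", "We have two cats.")
v2 = vector_for("cats", "It's raining cats and dogs.")
# Same word, different vectors: cosine similarity is well below 1.
print(torch.cosine_similarity(v1, v2, dim=0))
```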

Language Modelling pretraining:

  • Doesn't require annotation, only raw text.
  • Example: Markov models that estimate P(text) or P(text | some other text); a toy version is sketched below.
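
A minimal sketch of the Markov-model case: a bigram language model estimated from raw, unannotated text. The toy corpus and add-alpha smoothing are illustrative assumptions.

```python
# Sketch: a bigram Markov language model estimated from raw (unannotated) text.
from collections import Counter

corpus = "we have two cats . we have a dog . it is raining .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = len(unigrams)

def p_next(word, prev, alpha=1.0):
    """P(word | prev) with add-alpha smoothing."""
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)

def p_text(words):
    """P(text) approximated as a product of bigram probabilities."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= p_next(word, prev)
    return p

print(p_text("we have two cats .".split()))   # higher probability
print(p_text("cats two have we .".split()))   # lower probability
```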

Why embed words?:

  • Embeddings map words from a huge, sparse one-hot space into a much lower-dimensional dense space (see the sketch below).
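
A small sketch of the dimensionality argument; the 50k vocabulary and 300-dimensional embedding are assumed sizes, and nn.Embedding stands in for any learned embedding table.

```python
# Sketch: a 50k-word vocabulary needs 50,000-dimensional one-hot vectors,
# while a learned embedding maps each word to a dense 300-dimensional vector.
import torch
import torch.nn as nn

vocab_size, emb_dim = 50_000, 300

one_hot = torch.zeros(vocab_size)          # sparse, huge
one_hot[123] = 1.0

embedding = nn.Embedding(vocab_size, emb_dim)
dense = embedding(torch.tensor([123]))     # dense, low-dimensional

print(one_hot.shape, dense.shape)          # torch.Size([50000]) torch.Size([1, 300])
```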

word2vec:

  • Context to word: CBOW (predict a word from its surrounding context).
  • Word to context: Skip-gram (predict the context words from the word); both kinds of training pairs are sketched below.
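
A sketch of the training pairs the two variants are built from, assuming a window size of 1; the toy sentence is an illustrative assumption.

```python
# Sketch: training pairs for CBOW vs. Skip-gram over a sliding context window.
sentence = "we have two cats".split()
window = 1

for i, word in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    # CBOW: predict the word from its context.
    print("CBOW:", context, "->", word)
    # Skip-gram: predict each context word from the word.
    for c in context:
        print("Skip-gram:", word, "->", c)
```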

Contextual word vectors:

  • Translate whole sentences into vectors, so each word's representation depends on its context.
  • Pretrained context-to-vector translations work better than ones learned only on the target task (a pooling sketch follows).
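
One possible sketch of turning pretrained contextual word vectors into a single sentence vector, again assuming the transformers library and a bert-base-uncased checkpoint; mean pooling is just one simple choice, not the method from the notes.

```python
# Sketch: pool pretrained contextual word vectors into one sentence vector.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

enc = tokenizer("Transfer learning reuses pretrained representations.", return_tensors="pt")
with torch.no_grad():
    token_vectors = model(**enc).last_hidden_state   # (1, seq_len, hidden)

sentence_vector = token_vectors.mean(dim=1)          # (1, hidden), mean pooling
print(sentence_vector.shape)                         # torch.Size([1, 768])
```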