## Pre-converted Magnitude Formats of ELMo Models

ELMo models have been pre-converted to the .magnitude format for immediate download and usage:

| Contributor | Data | Light<br>(basic support for out-of-vocabulary keys) | Medium (recommended)<br>(advanced support for out-of-vocabulary keys) | Heavy<br>(advanced support for out-of-vocabulary keys and faster most_similar_approx) |
| --- | --- | --- | --- | --- |
| AI2 - AllenNLP ELMo | 1 Billion Word Benchmark | 768D, 1536D, 3072D | 768D, 1536D, 3072D | 768D, 1536D, 3072D |
| AI2 - AllenNLP ELMo with Google News word2vec vocabulary | 1 Billion Word Benchmark | 768D, 1536D, 3072D | 768D, 1536D, 3072D | 768D, 1536D, 3072D |
| AI2 - AllenNLP ELMo | Wikipedia (1.9B) + WMT 2008-2012 (3.6B) | 3072D | 3072D | 3072D |
| AI2 - AllenNLP ELMo with Google News word2vec vocabulary | Wikipedia (1.9B) + WMT 2008-2012 (3.6B) | 3072D | 3072D | 3072D |

## ELMo Usage

ELMo usage differs slightly from that of other embedding models (word2vec, GloVe, and fastText), which merits some explanation of how Magnitude handles these differences.

### Contextual Embeddings

ELMo vectors are "contextual": they take into account how a word is used in the sentence around it. In word2vec, for example, the word "play" has only a single embedding that blends both interpretations of the word (a command to start music or a theatrical performance). In ELMo, this is not the case. The embedding for the word "play" would, in theory, differ between sentences like "Play some music on the living room speakers." and "Get tickets for the play tonight.".
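As a rough sketch of this behavior (using the `query` method described under "Querying with Context" below, the 768D model named later in this document, and NumPy to compare the resulting vectors):

```python
import numpy as np
from pymagnitude import Magnitude

elmo_vecs = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude')

# Query "play" in two different sentences.
music = elmo_vecs.query(["play", "some", "music", "on", "the",
                         "living", "room", "speakers", "."])
theater = elmo_vecs.query(["get", "tickets", "for", "the", "play", "tonight", "."])

# "play" is word 0 in the first sentence and word 4 in the second;
# the two vectors differ because their surrounding contexts differ.
print(np.allclose(music[0], theater[4]))  # Expected: False
```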

An ELMo vector for a target word actually consists of 3 components (a 2D array of 3 x (embedding dimensions) instead of just a 1D array of (embedding dimensions)):

  1. A contextual embedding from the forward pass of a bi-directional RNN, taking into account the words before the target word in the sentence.
  2. A contextual embedding from the backward pass of a bi-directional RNN, taking into account the words after the target word in the sentence.
  3. A context-independent embedding of the target word.

For ease of use, these 3 components for a target word are concatenated into a single 1D embedding when you use an ELMo .magnitude model. So, for example, the elmo_2x1024_128_2048cnn_1xhighway_weights ELMo .magnitude model actually contains 1D embeddings of size 768 (3 x 256 concatenated).

### Unrolling ELMo Vectors

You can use Magnitude's concatenated 1D representation of ELMo's 2D representation just like any other embedding (word2vec, fastText, GloVe). However, if your application needs the 2D representation, you can easily unroll it after querying Magnitude like so:

```python
from pymagnitude import Magnitude

elmo_vecs = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude')
sentence = elmo_vecs.query(["play", "some", "music", "on", "the", "living", "room", "speakers", "."])
# Returns: an array of size (9 (number of words) x 768 (3 ELMo components concatenated))
unrolled = elmo_vecs.unroll(sentence)
# Returns: an array of size (3 (each ELMo component) x 9 (number of words) x 256 (dimensions per ELMo component))
```

### Querying with Context

Magnitude makes querying with context simple. Magnitude's query method already accepts both 1D and 2D lists of words. If you query a 1D list of words, Magnitude treats it as a single sentence and uses ELMo to contextualize each word's embedding with the words before and after it. If you query a 2D list of words, Magnitude treats it as a batch of sentences, as shown below.
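For example (a minimal sketch reusing the 768D model from the earlier example):

```python
from pymagnitude import Magnitude

elmo_vecs = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude')

# A 1D list is treated as a single sentence; each word is contextualized
# by the words around it. Result shape: (5 words x 768 dimensions).
single = elmo_vecs.query(["get", "tickets", "for", "the", "play"])

# A 2D list is treated as a batch of sentences.
# Result shape: (2 sentences x 5 words x 768 dimensions).
batch = elmo_vecs.query([
    ["get", "tickets", "for", "the", "play"],
    ["play", "some", "music", "right", "now"],
])
```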

### Vocabularies

ELMo models typically don't ship with a vocabulary mapping words to vectors, since the vectors require context and must be generated on the fly. This unfortunately means some of Magnitude's functions, like most_similar or doesnt_match, have no vocabulary to search over and return results from. We solve this problem by also including flavors of each ELMo model with a vocabulary from Google's word2vec model attached (3,000,000 tokens) so that methods like most_similar can be used (see the sketch below). If you don't need these methods, we recommend not downloading the models with a vocabulary, as they add a significant amount to the file size. The vectors for these 3,000,000 tokens are generated by running ELMo on a sentence containing only the target word (a single-word sentence).
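For example (a sketch; the filename below is a placeholder for whichever "with Google News word2vec vocabulary" flavor you downloaded from the table above):

```python
from pymagnitude import Magnitude

# Placeholder path: substitute the vocabulary-attached ELMo model you downloaded.
elmo_vocab = Magnitude('path/to/elmo_with_google_news_vocabulary.magnitude')

# With a vocabulary attached, similarity methods have keys to search over.
print(elmo_vocab.most_similar("play", topn=5))
print(elmo_vocab.doesnt_match(["music", "song", "speakers", "tickets"]))
```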

If you want to use a different vocabulary, see the documentation for the converter.

### Out-of-Vocabulary for ELMo Vectors

ELMo models are character-based and, thus, handle out-of-vocabulary words through learned representations of subword information. Pass ngram_oov=True to the Magnitude constructor to switch to Magnitude's out-of-vocabulary method instead:
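```python
from pymagnitude import Magnitude

# Default: out-of-vocabulary words are handled by ELMo's own
# character-based representations.
elmo_default = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude')

# With ngram_oov=True, Magnitude's character n-gram based
# out-of-vocabulary method is used instead.
elmo_ngram = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude',
                       ngram_oov=True)

vec = elmo_ngram.query(["uberification", "is", "not", "a", "word", "."])
```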

### Remote Streaming

Magnitude has a remote streaming feature, and ELMo models are supported. However, there is little benefit to streaming ELMo models, since in most cases disk space will still need to be consumed for them.
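A minimal sketch, assuming a hosted .magnitude file (the URL below is a placeholder) and Magnitude's stream=True option for remote streaming:

```python
from pymagnitude import Magnitude

# Placeholder URL: substitute a real hosted ELMo .magnitude file.
elmo_vecs = Magnitude('http://example.com/models/elmo_model.magnitude',
                      stream=True)

# Queries work as usual; vectors are fetched over HTTP as needed.
vec = elmo_vecs.query(["play", "some", "music"])
```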