Skip to content

emanjavacas/macberth-eval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 
 
 
 
 
 
 

Repository files navigation

MacBERTh

This repository contains the code used to evaluate the historical BERT model (MacBERTh) trained on more than 3B tokens of historical English across 5 centuries. Publication detailing training, data sources and evaluation is forthcoming.

In order to use the model, you can use the transformers API using MacBERThs online path: emanjavacas/MacBERTh.

>>> from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
>>> path = 'emanjavacas/MacBERTh'
>>> model = AutoModelForMaskedLM.from_pretrained(path)
>>> tokenizer = AutoTokenizer.from_pretrained(path)
>>> pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
>>> s = 'For we do it not actuallye in dede, but [MASK] in a misterye.'
>>> pipe(s)

[{'sequence': 'for we do it not actuallye in dede, but onely in a misterye.',
  'score': 0.4811665117740631,
  'token': 1656,
  'token_str': 'onely'},
 {'sequence': 'for we do it not actuallye in dede, but only in a misterye.',
  'score': 0.36467066407203674,
  'token': 1200,
  'token_str': 'only'},
 {'sequence': 'for we do it not actuallye in dede, but as in a misterye.',
  'score': 0.034449998289346695,
  'token': 878,
  'token_str': 'as'},
 {'sequence': 'for we do it not actuallye in dede, but rather in a misterye.',
  'score': 0.027537666261196136,
  'token': 1754,
  'token_str': 'rather'},
 {'sequence': 'for we do it not actuallye in dede, but yet in a misterye.',
  'score': 0.023396290838718414,
  'token': 1126,
  'token_str': 'yet'}]

The code is model-agnostic relying on the huggingface interface to transformers models. We do, however, rely on OED data for evaluation which we cannot release due to it not being freely available. Still, the code should be illustrative about the approach taken for the different tasks, and should be easy to adapt to datasets similar to the OED.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages