GitHub - ClimateBert/language-model

Code repository for "ClimateBERT: A Pretrained Language Model for Climate-Related Text"

Link to paper: arxiv.org/abs/2110.12010

Usage

The usage is straightforward and comprises two steps:

1. The tokenizer is augmented with new tokens that represent climate-change-specific language. This step led to the inclusion of tokens such as 'CO2' or 'CH4', which are often key to properly representing climate-related text. The code for this step can be found in 'tokenizer_augmentation.ipynb'. Besides the transformers package, this step also requires the transformers_domain_adaptation package.
2. Using the augmented tokenizer, the next step is to train the language model. This step follows the standard language-modeling workflow of the transformers package. We provide the code for this in 'language_modeling.ipynb'.
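
The idea behind step 1 can be sketched as follows. This is a minimal illustration, not the code from 'tokenizer_augmentation.ipynb': it swaps in a simple frequency count for the selection logic of transformers_domain_adaptation, and the helper names `select_new_tokens` and `augment`, the toy corpus, and the toy base vocabulary are all hypothetical. The only Hugging Face calls assumed are `tokenizer.add_tokens` and `model.resize_token_embeddings`.

```python
import re
from collections import Counter


def select_new_tokens(corpus, base_vocab, max_new_tokens=10, min_count=2):
    """Return frequent corpus terms that are missing from base_vocab.

    Hypothetical helper: a plain frequency heuristic, not the selection
    method used by transformers_domain_adaptation.
    """
    counts = Counter()
    for text in corpus:
        # Case-sensitive terms that start with a letter, e.g. "CO2".
        counts.update(re.findall(r"[A-Za-z][A-Za-z0-9]+", text))
    candidates = [
        (term, n)
        for term, n in counts.items()
        if n >= min_count and term not in base_vocab
    ]
    # Most frequent first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in candidates[:max_new_tokens]]


def augment(tokenizer, model, new_tokens):
    """Add tokens to a Hugging Face tokenizer and resize the model to match."""
    num_added = tokenizer.add_tokens(new_tokens)
    # The embedding matrix needs one row per token in the enlarged vocabulary.
    model.resize_token_embeddings(len(tokenizer))
    return num_added


# Toy data, for illustration only.
corpus = [
    "CO2 and CH4 emissions rose.",
    "CH4 traps more heat than CO2 per molecule.",
    "CO2 levels keep rising.",
]
base_vocab = {
    "and", "emissions", "rose", "traps", "more", "heat",
    "than", "per", "molecule", "levels", "keep", "rising",
}
new_tokens = select_new_tokens(corpus, base_vocab)
print(new_tokens)
```

With a tokenizer and a masked language model loaded through `AutoTokenizer.from_pretrained` and `AutoModelForMaskedLM.from_pretrained`, calling `augment(tokenizer, model, new_tokens)` would register the selected tokens before the training in step 2. The embeddings of newly added tokens start out untrained, which is why the language-modeling step follows.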

Dependencies

Our code depends on the transformers package and on transformers_domain_adaptation. For training ClimateBert, we used transformers 4.20. and transformers_domain_adaptation 0.3.1.
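
Before running the notebooks, it may help to check which versions are installed locally. The snippet below is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names `transformers` and `transformers-domain-adaptation` are assumptions about how the two packages are published.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust them to your environment.
required = ["transformers", "transformers-domain-adaptation"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'not installed'}")
```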

How do I cite ClimateBert?

For now, cite the arXiv paper:

@article{webersinke2021climatebert,
  title={Climatebert: A pretrained language model for climate-related text},
  author={Webersinke, Nicolas and Kraus, Mathias and Bingler, Julia Anna and Leippold, Markus},
  journal={arXiv preprint arXiv:2110.12010},
  year={2021}
}
