
Corpora and Resources

Awesome-Chinese-NLP: Corpora, tools, and resources for NLP projects in Chinese.

Corpus for Finance (CoFIF): Reference documents and reports from France’s 60 largest companies, covering 1995 to 2018.

Gallica: French books, academic journals, newspapers, sound recordings, and videos.

German-NLP: Corpora, tools, and resources for NLP projects in German.

Japanese Text Initiative: Classical Japanese literature.

Middle Eastern and North African Newspapers: Arabic newspapers from 1870 to 2019.

Project Gutenberg: eBooks in a variety of languages.

TS Corpus: Corpora, tools, and resources for NLP projects in Turkish.

Twitter API: Tweets, Direct Messages, users, and other Twitter resources are available to download and analyze. Learn more about using the Twitter API for NLP here.

Wikipedia: Data dumps of all Wikimedia wikis, in many languages. Learn how to clean and process the wiki data here.