Awesome-Chinese-NLP: Corpora, tools, and resources for NLP projects in Chinese.
Corpus for Finance (CoFIF): Reference documents and reports from France’s 60 largest companies, covering 1995 to 2018.
Gallica: French books, academic journals, newspapers, sound recordings, and videos.
German-NLP: Corpora, tools, and resources for NLP projects in German.
Japanese Text Initiative: Classical Japanese literature.
Middle Eastern and North African Newspapers: Arabic newspapers from 1870 to 2019.
Project Gutenberg: eBooks in a variety of languages.
TS Corpus: Corpora, tools, and resources for NLP projects in Turkish.
Twitter API: Tweets, Direct Messages, users, and other Twitter resources are available to download and analyze. Learn more about using the Twitter API for NLP here.
Wikipedia: Data dumps from all wikis in different languages. Learn how to clean and process the Wiki data here.
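
Text extracted from a Wikipedia dump still contains wiki markup (templates, links, references, bold/italic quotes) that usually has to be stripped before NLP work. The following is a minimal sketch of that cleaning step using only Python's standard library. The function name strip_wiki_markup and the regular expressions are illustrative assumptions, not the method of any particular tutorial or tool, and they handle only the most common constructs; a dedicated wikitext parser is a better fit for real pipelines.

```python
import re

def strip_wiki_markup(text: str) -> str:
    """Remove a few common wikitext constructs (hypothetical helper, simplified)."""
    # Drop references: self-closing tags first, then <ref>...</ref> pairs.
    text = re.sub(r"<ref[^>]*/>|<ref[^>]*>.*?</ref>", "", text, flags=re.S)

    # Drop templates such as {{citation needed}}; loop so nested ones unwind.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)

    # Turn [[target|label]] into "label" and [[target]] into "target".
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)

    # Remove bold/italic markers (runs of two or more apostrophes).
    text = re.sub(r"'{2,}", "", text)

    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


sample = (
    "'''Paris''' is the capital of [[France|the French Republic]]."
    "{{citation needed}} See [[Europe]]."
)
cleaned = strip_wiki_markup(sample)
print(cleaned)
```

The same function can be mapped over each article body once the dump has been split into pages; tables, files, and category links are not handled in this sketch.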