krisograbek/text-preprocessing

Text Preprocessing in Python

This repository collects text preprocessing steps that are common to a variety of NLP tasks.

Libraries

string, re, nltk, spaCy, gensim, textblob, unidecode, pyspellchecker, autocorrect

Text Cleaning

Removals

  • punctuation
  • stopwords
  • numbers
  • HTML tags
  • URLs
  • emojis
  • extra whitespace / newlines
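
The removals listed above can be sketched with only `string` and `re` from the library list. This is a minimal illustration, not the repository's implementation: the function name `clean_text`, the regular expressions, and the tiny `STOPWORDS` set are all assumptions made for the example. In practice a fuller stopword list (for instance from nltk or spaCy) would likely be used.

```python
import re
import string

# Hypothetical, deliberately tiny stopword set, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in"}

HTML_TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUMBER_RE = re.compile(r"\d+")
# Covers common emoji blocks only; this is not an exhaustive emoji definition.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Apply the removals in an order that keeps the patterns from interfering."""
    text = HTML_TAG_RE.sub(" ", text)     # tags first, so "<" and ">" do not linger
    text = URL_RE.sub(" ", text)          # URLs before punctuation is stripped
    text = EMOJI_RE.sub(" ", text)
    text = NUMBER_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    # split() with no arguments collapses runs of spaces, tabs, and newlines.
    tokens = [t for t in text.split() if t.lower() not in STOPWORDS]
    return " ".join(tokens)
```

The order matters here: stripping punctuation before URLs would break the URL pattern, since it relies on `://` still being present.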

Replacing accented characters
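
As a dependency-free sketch of this step, the example below uses the standard library's `unicodedata` module rather than `unidecode` from the library list. The function name `strip_accents` is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Replace accented letters with their unaccented base letters."""
    # NFKD decomposition splits e.g. "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep every other character.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

This approach only handles characters that decompose into a base letter plus combining marks. Letters such as "ø" or "ß" have no such decomposition and pass through unchanged, whereas a transliteration library like `unidecode` maps them to ASCII approximations.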

Text Normalization

Spell Corrections

Tokenization - Word and Sentence
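
To show what word and sentence tokenization produce, here is a naive regex-based sketch using only `re`. It is an assumption-laden stand-in, not the tokenizers from nltk or spaCy, and the names `simple_sentence_tokenize` and `simple_word_tokenize` are hypothetical.

```python
import re

# Split on whitespace that follows ".", "!" or "?".
# Naive: abbreviations such as "Dr." are treated as sentence ends.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")
# A word (optionally with internal apostrophes), or one non-space symbol.
WORD_RE = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def simple_sentence_tokenize(text: str) -> list:
    """Split text into sentences at terminal punctuation."""
    return [s for s in SENTENCE_RE.split(text.strip()) if s]


def simple_word_tokenize(text: str) -> list:
    """Split text into words and standalone punctuation marks."""
    return WORD_RE.findall(text)


print(simple_sentence_tokenize("Hello world. How are you? Fine!"))
# ['Hello world.', 'How are you?', 'Fine!']
print(simple_word_tokenize("Don't panic, it's fine."))
# ["Don't", 'panic', ',', "it's", 'fine', '.']
```

Library tokenizers handle abbreviations, contractions, and language-specific rules that this sketch ignores, so they are usually the better choice for real text.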

Lemmatization

Stemming
