Text preprocessing for Natural Language Processing

A python package for text preprocessing task in natural language processing.

Usage

To use this text preprocessing package, first install it using pip:

pip install text-preprocessing

Then, import the package in your python script and call appropriate functions:

from text_preprocessing import preprocess_text
from text_preprocessing import to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word

# Preprocess text using default preprocess functions in the pipeline
text_to_process = 'Helllo, I am John Doe!!! My email is john.doe@email.com. Visit our website www.johndoe.com'
preprocessed_text = preprocess_text(text_to_process)
print(preprocessed_text)
# output: hello email visit website

# Preprocess text using custom preprocess functions in the pipeline
preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word]
preprocessed_text = preprocess_text(text_to_process, preprocess_functions)
print(preprocessed_text)
# output: helllo i am john doe my email is visit our website

Features

Feature	Function
convert to lower case	to_lower
convert to upper case	to_upper
keep only alphabetic and numerical characters	keep_alpha_numeric
check and correct spellings	check_spelling
expand contractions	expand_contraction
remove URLs	remove_url
remove names	remove_name
remove emails	remove_email
remove phone numbers	remove_phone_number
remove SSNs	remove_ssn
remove credit card numbers	remove_credit_card_number
remove numbers	remove_number
remove bullets and numbering	remove_itemized_bullet_and_numbering
remove special characters	remove_special_character
remove punctuations	remove_punctuation
remove extra whitespace	remove_whitespace
normalize unicode (e.g., café -> cafe)	normalize_unicode
remove stop words	remove_stopword
tokenize words	tokenize_word
tokenize sentences	tokenize_sentence
substitute custom words (e.g., vs -> versus)	substitute_token
stem words	stem_word
lemmatize words	lemmatize_word
preprocess text through a sequence of preprocessing functions	preprocess_text

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DESCRIPTION.rst

DESCRIPTION.rst

Text preprocessing for Natural Language Processing

Usage

Features

Files

DESCRIPTION.rst

Latest commit

History

DESCRIPTION.rst

File metadata and controls

Text preprocessing for Natural Language Processing

Usage

Features