Text Preprocessing Script

This is a simple Python script that I use for preprocessing text with NLTK.

File Details

`preprocess.py` contains the main logic for preprocessing text.

`example.py` contains example code for all the functions.

Sample Code

from preprocess import Preprocess

preprocess = Preprocess()

string = "Hello, World! This is a sample text that I'm using for testing"

print("Sample Text: ", string)

sentences = preprocess.sentence_tokenizer(string)

print("Sentences: ", sentences)

clean_text = preprocess.clean_text_form(sentences[1])

print("Clean Text 1: ", clean_text)

clean_text2 = preprocess.remove_unchars(clean_text)

print("Clean Text 2: ", clean_text2)

words = preprocess.word_tokenizer(clean_text2)

print("Words: ", words)

no_stop_words = preprocess.remove_stop_wrods(words)

print("Stop Words Removal: ", no_stop_words)

pos = preprocess.get_pos(words)

print("POS Tags: ", pos)

lemmas = preprocess.lemmatizer(words)

print("Lemmatization: ", lemmas)
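
To give a rough idea of what the first few steps in the sample do, here is a minimal sketch using only the Python standard library. It is not the code in `preprocess.py`, which relies on NLTK. The function names, the regular expressions, and the tiny stop-word list are all assumptions made for illustration, and NLTK's tokenizers and stop-word corpus handle many more cases.

```python
import re

# Hypothetical, very small stop-word list for illustration only;
# NLTK's English stop-word corpus is much larger.
STOP_WORDS = {"a", "i", "is", "that", "this", "for", "m"}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a crude sentence tokenizer)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def clean_text(text):
    """Lowercase the text and replace every non-alphanumeric character with a space."""
    lowered = text.lower()
    no_punct = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return re.sub(r"\s+", " ", no_punct).strip()


def tokenize_words(text):
    """Split already-cleaned text on whitespace."""
    return text.split()


def remove_stop_words(words):
    """Drop every word that appears in STOP_WORDS."""
    return [w for w in words if w not in STOP_WORDS]


text = "Hello, World! This is a sample text that I'm using for testing"
sentences = split_sentences(text)
words = tokenize_words(clean_text(sentences[1]))

print("Sentences: ", sentences)
print("Words: ", words)
print("Stop Words Removal: ", remove_stop_words(words))
```

POS tagging and lemmatization are left out of this sketch because they need trained models and dictionaries, which is the part NLTK provides.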
