Text Preprocessing with NLTK

Overview

This repository contains Python code that demonstrates text preprocessing with the Natural Language Toolkit (NLTK). Preprocessing is an essential step in natural language processing (NLP) projects: raw text is cleaned and transformed so that it is ready for analysis or modeling.

Features

Removes punctuation, URLs, and stop words from text data
Performs tokenization, stemming, and lemmatization
Segments text into sentences
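
As a rough illustration of the cleaning features listed above, the following minimal sketch strips URLs and punctuation and drops stop words using only the standard library. The names `clean_text`, `URL_PATTERN`, `PUNCT_TABLE`, and `SAMPLE_STOP_WORDS` are hypothetical and are not taken from this repository; the stop-word set is a tiny made-up sample, and tokenization is a plain whitespace split rather than an NLTK tokenizer.

```python
import re
import string

# Hypothetical helper names; the repository's own implementation may differ.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text, stop_words):
    """Strip URLs and punctuation, lowercase, and drop stop words."""
    text = URL_PATTERN.sub(" ", text)    # remove URLs first so they are matched whole
    text = text.translate(PUNCT_TABLE)   # delete ASCII punctuation characters
    tokens = text.lower().split()        # simple whitespace tokenization
    return [tok for tok in tokens if tok not in stop_words]


# Tiny illustrative stop-word set (an assumption, not NLTK's list).
SAMPLE_STOP_WORDS = {"the", "is", "at", "a", "for"}

tokens = clean_text("The demo is at https://example.com, look!", SAMPLE_STOP_WORDS)
print(tokens)
```

With NLTK installed, the stop-word set could instead come from `set(nltk.corpus.stopwords.words("english"))` after running `nltk.download("stopwords")`.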

Dependencies

Python 3.x
NLTK

Installation

Clone this repository:

git clone https://github.com/your_username/text-preprocessing-nltk.git
cd text-preprocessing-nltk

Install the required dependencies using pip:
```
pip install -r requirements.txt
```

Usage

Ensure you have your text data ready. You can either use the provided sample data or replace it with your own dataset.
Run the preprocessing script:
```
python preprocess.py
```
View the preprocessed text output in the console.

Sample Data

The sample data file data.jsonl is in JSON Lines format: one JSON object per line, each with a "text" field that holds the raw text.
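
A minimal sketch of reading that format is shown below. The function name `iter_texts` is hypothetical, and the two entries are made-up stand-ins for the real file contents; only the "text" field name comes from the description above.

```python
import io
import json


def iter_texts(lines):
    """Yield the "text" field of each non-empty JSON Lines record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue                  # skip blank lines between records
        record = json.loads(line)     # each line is one standalone JSON object
        yield record["text"]


# In-memory stand-in for the sample file, with made-up entries.
sample = io.StringIO(
    '{"text": "I feel great today!"}\n'
    "\n"
    '{"text": "Visit https://example.com"}\n'
)
texts = list(iter_texts(sample))
print(texts)
```

To read from disk, pass an open file object instead, for example one returned by `open("data.jsonl", encoding="utf-8")`.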

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Thanks to the NLTK developers for providing a powerful natural language processing library.
