
rshinde03/Natural-Language-Processing


Natural-Language-Processing

Text Summarization & Web Scraping

This file works with a piece of text scraped from a website and applies basic text pre-processing techniques: lemmatization, word and sentence tokenization, stopword removal, punctuation removal, upper- to lower-case conversion, and digit removal. The pre-processed text is then used to compute a frequency distribution of words, which feeds sentence ranking and summarization using TF-IDF and Gensim; the results of the two approaches are compared.
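The pre-processing and TF-IDF ranking steps can be sketched with the Python standard library alone. This is a minimal sketch under several assumptions: the tiny `STOPWORDS` set, the regex sentence splitter, and the function names `preprocess`, `word_frequencies`, and `tfidf_summary` are hypothetical; lemmatization and the Gensim comparison are omitted; and each sentence is treated as a "document" when computing IDF, which is one common way to apply TF-IDF to a single text, not necessarily the scheme used in the notebook.

```python
import math
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# fuller list such as NLTK's or spaCy's.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "it", "for", "with"}


def split_sentences(text):
    # Naive sentence tokenization: split after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def preprocess(sentence):
    # Lower-case, remove digits and punctuation, word-tokenize, drop stopwords.
    sentence = sentence.lower()
    sentence = re.sub(r"\d+", "", sentence)
    sentence = sentence.translate(str.maketrans("", "", string.punctuation))
    return [w for w in sentence.split() if w not in STOPWORDS]


def word_frequencies(text):
    # Frequency distribution of the cleaned words across the whole text.
    return Counter(w for s in split_sentences(text) for w in preprocess(s))


def tfidf_summary(text, n_sentences=2):
    sentences = split_sentences(text)
    docs = [preprocess(s) for s in sentences]
    n_docs = len(docs)
    # Document frequency: how many sentences contain each word.
    df = Counter(w for doc in docs for w in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        # Sentence score = sum over its words of (term frequency * inverse sentence frequency).
        scores.append(sum((c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()))
    # Keep the highest-scoring sentences, then restore their original order.
    top = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Words that occur in every sentence get an IDF of zero, so sentences made of distinctive words rank higher. For `"Cats are great pets. Dogs are loyal pets. The stock market fell 5 percent today!"`, `tfidf_summary(text, 1)` picks the stock-market sentence, because "pets" is shared by the other two.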

Text Summarization with N-Grams

The same approach as above is applied to this text data, but N-grams are used for the word-frequency calculation and summarization. Unigrams, bigrams, and trigrams are created first and then used for the frequency count.
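A minimal sketch of the N-gram step, using only the standard library. The `ngrams`, `ngram_counts`, and `rank_by_ngrams` helpers are hypothetical names, whitespace tokenization stands in for a proper tokenizer, and scoring a sentence by the total frequency of its n-grams is an assumed scheme for illustration, not necessarily the one used in the notebook.

```python
from collections import Counter


def ngrams(tokens, n):
    # Zip n staggered copies of the token list; zip stops at the shortest one,
    # so this yields len(tokens) - n + 1 tuples (or none if the list is shorter than n).
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(tokens, max_n=3):
    # Frequency counts for unigrams, bigrams, ... up to max_n-grams.
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}


def rank_by_ngrams(sentences, n=2, top_k=1):
    # Score each sentence by the summed frequency (over all sentences) of the
    # n-grams it contains, then return the top_k sentences in original order.
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(g for toks in tokenized for g in ngrams(toks, n))
    scores = [sum(counts[g] for g in ngrams(toks, n)) for toks in tokenized]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]
```

For the tokens of `"the cat sat on the mat the cat ran"`, `ngram_counts` counts the unigram `("the",)` three times and the bigram `("the", "cat")` twice, and produces seven trigrams in total.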

Word Prediction with N-Grams

Created word tokens from the sentence, computed the frequency of each unigram and the relative frequency of each bigram, and then performed word prediction using these relative frequencies as estimates of the probability of the next word.
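The prediction step can be sketched as a bigram model whose conditional probabilities are relative frequencies, P(next | prev) = count(prev, next) / count(prev). The function names below are hypothetical, and no smoothing is applied, so a word that was never seen with a successor simply yields no prediction.

```python
from collections import Counter, defaultdict


def build_bigram_model(tokens):
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Relative frequency of each bigram: P(next | prev) = count(prev, next) / count(prev).
    model = defaultdict(dict)
    for (prev, nxt), count in bigram_counts.items():
        model[prev][nxt] = count / unigram_counts[prev]
    return model


def predict_next(model, word):
    # Return the most probable next word, or None if the word was never seen
    # with a successor (no smoothing is applied).
    candidates = model.get(word)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


tokens = "i like green eggs and i like ham and i want ham".split()
model = build_bigram_model(tokens)
print(predict_next(model, "i"))  # like  ("i" is followed by "like" in 2 of its 3 occurrences)
```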

NER & De-Identification Using spaCy

Used the spaCy library to perform Named Entity Recognition (NER) on a web-scraped news article.
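De-identification can be sketched as replacing each detected entity span with its label. This sketch assumes the entities have already been extracted as `(start_char, end_char, label)` tuples; with spaCy, these values come from each entity in `doc.ents` through its `start_char`, `end_char`, and `label_` attributes. The `deidentify` function and its default set of masked labels (`PERSON`, `ORG`, `GPE`, which are labels used by spaCy's English pipelines) are assumptions for illustration, and the hard-coded spans in the test stand in for real NER output.

```python
def deidentify(text, entities, labels_to_mask=("PERSON", "ORG", "GPE")):
    # entities: non-overlapping (start_char, end_char, label) tuples, e.g.
    # [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents] in spaCy.
    pieces = []
    last = 0
    for start, end, label in sorted(entities):
        if label not in labels_to_mask:
            continue  # leave other entities (e.g. DATE) untouched
        pieces.append(text[last:start])  # text between the previous masked entity and this one
        pieces.append(f"[{label}]")      # placeholder instead of the entity text
        last = end
    pieces.append(text[last:])
    return "".join(pieces)
```

Given spans for "Alice Smith" (PERSON), "Acme Corp" (ORG), "Paris" (GPE), and "Monday" (DATE) in `"Alice Smith joined Acme Corp in Paris on Monday."`, the default call returns `"[PERSON] joined [ORG] in [GPE] on Monday."`.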

About

This repository contains various small Natural Language Processing projects, including text summarization using spaCy and N-grams, along with word prediction.
