Rajdeep2121/NLP-Fundamentals

NLP-Basics

Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

TOKENIZATION

Tokenization is the process of breaking a chunk of text into smaller units called tokens. Depending on the task, this can mean splitting a paragraph into sentences, a sentence into words, or a word into characters.
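The three levels of tokenization can be sketched with Python's `re` module. This is a minimal, rule-based sketch under stated assumptions: the helper names are hypothetical, sentences are assumed to end with `.`, `!` or `?` followed by whitespace, and punctuation marks become separate word tokens. It ignores cases such as abbreviations ("Dr."), which library tokenizers like NLTK's `sent_tokenize` and `word_tokenize` are designed to handle.

```python
import re


def sentence_tokenize(paragraph: str) -> list[str]:
    # Assumed rule: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def word_tokenize(sentence: str) -> list[str]:
    # Words are runs of word characters (keeping one internal apostrophe,
    # e.g. "don't"); every other non-space character is its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


def char_tokenize(word: str) -> list[str]:
    # The finest level: each character is a token.
    return list(word)


text = "NLP is fun. Tokenize this text!"
sentences = sentence_tokenize(text)
print(sentences)                    # ['NLP is fun.', 'Tokenize this text!']
print(word_tokenize(sentences[0]))  # ['NLP', 'is', 'fun', '.']
print(char_tokenize("fun"))         # ['f', 'u', 'n']
```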


STEMMING

Stemming is the process of reducing inflected words to their root form, or stem. It maps a group of related words to the same stem, even if the stem itself is not a valid word in the language.
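To illustrate the idea, here is a toy suffix-stripping stemmer. It is not the Porter algorithm or any library's implementation: the rule list `SUFFIX_RULES` and the minimum stem length of 3 are assumptions chosen for the example. It shows both properties described above. Related words collapse to one stem, and a stem such as "studi" need not be a real word. For real use, established stemmers such as NLTK's `PorterStemmer` apply far more careful rules.

```python
# Hypothetical rule list: (suffix, replacement), checked in order.
SUFFIX_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ing", ""),
    ("ion", ""),
    ("ed", ""),
    ("s", ""),
]
MIN_STEM_LENGTH = 3  # assumed guard so short words like "is" stay intact


def stem(word: str) -> str:
    """Strip the first matching suffix, if enough of the word remains."""
    w = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LENGTH:
            return w[: -len(suffix)] + replacement
    return w


words = ["connect", "connects", "connected", "connecting", "connection"]
print([stem(w) for w in words])  # every word maps to 'connect'
print(stem("studies"))           # 'studi' (not a valid English word)
```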


STOP WORDS

Stop words are very common words, such as 'the', 'a' and 'and', that mostly serve a grammatical role in a sentence. Because they carry little meaning on their own, they are often removed during preprocessing before the data is used further.
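Stop-word removal is typically just a membership test against a fixed list. In the sketch below, `STOP_WORDS` is a small hypothetical subset chosen for illustration. Libraries provide fuller lists, for example NLTK's `stopwords.words("english")` after its stopwords corpus has been downloaded. Comparison is case-insensitive so that a sentence-initial "The" is also dropped.

```python
# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "of", "to", "in", "on", "it", "this"}


def remove_stop_words(tokens: list[str]) -> list[str]:
    # Keep only tokens whose lowercase form is not a stop word.
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = "The cat and the dog are in the garden".split()
print(remove_stop_words(tokens))  # ['cat', 'dog', 'garden']
```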


WORDNET

WordNet is a lexical database of English words. It links words through semantic relations including synonyms, hyponyms, antonyms and meronyms. Synonyms are grouped into synsets, each with a short definition and usage examples, so WordNet can be seen as a combination and extension of a dictionary and thesaurus. Wordnets built on the same model now exist for more than 200 languages.
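The structure described above (synsets of synonymous lemmas, each with a gloss and links to other synsets) can be modeled in a few lines. The following is a hypothetical toy lexicon, not WordNet's data or any library's API. The synset names follow the `lemma.pos.nn` style used by NLTK's WordNet interface, but the glosses are made up for illustration. Antonymy is kept as a separate lemma-to-lemma mapping because in WordNet it links individual words rather than whole synsets. To query the real database, NLTK offers `nltk.corpus.wordnet.synsets()` once the WordNet corpus has been downloaded.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One word sense: synonymous lemmas, a short gloss and an example."""
    name: str
    lemmas: list[str]
    definition: str
    example: str = ""
    hypernyms: list[str] = field(default_factory=list)  # more general synsets
    meronyms: list[str] = field(default_factory=list)   # part-of synsets


# Hypothetical toy lexicon; glosses are illustrative, not copied from WordNet.
LEXICON = {
    "vehicle.n.01": Synset("vehicle.n.01", ["vehicle"],
                           "a means of transporting people or goods"),
    "car.n.01": Synset("car.n.01", ["car", "auto", "automobile"],
                       "a motor vehicle with four wheels",
                       "he parked the car outside",
                       hypernyms=["vehicle.n.01"], meronyms=["wheel.n.01"]),
    "wheel.n.01": Synset("wheel.n.01", ["wheel"],
                         "a round part that turns on an axle"),
}

# Hypothetical lemma-level antonym links.
ANTONYMS = {"hot": "cold", "cold": "hot"}


def synsets(word: str) -> list[Synset]:
    # All senses whose lemma list contains the word.
    return [s for s in LEXICON.values() if word in s.lemmas]


def synonyms(word: str) -> list[str]:
    # Other lemmas sharing any synset with the word.
    return sorted({l for s in synsets(word) for l in s.lemmas if l != word})


def hyponyms(name: str) -> list[str]:
    # Hyponyms are the inverse of hypernym links.
    return [s.name for s in LEXICON.values() if name in s.hypernyms]


print(synonyms("car"))               # ['auto', 'automobile']
print(hyponyms("vehicle.n.01"))      # ['car.n.01']
print(LEXICON["car.n.01"].meronyms)  # ['wheel.n.01']
print(ANTONYMS["hot"])               # cold
```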