Quora_Question_Classification

Objective:

The Quora Sincere/Insincere Questions Classification is a Python NLP project that classifies questions posted on Quora as either Sincere or Insincere.

Sincere Questions:

These are questions that are genuinely seeking helpful answers, contributing to the knowledge-sharing ethos of Quora. Sincere questions are founded on a desire for information, insights, and solutions.

Insincere Questions:

Insincere questions are problematic because they are typically not genuine requests for information. They may be based on false premises, be intended to make a statement rather than ask something, or be offensive, divisive, or otherwise inappropriate. These questions often violate Quora's "Be Nice, Be Respectful" policy.

Project Scope

Utilized Quora's Insincere Questions Classification Kaggle Dataset.
Employed Natural Language Processing (NLP) techniques for text analysis.
Implemented various models including Logistic Regression, Naive Bayes, Convolutional Neural Network (CNN), and BERT.
Evaluated models based on accuracy, precision, recall, and F1 score.
Explored text characteristics crucial for classification, such as word count, character count, and stopword frequency.
Developed a Streamlit application utilizing DistilBERT for real-time question classification.
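
As a rough illustration of the evaluation metrics listed above, the sketch below computes accuracy, precision, recall, and F1 from true and predicted labels using only the standard library. The function name `classification_metrics`, the choice of 1 as the insincere (positive) label, and the toy labels are assumptions made for this example and are not taken from the project notebook.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, made up for illustration: 1 = insincere, 0 = sincere.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is noticeably higher than precision, recall, and F1, which is why reporting all four is more informative than accuracy alone when one class dominates.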

Exploratory Data Analysis (EDA)

Analysis Overview

Word Cloud Visualization: Identified most frequent words in sincere and insincere questions.
Bigram Frequency Analysis: Explored pairs of words occurring frequently together in both sincere and insincere questions.
Text Characteristics Examination: Analyzed various text attributes crucial for classification, including word count, character count, unique word count, etc.
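
The bigram frequency analysis described above could be sketched roughly as follows. This is a minimal standard-library version, not the notebook's code: the `tokenize` and `top_bigrams` helpers and the toy questions are hypothetical, and stopwords are deliberately left in, which the original analysis may or may not have done.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens (apostrophes allowed)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_bigrams(questions, n=3):
    """Count adjacent word pairs across all questions; return the n most common."""
    counts = Counter()
    for question in questions:
        tokens = tokenize(question)
        # zip(tokens, tokens[1:]) yields each pair of neighbouring tokens.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Toy questions, made up for illustration.
questions = [
    "How do I learn machine learning?",
    "Is machine learning hard to learn?",
    "What is the best way to learn Python?",
]
print(top_bigrams(questions))
```

Running the same function separately on the sincere and insincere subsets would give the two frequency tables that the comparison relies on.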

Feature Extraction

Word Count: Calculating the total number of words in each question.
Unique Word Count: Identifying the count of distinct words used in a question.
Character Count: Determining the total number of characters present in the question text.
Stopwords Count: Counting the occurrences of commonly used stopwords (e.g., "the," "is," "and") in questions.
Punctuation Count: Tracking the usage of punctuation marks within the questions.
Title-Case and Uppercase Word Count: Counting the words written in title case (first letter capitalized) or entirely in uppercase.
Average Word Length: Calculating the average length of words used in a question.
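
The features listed above can be computed with a few lines of plain Python. The sketch below is one possible version under stated assumptions: the `extract_features` name, the whitespace-based word splitting, and the tiny `STOPWORDS` set are illustrative choices, and the notebook may define these features differently.

```python
import string

# A tiny illustrative stopword set (an assumption); a real run would likely
# use a fuller list such as the English stopwords shipped with NLTK.
STOPWORDS = {"the", "is", "and", "a", "an", "of", "to", "in", "are", "why"}


def extract_features(question):
    """Compute simple surface features for one question string."""
    words = question.split()  # whitespace split; punctuation stays attached
    return {
        "word_count": len(words),
        "unique_word_count": len({w.lower() for w in words}),
        "char_count": len(question),
        "stopword_count": sum(1 for w in words if w.lower() in STOPWORDS),
        "punctuation_count": sum(1 for ch in question if ch in string.punctuation),
        "title_word_count": sum(1 for w in words if w.istitle()),
        "upper_word_count": sum(1 for w in words if w.isupper()),
        "avg_word_length": (
            sum(len(w) for w in words) / len(words) if words else 0.0
        ),
    }


features = extract_features("Why is the NASA budget so small?")
print(features)
```

Because words are split on whitespace, a token such as "small?" keeps its question mark, so it counts toward word length and would not match a stopword. Stripping punctuation before counting is a reasonable alternative.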

Implemented Models:

Logistic Regression
Naive Bayes
Convolutional Neural Network (CNN)
BERT Implementation in a Streamlit application for real-time question classification.
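
To give a feel for how the simplest of these models works, here is a minimal multinomial Naive Bayes text classifier with add-one smoothing, written with the standard library only. It is a teaching sketch and not the project's implementation, which most likely relied on a machine-learning library. The class name `NaiveBayesTextClassifier` and the four training questions are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # log P(class) plus the sum of log P(word | class).
            score = math.log(doc_count / total_docs)
            for word in text.lower().split():
                if word in self.vocabulary:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, made up for illustration.
texts = [
    "how do i learn python",
    "what is the best way to cook rice",
    "why are all politicians such liars",
    "why do liars always win elections",
]
labels = ["sincere", "sincere", "insincere", "insincere"]

model = NaiveBayesTextClassifier().fit(texts, labels)
print(model.predict("why are politicians liars"))
print(model.predict("how do i cook rice"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a word that is missing from one class from forcing that class's probability to zero.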

STREAMLIT APPLICATION BERT IMPLEMENTATION

OUTPUT:

Recommendations

Further model fine-tuning and optimization could enhance the performance of Logistic Regression and Naive Bayes models.
Exploring lightweight NLP models for efficient text classification might be beneficial for large-scale applications.
Continuous monitoring and improvement of the models' performance against evolving data trends are essential.

Conclusion

This project compared several NLP models for categorizing Quora questions. Each model had its own strengths and weaknesses, and together the Logistic Regression, Naive Bayes, and CNN models and the Streamlit application with DistilBERT offered valuable insights into classifying sincere and insincere questions on Quora.

simran2097/Quora-Questions-Classification-NLP

Folders and files

Latest commit

History

Repository files navigation

Quora_Question_Classification

Objective:

Sincere Questions:

Insincere Questions:

Project Scope

Exploratory Data Analysis (EDA)

Analysis Overview

Feature Extraction

Implemented Models:

STREAMLIT APPLICATION BERT IMPLEMENTATION

OUTPUT:

Recommendations

Conclusion

About

Topics

Resources

Stars

Watchers

Forks

Languages