Quora-Questions-Pairs-App

This research is based on the BERT Fine-Tuning Tutorial with PyTorch.

Under the training-bert folder you can find a Jupyter notebook. There I show how I fine-tuned the base uncased BERT model to solve the problem of classifying duplicate questions from the Quora website.

Introduction

In this research I'd like to use BERT with the Hugging Face PyTorch library to fine-tune a model that performs as well as possible on question-pair classification. The app is built using Streamlit.

First, let's talk about the model and the dataset:

Bert

Bidirectional Encoder Representations from Transformers (BERT) was pretrained and released by Google in late 2018 (see original model code here) for NLP (Natural Language Processing) tasks. BERT was originally created by Jacob Devlin and his colleagues at Google, and it was pre-trained on two corpora: BookCorpus and English Wikipedia.

BERT consists of 12 Transformer encoder layers (or 24 for BERT large). If you stack Transformer decoder layers instead, you get a GPT-style model that generates sentences.

You can find more information in these videos:
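For a pair-classification task like this one, the encoder receives both questions as a single sequence, separated by special tokens and marked with segment ids. The snippet below is a minimal sketch of that packing, not the code from the notebook: the function name pack_pair is hypothetical, and plain whitespace splitting stands in for BERT's WordPiece tokenizer so the example runs without any extra libraries.

```python
from typing import List, Tuple


def pack_pair(question1: str, question2: str) -> Tuple[List[str], List[int]]:
    """Pack two questions into one BERT-style input sequence (illustrative only)."""
    # Assumption: whitespace splitting replaces the real WordPiece tokenizer.
    tokens_a = question1.lower().split()
    tokens_b = question2.lower().split()

    # Layout used by BERT for sentence pairs: [CLS] A [SEP] B [SEP]
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]

    # Segment ids: 0 covers [CLS], question A and the first [SEP];
    # 1 covers question B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = pack_pair(
    "How do I learn Python?",
    "What is the best way to learn Python?",
)
print(tokens)
print(segment_ids)
```

In practice a library tokenizer produces this layout (plus padding and an attention mask) for you; the sketch only shows what the model sees for each question pair.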

Transformer Neural Networks - EXPLAINED! (Attention is all you need)

BERT Neural Network - EXPLAINED!

Quora Question Pairs Dataset

Quora is a question-and-answer website where questions are asked, answered, followed, and edited by Internet users, either factually or in the form of opinions. Quora was co-founded by former Facebook employees Adam D'Angelo and Charlie Cheever in June 2009. The website was made available to the public for the first time on June 21, 2010. Today the website is available in many languages.

Over 100 million people visit Quora every month, so it's no surprise that many people ask similarly worded questions. Multiple questions with the same intent can cause seekers to spend more time finding the best answer to their question, and make writers feel they need to answer multiple versions of the same question.

The goal is to predict which of the provided question pairs contain two questions with the same meaning. The ground truth is the set of labels supplied by human experts. The dataset itself can be downloaded from Kaggle: here.
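As a rough illustration of the data, the sketch below parses question pairs and their labels with Python's csv module. It assumes the column layout of the Kaggle train.csv file (id, qid1, qid2, question1, question2, is_duplicate). The two rows are made-up examples rather than real dataset entries, and load_pairs is a hypothetical helper, not a function from this repository.

```python
import csv
import io

# Assumption: same header as the Kaggle train.csv; the rows are invented samples.
SAMPLE_CSV = '''"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","How can I learn Python?","What is the best way to learn Python?","1"
"1","3","4","What is BERT?","Who founded Quora?","0"
'''


def load_pairs(handle):
    """Return a list of (question1, question2, label) tuples from a CSV handle."""
    reader = csv.DictReader(handle)
    return [
        (row["question1"], row["question2"], int(row["is_duplicate"]))
        for row in reader
    ]


pairs = load_pairs(io.StringIO(SAMPLE_CSV))

# Share of pairs labelled as duplicates (label 1) in the sample.
duplicate_rate = sum(label for _, _, label in pairs) / len(pairs)
print(pairs[0])
print(duplicate_rate)
```

To read the real file, the same helper could be called with an opened file instead of the in-memory sample, for example open("train.csv", newline="", encoding="utf-8"), where the path is wherever you saved the download.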

Application

How to use it?

See the following video:

Install

Clone the repo:

git clone https://github.com/idanmoradarthas/Quora-Questions-Pairs-App.git
cd Quora-Questions-Pairs-App

Go to the training folder, install the requirements, and run the notebook to create the model:

cd training-bert
pip install -r requirements.txt
jupyter notebook

Install the requirements in the main folder:

cd ..
pip install -r requirements.txt

Run Streamlit:

streamlit run app.py
