Learning NLP

Tutorials and in-depth analyses of Natural Language Processing (NLP) techniques and applied NLP.

About The Project

ADD PROJECT DESCRIPTION + TWO LINES ABOUT MLJC

Built With

Much Love 💕

Getting Started

You can either get a local copy by downloading this repo or use Google Colaboratory by copy-pasting the link to the notebook (.ipynb file) of your choice.

Prerequisites (Local Version)

Install Miniconda

Please go to the Anaconda website. Download and install the latest Miniconda version for Python 3.8 for your operating system.

wget <http:// link to miniconda>
sh <miniconda*.sh>

Download This Repo

git clone https://github.com/MachineLearningJournalClub/LearningNLP

Setup Conda Environment

IN THE END WE CAN SETUP A CONDA ENVIRONMENT AND EXPORT REQUIREMENTS (NEEDED LIBRARIES)

Change directory (cd) into the LearningNLP folder, then type:

cd LearningNLP
conda env create -f environment.yml
conda activate LNLP

Tutorial 1

Topics

Sentiment Analysis with Logistic Regression
Sentiment Analysis with Naive Bayes
Word Vectorization (CountVectorizer in Scikit-learn)
Some Explainability Methods

Notebook

Dataset: ArXiv from Kaggle
Preprocessing: pandas, nltk, gensim
Binary classification: Scikit-learn's CountVectorizer + TfidfTransformer
Explainability Methods: LIME, SHAP
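A minimal sketch of the classification step listed above, assuming a few hypothetical titles in place of the ArXiv data: CountVectorizer turns each title into token counts, TfidfTransformer reweights them, and either LogisticRegression or MultinomialNB is fitted on the result. The labels, the stop-word setting and the helper `build_pipeline` are illustrative choices, not necessarily those used in the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy titles standing in for the ArXiv dataset.
titles = [
    "Deep neural networks for image classification",
    "Convolutional networks and visual recognition",
    "Transformers for machine translation of text",
    "Quantum entanglement in spin chains",
    "Topological phases of quantum matter",
    "Superconductivity in layered materials",
]
labels = [1, 1, 1, 0, 0, 0]  # hypothetical labels: 1 = computer science, 0 = physics


def build_pipeline(classifier):
    """Tokens -> counts -> TF-IDF weights -> classifier (hypothetical helper)."""
    return Pipeline([
        ("counts", CountVectorizer(lowercase=True, stop_words="english")),
        ("tfidf", TfidfTransformer()),
        ("clf", classifier),
    ])


logreg = build_pipeline(LogisticRegression(max_iter=1000)).fit(titles, labels)
nbayes = build_pipeline(MultinomialNB()).fit(titles, labels)

new_titles = ["Neural networks for text classification", "Quantum spin liquids"]
print("logistic regression:", logreg.predict(new_titles))
print("naive Bayes:", nbayes.predict(new_titles))
```

Wrapping the steps in a Pipeline means `predict` applies exactly the same vectorization to new titles as the one learned on the training titles.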

Useful references for explainability methods:
- LIME: "Why Should I Trust You?": Explaining the Predictions of Any Classifier
- SHAP: A Unified Approach to Interpreting Model Predictions
- Adversarial attacks (have you heard of them?), i.e. how to fool algorithms: Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods

Open questions for you:
- How would you deal with multiclass problems?
- Try binary classification on abstracts instead of titles.
- Try to build the same pipeline with spaCy.
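To make the idea behind LIME concrete, here is a from-scratch sketch (not the `lime` package's implementation): words of one title are randomly dropped, the classifier scores each perturbed text, and a weighted Ridge regression fitted on the keep/drop masks acts as a local linear surrogate whose coefficients rank the words. The toy data, the helper `lime_style_explanation`, the kernel width and the sample count are all hypothetical choices.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (1 = computer science, 0 = physics).
titles = [
    "Deep neural networks for image classification",
    "Convolutional networks and visual recognition",
    "Transformers for machine translation of text",
    "Quantum entanglement in spin chains",
    "Topological phases of quantum matter",
    "Superconductivity in layered materials",
]
labels = [1, 1, 1, 0, 0, 0]
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(titles, labels)


def lime_style_explanation(text, predict_proba, num_samples=500, kernel_width=0.25, seed=0):
    """Rank the words of `text` by their weight in a local linear surrogate model."""
    words = text.split()
    rng = np.random.default_rng(seed)
    # Each row is a binary mask: 1 = keep the word, 0 = drop it.
    masks = rng.integers(0, 2, size=(num_samples, len(words)))
    masks[0] = 1  # include the unperturbed text
    perturbed = [" ".join(w for w, keep in zip(words, m) if keep) for m in masks]
    probs = predict_proba(perturbed)[:, 1]  # probability of class 1
    # Cosine distance between each binary mask and the all-ones mask (the original text).
    distances = 1.0 - np.sqrt(masks.sum(axis=1) / len(words))
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)  # closer samples weigh more
    surrogate = Ridge(alpha=1.0).fit(masks, probs, sample_weight=weights)
    return sorted(zip(words, surrogate.coef_), key=lambda pair: -abs(pair[1]))


explanation = lime_style_explanation("Neural networks for quantum matter", model.predict_proba)
for word, weight in explanation:
    print(f"{word:>10s} {weight:+.3f}")
```

Positive coefficients push the prediction toward class 1 in the neighbourhood of this particular title; the explanation is local and can change from one input to another.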

Tutorial 2

Topics

Bias & Fairness in NLP (Ethics and Machine Learning)
Gender Framing (in Political Tweets)
Political Party Prediction
Topic Modeling - Latent Dirichlet Allocation (LDA)

Slides

We'd like to introduce some ethical concerns in ML, especially in NLP. The idea is to start a long-term project on Bias & Fairness in Machine Learning: intrinsic problems in our data can create inequalities in the real world (have you watched "Coded Bias" on Netflix?).

Notebook

Dataset: we created a dataset by scraping tweets from some US politicians
Preprocessing: pandas, nltk, gensim
Binary classification: Scikit-learn's CountVectorizer + TfidfTransformer
Topic modeling with Latent Dirichlet Allocation (LDA) + visualization. Educational content on LDA: L. Serrano, part 1 (LDA) and part 2 (How to train LDA)

Tutorial 3

In the following two notebooks we focus on a Kaggle competition: the CommonLit Readability Prize.

Tutorial 3.1

Topics

Exploratory Data Analysis
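A small EDA sketch, assuming the competition data is loaded into a pandas DataFrame with an `excerpt` text column and a numeric `target` column; the rows and scores below are hypothetical. Simple length features are computed and correlated with the target.

```python
import pandas as pd

# Hypothetical rows; the real training file is assumed to have "excerpt" and "target" columns.
df = pd.DataFrame({
    "excerpt": [
        "The cat sat on the mat. It was happy.",
        "Photosynthesis converts light energy into chemical energy stored in glucose molecules.",
        "Notwithstanding considerable epistemological disagreement, the committee ultimately ratified the proposal.",
        "We went to the park. We played ball. Then we ate lunch.",
    ],
    "target": [1.2, -0.8, -2.5, 1.5],  # hypothetical readability scores
})

words = df["excerpt"].str.split()
df["n_words"] = words.str.len()
df["n_sentences"] = df["excerpt"].str.count(r"[.!?]+")
df["words_per_sentence"] = df["n_words"] / df["n_sentences"].clip(lower=1)  # at least one sentence
df["mean_word_len"] = words.apply(lambda ws: sum(len(w.strip(".,!?")) for w in ws) / len(ws))

print(df.describe())
correlations = df.drop(columns="excerpt").corr()["target"].sort_values()
print(correlations)
```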

Tutorial 3.2

You can run it directly on Kaggle.

Topics

Pretrained Word2Vec model, feature extraction
Dimensionality Reduction and visualization with UMAP
Naive Word2Vec Augmentation
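A sketch of two of the ideas above, assuming a tiny hypothetical embedding table instead of a real pretrained Word2Vec model: a text is turned into a feature vector by averaging the vectors of its known words, and "naive" augmentation replaces words with their most cosine-similar neighbour. The helper names and the replacement probability `p` are hypothetical.

```python
import numpy as np

# Hypothetical 3-d "embeddings"; in practice these would come from a pretrained Word2Vec model.
emb = {
    "small": np.array([0.9, 0.1, 0.0]),
    "little": np.array([0.85, 0.15, 0.05]),
    "big": np.array([-0.8, 0.2, 0.1]),
    "dog": np.array([0.1, 0.9, 0.2]),
    "puppy": np.array([0.15, 0.85, 0.3]),
    "runs": np.array([0.0, 0.2, 0.95]),
}


def sentence_vector(tokens, emb):
    """Feature extraction: average the vectors of in-vocabulary tokens."""
    vecs = [emb[t] for t in tokens if t in emb]
    dim = len(next(iter(emb.values())))
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def nearest_neighbour(word, emb):
    """Return the most cosine-similar other word in the vocabulary."""
    v = emb[word] / np.linalg.norm(emb[word])
    best, best_sim = None, -np.inf
    for other, u in emb.items():
        if other == word:
            continue
        sim = float(v @ u / np.linalg.norm(u))
        if sim > best_sim:
            best, best_sim = other, sim
    return best


def naive_augment(tokens, emb, p=0.5, seed=0):
    """Replace each in-vocabulary token by its nearest neighbour with probability p."""
    rng = np.random.default_rng(seed)
    return [nearest_neighbour(t, emb) if t in emb and rng.random() < p else t for t in tokens]


print(sentence_vector(["small", "dog", "runs"], emb))
print(naive_augment(["small", "dog", "runs"], emb, p=0.5))
```

Nearest neighbours in embedding space are not always synonyms (antonyms often sit close together), which is why this kind of augmentation is called naive.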

Tutorial 4

Topics

Global Vectors for word representations (GloVe), Stanford NLP
fastText: skip-gram vs CBOW
Bias in Word Embeddings (Gender + Ethnic Stereotypes) with WEFE
Bias in Word Embeddings: What causes it?
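The skip-gram vs CBOW difference is easiest to see in the training examples each one builds from the same sentence. The sketch below (hypothetical helpers, not code from Word2Vec or fastText) generates both with a symmetric window, plus the character n-grams with `<` and `>` word boundaries that fastText adds to each word. Real implementations add further details, such as negative sampling, that are omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each context word from the centre word -> (centre, context) pairs."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: predict the centre word from its context -> (context words, centre) examples."""
    examples = []
    for i, centre in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, centre))
    return examples


def char_ngrams(word, n=3):
    """fastText-style character n-grams, using '<' and '>' as word boundaries."""
    w = f"<{word}>"
    return [w[i:i + n] for i in range(len(w) - n + 1)]


tokens = "the quick brown fox jumps".split()
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
print(char_ngrams("where"))
```

Skip-gram produces one example per (centre, context) pair, so it sees rare words more often; CBOW produces one example per position and averages the context.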

Possible Ideas:

Understanding Bias in Word Embeddings, ICML paper + code
Employing The Word Embedding Fairness Evaluation Framework (WEFE): WEAT, (RIPA?)
Debiasing Word Embeddings: "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings", paper + code
Biasing a simple model: how can we deliberately bias a model by injecting biased information into it? What can we learn from this? How could it be useful for debiasing?
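A numpy sketch of two ideas from the list above, on hypothetical 3-d vectors where axis 0 plays the role of a gender direction: the WEAT effect size (the difference in mean association of two target sets with two attribute sets, divided by the standard deviation over all targets) and the "neutralize" step of hard debiasing, which removes a vector's component along the bias direction. This is not WEFE's API; using the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w, A, B):
    """s(w, A, B): mean cosine similarity of w with set A minus that with set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y, in units of the pooled std."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)  # ddof=1 is an assumption


def neutralize(w, g):
    """Remove the component of w along the bias direction g."""
    g = g / np.linalg.norm(g)
    return w - (w @ g) * g


# Hypothetical vectors: axis 0 acts as the "gender" direction.
male = [np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.0, 0.1])]      # e.g. "man", "he"
female = [np.array([-1.0, 0.1, 0.0]), np.array([-0.9, 0.0, 0.1])]  # e.g. "woman", "she"
career = [np.array([0.4, 1.0, 0.2]), np.array([0.3, 0.9, 0.3])]    # e.g. "engineer", "manager"
family = [np.array([-0.4, 0.2, 1.0]), np.array([-0.3, 0.3, 0.9])]  # e.g. "home", "parent"

effect = weat_effect_size(career, family, male, female)
gender_direction = male[0] - female[0]
engineer_neutral = neutralize(career[0], gender_direction)
print(f"WEAT effect size: {effect:.2f}")
print("neutralized 'engineer':", engineer_neutral)
```

A positive effect size means the first target set is more associated with the first attribute set; swapping the two target sets flips its sign.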

Tutorial 5

In the following notebook we go back to the Kaggle competition introduced in Tutorial 3: the CommonLit Readability Prize.

Topics

Data Augmentation
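As an illustration of token-level augmentation, here is a sketch of two operations from "Easy Data Augmentation" (random deletion and random swap), not necessarily the techniques used in the notebook. The example sentence, probabilities and helper names are hypothetical. Treating these edits as label-preserving is an assumption: for a readability target, swapping or dropping words can itself change how readable a passage is.

```python
import random


def random_deletion(tokens, p=0.1, rng=None):
    """Drop each token independently with probability p (keep at least one token)."""
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps=1, rng=None):
    """Swap the positions of two randomly chosen tokens, n_swaps times."""
    rng = rng or random.Random(0)
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


excerpt = "The young fox quietly crossed the frozen river at dawn".split()
rng = random.Random(42)
augmented = [
    " ".join(random_deletion(excerpt, p=0.2, rng=rng)),
    " ".join(random_swap(excerpt, n_swaps=2, rng=rng)),
]
for text in augmented:
    print(text)
```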

Tutorial 6

In the following notebooks (in this GitHub repo) we outline our solution for the CommonLit Readability Prize.

Topics

Fine-tuning Sentence Transformers models (RoBERTa family) in PyTorch
Possible strategies for data augmentation
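A PyTorch sketch of the pooling and regression part of such a model, under stated assumptions: random `token_embeddings` stand in for the output of a RoBERTa encoder (hidden size 768, as in roberta-base), `ReadabilityHead` and all hyperparameters are hypothetical, and an RMSE loss is assumed as the training objective. Padding positions are excluded from the mean via the attention mask, which is the mean-pooling strategy commonly used by Sentence Transformers models.

```python
import torch
from torch import nn


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (attention_mask == 0)."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)                    # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                         # (batch, 1)
    return summed / counts


class ReadabilityHead(nn.Module):
    """Hypothetical regression head on top of mean-pooled token embeddings."""

    def __init__(self, hidden_size):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.regressor = nn.Linear(hidden_size, 1)

    def forward(self, token_embeddings, attention_mask):
        pooled = mean_pool(token_embeddings, attention_mask)
        return self.regressor(self.dropout(pooled)).squeeze(-1)


torch.manual_seed(0)
batch, seq_len, hidden = 4, 6, 768
token_embeddings = torch.randn(batch, seq_len, hidden)  # stand-in for the encoder output
attention_mask = torch.tensor([
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
])
targets = torch.tensor([-0.5, 1.0, 0.2, -1.3])  # hypothetical readability scores

head = ReadabilityHead(hidden)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
pred = head(token_embeddings, attention_mask)
loss = torch.sqrt(nn.functional.mse_loss(pred, targets))  # RMSE
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"RMSE on this batch: {loss.item():.3f}")
```

In actual fine-tuning the encoder produces `token_embeddings` from tokenized excerpts and its weights are updated together with the head.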

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See LICENSE for more information.

Contact

Simone Azeglio - email : simone.azeglio@edu.unito.it - linkedin

Luca Bottero - email : luca.bottero192@edu.unito.it - linkedin

Marina Rizzi - email : - linkedin

Alessio Borriero - email : alessio.borriero@edu.unito.it - linkedin

Micol Olocco - email : - linkedin

Project Link: https://github.com/MachineLearningJournalClub/LearningNLP

Acknowledgements

HPC4AI

