Learning NLP

Tutorials and in-depth analyses of Natural Language Processing (NLP) techniques and applied NLP.


Table of Contents

  1. About The Project
  2. Getting Started
  3. Roadmap
  4. Contributing
  5. License
  6. Contact
  7. Acknowledgements

About The Project

ADD PROJECT DESCRIPTION + TWO LINES ABOUT MLJC

Built With

  • Much Love 💕

Getting Started

You can either get a local copy by downloading this repo or use Google Colaboratory by copy-pasting the link to the notebook (.ipynb file) of your choice.

Prerequisites (Local Version)

Install Miniconda

Please go to the Anaconda website. Download and install the latest Miniconda version for Python 3.8 for your operating system.

wget <http:// link to miniconda>
sh <miniconda*.sh>

Download This Repo

git clone https://github.com/MachineLearningJournalClub/LearningNLP

Setup Conda Environment

IN THE END WE CAN SETUP A CONDA ENVIRONMENT AND EXPORT REQUIREMENTS (NEEDED LIBRARIES)

Change directory (cd) into the LearningNLP folder, then type:

# cd LearningNLP
conda env create -f environment.yml
conda activate LNLP

Topics

  • Sentiment Analysis with Logistic Regression
  • Sentiment Analysis with Naive Bayes
  • Word Vectorizing (CountVectorizer in Scikit-learn)
  • Some Explainability Methods
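
As a rough illustration of two of the topics above (word-count vectorizing and Naive Bayes sentiment classification), here is a minimal, dependency-free sketch. It is not the code from the notebook and it does not use Scikit-learn: the helper names (`tokenize`, `build_vocabulary`, `vectorize`, `train_naive_bayes`, `predict`) and the four toy sentences are assumptions made up for this example. The tokenizer only approximates the spirit of CountVectorizer (lowercase, words of two or more characters).

```python
import math
import re
from collections import Counter

# Words of 2+ characters; an approximation chosen for this sketch.
TOKEN_RE = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return the bag-of-words count vector of `text`; unknown words are ignored."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocabulary]


def train_naive_bayes(documents, labels, vocabulary):
    """Estimate log-priors and add-one-smoothed log-likelihoods per class."""
    log_prior, log_likelihood = {}, {}
    for cls in sorted(set(labels)):
        class_docs = [d for d, y in zip(documents, labels) if y == cls]
        log_prior[cls] = math.log(len(class_docs) / len(documents))
        totals = [0] * len(vocabulary)
        for doc in class_docs:
            totals = [t + c for t, c in zip(totals, vectorize(doc, vocabulary))]
        denom = sum(totals) + len(vocabulary)  # add-one (Laplace) smoothing
        log_likelihood[cls] = [math.log((t + 1) / denom) for t in totals]
    return log_prior, log_likelihood


def predict(text, vocabulary, log_prior, log_likelihood):
    """Return the class with the highest log-posterior score for `text`."""
    counts = vectorize(text, vocabulary)
    scores = {
        cls: log_prior[cls]
        + sum(c * ll for c, ll in zip(counts, log_likelihood[cls]))
        for cls in log_prior
    }
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
documents = [
    "I loved this movie",
    "great acting and a great story",
    "I hated this movie",
    "boring story and awful acting",
]
labels = ["pos", "pos", "neg", "neg"]

vocabulary = build_vocabulary(documents)
log_prior, log_likelihood = train_naive_bayes(documents, labels, vocabulary)
```

With the model trained, `predict("a great movie", vocabulary, log_prior, log_likelihood)` scores the sentence against each class. In practice you would most likely use Scikit-learn's `CountVectorizer` together with one of its classifiers instead of hand-written helpers like these.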

Notebook


Topics

  • Bias & Fairness in NLP (Ethics and Machine Learning)
  • Gender Framing (in Political Tweets)
  • Political Party Prediction
  • Topic Modeling - Latent Dirichlet Allocation (LDA)

We'd like to introduce some ethical concerns in ML, and especially in NLP. The idea is to start a long-term project on Bias & Fairness in Machine Learning: intrinsic problems in our data can create inequalities in the real world. (Have you watched "Coded Bias" on Netflix?)

Notebook


Tutorial 3

In the following two notebooks we focus on a Kaggle competition: the CommonLit Readability Prize.

Topics

  • Exploratory Data Analysis

You can directly run it on Kaggle

Topics

  • Pretrained Word2Vec model, feature extraction
  • Dimensionality Reduction and visualization with UMAP
  • Naive Word2Vec Augmentation

Topics

Possible Ideas:


In the following two notebooks we focus on a Kaggle competition: the CommonLit Readability Prize.

Topics

  • Data Augmentation

In the following notebooks (in this GitHub repo) we outline our solution for the CommonLit Readability Prize.

Topics

  • Fine-tuning Sentence Transformers models (RoBERTa family) in PyTorch
  • Possible strategies for data augmentation

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See LICENSE for more information.

Contact

Simone Azeglio - email : simone.azeglio@edu.unito.it - linkedin

Luca Bottero - email : luca.bottero192@edu.unito.it - linkedin

Marina Rizzi - email : - linkedin

Alessio Borriero - email : alessio.borriero@edu.unito.it - linkedin

Micol Olocco - email : - linkedin

Project Link: https://github.com/MachineLearningJournalClub/LearningNLP

Acknowledgements
