
Natural Language Processing - Cristiano De Nobili

This is the second part of the Deep Learning Course for the Master in High-Performance Computing (SISSA/ICTP). It is about Natural Language Processing, in particular recent progress involving transformer-based models. I must thank the innovative startup AINDO for its support.

Cristiano holds a Ph.D. in Theoretical Physics (SISSA) and has been actively working in Deep Learning for four years. He is currently part of the Bixby project, Samsung's voice assistant. He is also a TEDx speaker (here is his talk about AI, humans, and their future) and a civil pilot (PPL). Here are his contacts:

  • If you are interested in science and tech news: LinkedIn & Twitter;
  • On his website you can find all his lectures, workshops, and talks;
  • His Instagram is about flying, traveling, and adventure; it is the social platform he uses the most.

Also have a look at the first part of the course, Introduction to Neural Networks (with PyTorch), by Alessio Ansuini, and the third part, Deep Generative Models with TensorFlow 2, by Piero Coronica.

Course Outline

You can find the videos of the lectures here. For this year, I decided to use PyTorch as the main Deep Learning library.

  • Lecture 1: intro to NLP, text preprocessing, spaCy, common NLP tasks (NER, POS tagging, sentence classification, ...), non-contextual word embeddings, a SkipGram Word2Vec model coded from scratch, pre-trained GloVe embeddings with Gensim, and an intro to contextual word embeddings and the (self-)attention mechanism.
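To make the SkipGram idea concrete: the model is trained on (centre, context) word pairs extracted with a sliding window. A minimal sketch of that pair extraction, in plain Python (a hypothetical helper for illustration, not the notebook's actual code):

```python
# Sketch of SkipGram training-pair generation: for each centre word,
# emit (centre, context) pairs for neighbours within a fixed window.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the centre word itself
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox".split()
print(skipgram_pairs(sentence, window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

In the actual Word2Vec training loop, these pairs feed a shallow network whose hidden weights become the word vectors.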

  • Lecture 2: transfer learning main concepts, transformer-based models, how BERT-like models are trained and fine-tuned on downstream tasks, intro to the Hugging Face Transformers library, tokenization, language modeling with English and non-English (Italian GilBERTo and UmBERTo) pre-trained AutoModels, and some examples of NLP problems solved with the Transformers pipeline.
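A key ingredient covered here is subword tokenization: BERT-like models split rare words into pieces from a fixed vocabulary. A toy greedy longest-match tokenizer in the spirit of WordPiece (the vocabulary below is made up for illustration; real tokenizers are trained on large corpora and come with the pre-trained model):

```python
# Toy greedy longest-match subword tokenizer (WordPiece-style).
# "##" marks a piece that continues a word rather than starting one.
def wordpiece_tokenize(word, vocab):
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1  # shrink the candidate piece from the right
        if match is None:
            return ["[UNK]"]  # no piece matched: unknown token
        tokens.append(match)
        start = end
    return tokens

vocab = {"play", "##ing", "##ed", "un"}
print(wordpiece_tokenize("playing", vocab))  # ['play', '##ing']
```

With the real library, the equivalent step is handled by `AutoTokenizer.from_pretrained(...)`, which downloads the trained vocabulary alongside the model.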

  • Lecture 3: fine-tune a pre-trained Italian RoBERTa model to solve word-sense disambiguation, embedding geometry, clustering (t-SNE and UMAP), and visualization (this lecture is a bit advanced). Part of this notebook uses PyTorch Lightning.
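The visualization step in this lecture projects high-dimensional contextual embeddings down to 2-D so clusters of word senses can be plotted. A minimal sketch with scikit-learn's t-SNE (random vectors stand in for the real BERT-style embeddings, and the parameter values are illustrative):

```python
# Project high-dimensional embeddings to 2-D with t-SNE for visual
# clustering; random data stands in for real contextual embeddings.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 768))  # 50 fake 768-dim embeddings

proj = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
print(proj.shape)  # (50, 2) -> ready for a 2-D scatter plot
```

In the notebook, the same projection (with t-SNE or UMAP) is applied to embeddings of a target word in different sentences, so that distinct word senses ideally appear as separate clusters.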

Useful links and references are inside each notebook. For any doubts or questions, feel free to contact me!