CORE Skills Data Science Springboard - Day 12 - Special Data Types: Natural Language Processing

Overview

Aims

  1. Gain a practical understanding of traditional and modern natural language processing techniques.
  2. Develop an intuition for knowledge graphs and ontologies.
  3. Become familiar with basic text handling and processing techniques such as lemmatisation and stemming.
  4. Gain intuition towards word vectors and their applications in natural language processing.
  5. Develop an understanding of supervised learning using modern tools such as HuggingFace.
  6. Develop an understanding of unsupervised learning using latent topic models.
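
To make aim 4 a little more concrete, here is a minimal sketch of how word vectors are typically compared, using cosine similarity. The three-dimensional vectors, their values, and the helper names `cosine_similarity` and `most_similar` are made up for illustration and are not taken from the course notebooks; real embeddings (for example from word2vec) are learned from a corpus and usually have hundreds of dimensions.

```python
import math

# Toy, hand-written vectors (hypothetical values, not learned embeddings).
TOY_VECTORS = {
    "pump": [0.9, 0.1, 0.0],
    "valve": [0.8, 0.2, 0.1],
    "lunch": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors=TOY_VECTORS):
    """Return the other word whose vector is closest to `word`."""
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine_similarity(vectors[word], vectors[other]),
    )


if __name__ == "__main__":
    print(most_similar("pump"))
```

With these toy values, "pump" and "valve" point in nearly the same direction, so they score as more similar to each other than either does to "lunch".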

Schedule

| AWST | AEST | Agenda |
| --- | --- | --- |
| 07:30 - 07:45 | 09:30 - 09:45 | Q&A, Issues & Announcements |
| 07:45 - 09:15 | 09:45 - 11:15 | 12.0 Overview of NLP |
| 09:15 - 09:30 | 11:15 - 11:30 | Morning Tea |
| 09:30 - 11:00 | 11:30 - 13:00 | 12.1/2 Fundamentals of NLP |
| 11:00 - 11:45 | 13:00 - 13:45 | Lunch |
| 11:45 - 13:15 | 13:45 - 15:15 | 12.3 Supervised Learning |
| 13:15 - 13:30 | 15:15 - 15:30 | Afternoon Tea |
| 13:30 - 14:30 | 15:30 - 16:30 | 12.4 Unsupervised Learning |
| 14:30 - 14:55 | 16:30 - 16:55 | Closeout & Feedback |
| 14:55 - 15:00 | 16:55 - 17:00 | Menti Feedback |
| 15:00 - 15:00 | 17:00 - 17:00 | Closeout |

Resources mentioned in the course

  1. Centre for Transforming Maintenance Through Data Science (CTMTDS)
  2. CTMTDS - Theme 1 Support the Maintainer (Wei & Tyler; NLP-TLP)
  3. UWA - Natural & Technical Language Processing Group
  4. Industrial Ontologies - Maintenance Working Group
  5. Interactive word2vec (embedding) visualisation tool
  6. HuggingFace 🤗
  7. HuggingFace 🤗 Notebooks
  8. Allen Institute of Artificial Intelligence (AI2) - Demos
  9. GPT-3 Language Model Demos

Dependencies used in the notebooks

  1. spaCy - Industrial Strength Natural Language Processing
  2. Gensim - Topic Modelling for Humans
  3. NLTK - Natural Language Toolkit
  4. PyTorch - Binary Cross Entropy Loss (BCELoss)
  5. PyTorch - Recurrent Neural Network (RNN) module
  6. CUDA framework for GPU training
  7. CUDA supported GPUs
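
For readers new to the binary cross entropy loss listed above, the following is a plain-Python sketch of the quantity involved: the mean of -[y log p + (1 - y) log(1 - p)] over predicted probabilities p and 0/1 targets y, which is what PyTorch's `BCELoss` computes under its default mean reduction. The function name `binary_cross_entropy` and the `eps` clipping are assumptions made for this illustration; PyTorch guards against log(0) in its own way, and the notebooks use the library module rather than code like this.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross entropy between probabilities and 0/1 targets.

    `eps` is an illustrative safeguard that keeps log() away from zero.
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip into the open interval (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


if __name__ == "__main__":
    confident = binary_cross_entropy([0.9, 0.1], [1, 0])
    wrong = binary_cross_entropy([0.1, 0.9], [1, 0])
    print(confident, wrong)
```

Confident, correct predictions give a loss close to zero, while confident, wrong predictions are penalised heavily, which is why the loss suits binary classifiers such as a sentiment model.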

About

Day 12 - Finding Needles in Wordstacks: Natural Language Processing and Text Mining
