
morganjwilliams/12-text-processing

 
 


CORE Skills Data Science Springboard - Day 12 - Special Data Types: Natural Language Processing


Overview

Aims

  1. Gain a practical understanding of traditional and modern natural language processing techniques.
  2. Develop an intuition for knowledge graphs and ontologies.
  3. Become familiar with basic text handling and processing, such as lemmatisation and stemming.
  4. Gain an intuition for word vectors and their applications in natural language processing.
  5. Develop an understanding of unsupervised learning using latent topic models.
  6. Develop an understanding of supervised learning using modern tools such as PyTorch for sentiment analysis.
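
As a rough illustration of the basic text handling named in aim 3, the sketch below tokenises a sentence and applies a toy suffix-stripping stemmer. It is not the workshop's code and not a real stemming algorithm such as Porter's; the suffix list, the function names and the sample sentence are all hypothetical, chosen only to show how different surface forms collapse to a shared stem.

```python
import re
from collections import Counter

# Hypothetical suffix list for illustration only; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenise(text):
    """Lowercase the text and return its runs of alphabetic characters."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stem_counts(text):
    """Count how often each toy stem occurs in the text."""
    return Counter(toy_stem(token) for token in tokenise(text))


# Made-up maintenance-style sentence: "Pumps"/"pump" and "leaked"/"leaking"
# each collapse to one stem.
counts = stem_counts("Pumps leaked; the pump is leaking.")
print(counts)
```

Libraries such as NLTK and spaCy (both linked below) provide proper stemmers and lemmatisers; a toy rule set like this one will mis-stem many ordinary words.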

Schedule

AWST           AEST           Agenda
07:30 - 07:45  09:30 - 09:45  Q&A, Issues & Announcements
07:45 - 09:15  09:45 - 11:15  Session 1: Handling Text and Basic Text Processing
09:15 - 09:30  11:15 - 11:30  Morning Tea
09:30 - 11:00  11:30 - 13:00  Session 2: Word Embeddings
11:00 - 11:45  13:00 - 13:45  Lunch
11:45 - 13:15  13:45 - 15:15  Session 3: Unsupervised Learning
13:15 - 13:30  15:15 - 15:30  Afternoon Tea
13:30 - 14:45  15:30 - 16:45  Session 4: Supervised Learning
14:45 - 15:00  16:45 - 17:00  Closeout

Miscellaneous links

Additional links relating to chat-based discussions and material covered in the workshop:

  1. Centre for Transforming Maintenance Through Data Science (CTMTDS): https://www.maintenance.org.au/
  2. CTMTDS - Theme 1 Support the Maintainer (Wei & Tyler; NLP): https://www.maintenance.org.au/category/rt1
  3. Industrial Ontologies - Maintenance Working Group: https://www.industrialontologies.org/?page_id=92
  4. Aquila exploratory data analysis tool: http://agent.csse.uwa.edu.au/aquila
  5. spaCy - Industrial-Strength Natural Language Processing: https://spacy.io/
  6. Gensim - Topic Modelling for Humans: https://radimrehurek.com/gensim/
  7. NLTK - Natural Language Toolkit: https://www.nltk.org/
  8. Interactive word2vec (embedding) visualisation tool: https://ronxin.github.io/wevi/
  9. PyTorch - Binary Cross Entropy Loss (BCELoss): https://pytorch.org/docs/stable/nn.html#bceloss
  10. PyTorch - Recurrent Neural Network (RNN) module: https://pytorch.org/docs/stable/nn.html#rnn
  11. cuDNN - NVIDIA CUDA Deep Neural Network library for GPU training: https://developer.nvidia.com/cudnn
  12. CUDA supported GPUs: https://developer.nvidia.com/cuda-gpus
  13. Automatic Summarization (NLP/NLG): https://en.wikipedia.org/wiki/Automatic_summarization
  14. Example of embeddings drawing powerful insights into COVID-19 research: https://www.kaggle.com/tarunpaparaju/covid-19-dataset-gaining-actionable-insights
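
For readers following the BCELoss link above, the quantity it refers to can be written out in a few lines of plain Python. This is a minimal sketch of mean binary cross entropy, assuming predicted probabilities strictly between 0 and 1 and targets of 0 or 1. It is not PyTorch's implementation, and the function name and sample values are hypothetical.

```python
import math


def binary_cross_entropy(probs, targets):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over paired predictions and labels.

    Assumes every p is strictly between 0 and 1 and every y is 0 or 1.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


# Made-up example: 1 = positive sentiment, 0 = negative sentiment.
targets = [1, 0, 1]
confident = binary_cross_entropy([0.9, 0.1, 0.8], targets)
uncertain = binary_cross_entropy([0.5, 0.5, 0.5], targets)
print(confident, uncertain)
```

Predictions that sit close to the true labels give a smaller loss than uninformative 0.5 predictions, which is the property a binary sentiment classifier would exploit during training.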

About

Day 12 - Finding Needles in Wordstacks: Natural Language Processing and Text Mining
