CORE Skills Data Science Springboard - Day 12 - Special Data Types: Natural Language Processing

Overview

Aims

Gain a practical understanding of traditional and modern natural language processing techniques.
Develop an intuition for knowledge graphs and ontologies.
Become familiar with basic text handling and processing, such as lemmatisation and stemming.
Gain an intuition for word vectors and their applications in natural language processing.
Develop an understanding of unsupervised learning using latent topic models.
Develop an understanding of supervised learning for sentiment analysis using modern tools such as PyTorch.
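
As a rough illustration of the difference between stemming and lemmatisation named in the aims, the sketch below uses only the Python standard library. The suffix list, the lemma dictionary and the function names are hypothetical and chosen for illustration. They are not the workshop's code and not the algorithms used by NLTK or spaCy (which provide proper stemmers and lemmatisers).

```python
import re

# Hypothetical toy suffix rules, for illustration only (this is not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical hand-made lemma dictionary; real lemmatisers rely on a full vocabulary.
LEMMAS = {"was": "be", "studies": "study", "running": "run", "better": "good"}


def tokenise(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatise(token):
    """Look the token up in the lemma dictionary, falling back to the token itself."""
    return LEMMAS.get(token, token)


tokens = tokenise("She was running better studies.")
stems = [stem(t) for t in tokens]        # crude truncation, may not be real words
lemmas = [lemmatise(t) for t in tokens]  # dictionary forms

for token, s, l in zip(tokens, stems, lemmas):
    print(f"{token:10} stem={s:10} lemma={l}")
```

Stemming chops suffixes by rule and can produce non-words such as "runn" or "studi", whereas lemmatisation maps each token to a dictionary form such as "run" or "study".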

Schedule

AWST	AEST	Agenda
07:30 - 07:45	09:30 - 09:45	Q&A, Issues & Announcements
07:45 - 09:15	09:45 - 11:15	Session 1: Handling Text and Basic Text Processing
09:15 - 09:30	11:15 - 11:30	Morning Tea
09:30 - 11:00	11:30 - 13:00	Session 2: Word Embeddings
11:00 - 11:45	13:00 - 13:45	Lunch
11:45 - 13:15	13:45 - 15:15	Session 3: Unsupervised Learning
13:15 - 13:30	15:15 - 15:30	Afternoon Tea
13:30 - 14:45	15:30 - 16:45	Session 4: Supervised Learning
14:45 - 15:00	16:45 - 17:00	Closeout

Miscellaneous links

Additional information relating to the chat-based discussions and material covered in the workshop:

Centre for Transforming Maintenance Through Data Science (CTMTDS): https://www.maintenance.org.au/
CTMTDS - Theme 1 Support the Maintainer (Wei & Tyler; NLP): https://www.maintenance.org.au/category/rt1
Industrial Ontologies - Maintenance Working Group: https://www.industrialontologies.org/?page_id=92
Aquila exploratory data analysis tool: http://agent.csse.uwa.edu.au/aquila
spaCy - Industrial-Strength Natural Language Processing: https://spacy.io/
Gensim - Topic Modelling for Humans: https://radimrehurek.com/gensim/
NLTK - Natural Language Toolkit: https://www.nltk.org/
Interactive word2vec (embedding) visualisation tool: https://ronxin.github.io/wevi/
PyTorch - Binary Cross Entropy Loss (BCELoss): https://pytorch.org/docs/stable/nn.html#bceloss
PyTorch - Recurrent Neural Network (RNN) module: https://pytorch.org/docs/stable/nn.html#rnn
cuDNN - NVIDIA CUDA Deep Neural Network library for GPU training: https://developer.nvidia.com/cudnn
CUDA supported GPUs: https://developer.nvidia.com/cuda-gpus
Automatic Summarization (NLP/NLG): https://en.wikipedia.org/wiki/Automatic_summarization
Example of embeddings drawing powerful insights into COVID-19 research: https://www.kaggle.com/tarunpaparaju/covid-19-dataset-gaining-actionable-insights
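
For readers following the BCELoss link above, here is a minimal standard-library sketch of what binary cross entropy computes for a batch of predicted probabilities and 0/1 labels, averaged over the batch. The function name and the `eps` clipping are assumptions made for this illustration. PyTorch's own module guards against log(0) differently, so treat this as an aid to intuition rather than a drop-in replacement.

```python
import math


def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean of -(y * log(p) + (1 - y) * log(1 - p)) over all examples.

    `eps` is an assumed clipping value that keeps log() away from zero.
    """
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip the probability into (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Hypothetical sentiment predictions: 1 = positive review, 0 = negative review.
labels = [1, 0, 1]
confident_and_right = binary_cross_entropy([0.9, 0.1, 0.8], labels)
confident_and_wrong = binary_cross_entropy([0.1, 0.9, 0.2], labels)
uninformed = binary_cross_entropy([0.5, 0.5, 0.5], labels)

print(confident_and_right, uninformed, confident_and_wrong)
```

The loss is smallest when confident predictions match the labels, equals ln 2 when every prediction is 0.5, and grows quickly when confident predictions are wrong.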

morganjwilliams/12-text-processing

Folders and files

Latest commit

History

Repository files navigation

CORE Skills Data Science Springboard - Day 12 - Special Data Types: Natural Language Processing

Overview

Aims

Schedule

Miscellaneous links

About

Resources

License

Stars

Watchers

Forks

Languages