
arunarn2/GeoLocationTagging


This repository contains the models for "Text-based Geolocation Prediction of Social Media Users with Neural Networks", IEEE BigData 2017 https://ieeexplore.ieee.org/document/8257985/

Dataset:

For this project I am using the CMU Geo-tagged dataset

Geolocation tagging is implemented both as a classification task (the output is one of 49 US states) and as a regression task (latitude/longitude coordinates are predicted directly).
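As a rough illustration of the two formulations (a minimal sketch, not the repository's actual code; names and sizes such as FEATURE_DIM are placeholders), the same text features can feed either a 49-way softmax head or a 2-dimensional latitude/longitude regression head:

```python
import tensorflow as tf

NUM_STATES = 49          # classification target: one of 49 US states
FEATURE_DIM = 256        # hypothetical size of the text representation

# `text_features` stands in for whatever encoder (CNN/RNN/...) produced it.
text_features = tf.placeholder(tf.float32, [None, FEATURE_DIM])
state_labels = tf.placeholder(tf.int32, [None])      # class ids 0..48
coord_labels = tf.placeholder(tf.float32, [None, 2])  # (latitude, longitude)

# Classification head: 49-way softmax with cross-entropy loss.
class_logits = tf.layers.dense(text_features, NUM_STATES)
class_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=state_labels, logits=class_logits))

# Regression head: predict latitude/longitude directly, squared-error loss.
coord_pred = tf.layers.dense(text_features, 2)
regress_loss = tf.reduce_mean(tf.squared_difference(coord_pred, coord_labels))
```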

Models:

  1. Text CNN
    Implementation of Convolutional Neural Networks for Sentence Classification
    Structure:
    Embedding --> Convolutional --> Max Pooling --> FC layer --> Softmax
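    A minimal TensorFlow 1.x sketch of this pipeline (hypothetical sizes and a single filter width for brevity; the referenced paper uses several filter widths in parallel):

    ```python
    import tensorflow as tf

    VOCAB_SIZE, EMBED_DIM, SEQ_LEN = 50000, 100, 50   # hypothetical sizes
    NUM_FILTERS, KERNEL_SIZE, NUM_CLASSES = 128, 3, 49

    word_ids = tf.placeholder(tf.int32, [None, SEQ_LEN])
    labels = tf.placeholder(tf.int32, [None])

    # Embedding
    embedding = tf.get_variable("embedding", [VOCAB_SIZE, EMBED_DIM])
    embedded = tf.nn.embedding_lookup(embedding, word_ids)      # [batch, time, emb]

    # Convolutional layer with ReLU
    conv = tf.layers.conv1d(embedded, NUM_FILTERS, KERNEL_SIZE,
                            activation=tf.nn.relu)              # [batch, time', filters]

    # Max pooling over time
    pooled = tf.reduce_max(conv, axis=1)                        # [batch, filters]

    # FC layer + softmax
    logits = tf.layers.dense(pooled, NUM_CLASSES)
    probs = tf.nn.softmax(logits)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits))
    ```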

  2. Text RNN
    Implementation based on model from Emojifier-v2
    Structure:
    Embedding --> Bi-directional LSTM --> Dropout --> Concat output --> LSTM --> Dropout --> FC layer --> Softmax
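    A rough TensorFlow 1.x sketch of this stack (cell sizes and the dropout keep probability are placeholders, not the repository's settings):

    ```python
    import tensorflow as tf

    VOCAB_SIZE, EMBED_DIM, HIDDEN, NUM_CLASSES = 50000, 100, 128, 49
    keep_prob = 0.8   # hypothetical dropout keep probability

    word_ids = tf.placeholder(tf.int32, [None, None])           # [batch, time]
    embedding = tf.get_variable("rnn_embedding", [VOCAB_SIZE, EMBED_DIM])
    embedded = tf.nn.embedding_lookup(embedding, word_ids)      # [batch, time, emb]

    # Bi-directional LSTM over the word embeddings
    fw = tf.nn.rnn_cell.LSTMCell(HIDDEN)
    bw = tf.nn.rnn_cell.LSTMCell(HIDDEN)
    (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(fw, bw, embedded,
                                                          dtype=tf.float32)

    # Dropout, then concatenate the two directions
    bi_out = tf.nn.dropout(tf.concat([out_fw, out_bw], axis=2), keep_prob)

    # Second (uni-directional) LSTM; use its final state as the text vector
    cell2 = tf.nn.rnn_cell.LSTMCell(HIDDEN)
    _, final_state = tf.nn.dynamic_rnn(cell2, bi_out, dtype=tf.float32)
    text_vec = tf.nn.dropout(final_state.h, keep_prob)

    # FC layer + softmax
    logits = tf.layers.dense(text_vec, NUM_CLASSES)
    probs = tf.nn.softmax(logits)
    ```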

Text RNN Model

  3. Text RCNN
    Implementation of Recurrent Convolutional Neural Network for Text Classification

    Structure:
    Recurrent structure (convolutional layer) --> Max Pooling --> FC Layer --> Softmax

    Learns a representation of each word in the sentence or document using its left-side and right-side context:
    representation of the current word = [ left_context_vector, current_word_embedding, right_context_vector ].
    The left-side context is computed with a recurrent structure: a non-linear transformation of the previous word's embedding and the previous left-side context (the right-side context is computed symmetrically, running from the end of the text).
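    A small NumPy sketch of how the left and right context vectors can be built recurrently and concatenated with the word embedding (weight matrices and sizes here are random placeholders; the actual model is in the models folder):

    ```python
    import numpy as np

    EMBED_DIM, CTX_DIM, T = 4, 3, 5           # hypothetical sizes, T = sentence length
    rng = np.random.RandomState(0)

    E = rng.randn(T, EMBED_DIM)               # word embeddings e(w_1..w_T)
    W_l, W_sl = rng.randn(CTX_DIM, CTX_DIM), rng.randn(CTX_DIM, EMBED_DIM)
    W_r, W_sr = rng.randn(CTX_DIM, CTX_DIM), rng.randn(CTX_DIM, EMBED_DIM)

    # Left context: non-linear transform of previous left context + previous word
    c_l = np.zeros((T, CTX_DIM))
    for i in range(1, T):
        c_l[i] = np.tanh(W_l.dot(c_l[i - 1]) + W_sl.dot(E[i - 1]))

    # Right context: symmetric recursion running backwards from the sentence end
    c_r = np.zeros((T, CTX_DIM))
    for i in range(T - 2, -1, -1):
        c_r[i] = np.tanh(W_r.dot(c_r[i + 1]) + W_sr.dot(E[i + 1]))

    # Word representation = [left context, embedding, right context]
    X = np.concatenate([c_l, E, c_r], axis=1)  # shape [T, CTX_DIM + EMBED_DIM + CTX_DIM]
    ```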

  4. FastText
    Implementation of Bag of Tricks for Efficient Text Classification

    Structure:
    After embedding each word in the sentence, the word representations are averaged into a text representation, which is in turn fed to a linear classifier. A softmax function computes the probability distribution over the predefined classes, and cross entropy is used as the loss. A bag-of-words representation does not consider word order, so n-gram features are added to capture partial information about the local word order. When the number of classes is large, computing the full linear classifier is expensive, so hierarchical softmax is used to speed up training.

    • uses bi-gram and/or tri-gram features
    • uses NCE loss to speed up the softmax computation
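    A minimal NumPy sketch of the core idea, augmenting tokens with bi-grams and averaging their embeddings into a text vector for a linear classifier (hierarchical softmax / NCE omitted; all names and sizes are hypothetical):

    ```python
    import numpy as np

    def add_bigrams(tokens):
        # Append bi-gram features to capture partial local word order.
        return tokens + [tokens[i] + "_" + tokens[i + 1]
                         for i in range(len(tokens) - 1)]

    tokens = add_bigrams("heading to austin for the weekend".split())

    # Hypothetical embedding table and linear classifier over 49 classes
    EMBED_DIM, NUM_CLASSES = 10, 49
    rng = np.random.RandomState(0)
    vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
    embeddings = rng.randn(len(vocab), EMBED_DIM)
    W, b = rng.randn(NUM_CLASSES, EMBED_DIM), np.zeros(NUM_CLASSES)

    # Average the word/bi-gram embeddings into a single text representation
    text_vec = embeddings[[vocab[t] for t in tokens]].mean(axis=0)

    # Linear classifier + softmax over the predefined classes
    logits = W.dot(text_vec) + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    ```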
  5. HierarchicalWithAttention
    Implementation of Hierarchical Attention Networks for Document Classification

    HierarchicalWithAttention Model
    Structure:
    i) Embedding
    ii) Word Encoder: word level bi-directional GRU to get rich representation of words
    iii) Word Attention: word level attention to get the important information in a sentence
    iv) Sentence Encoder: sentence level bi-directional GRU to get rich representation of sentences
    v) Sentence Attention: sentence level attention to get the important sentences in the document
    vi) FC+Softmax
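    A NumPy sketch of the word-level attention steps (ii-iii); sentence-level attention is analogous. The projection and the word context vector u_w are learned in the real model; here they are random placeholders:

    ```python
    import numpy as np

    T, H, A = 6, 8, 5                      # words, GRU output size, attention size
    rng = np.random.RandomState(0)
    h = rng.randn(T, H)                    # bi-GRU outputs for one sentence

    W_att, b_att = rng.randn(A, H), rng.randn(A)
    u_w = rng.randn(A)                     # word-level context vector

    # u_t = tanh(W h_t + b); score each word against the context vector u_w
    u = np.tanh(h.dot(W_att.T) + b_att)    # [T, A]
    scores = u.dot(u_w)                    # [T]
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                   # attention weights over the words

    # Sentence vector = attention-weighted sum of the word encodings
    sentence_vec = (alpha[:, None] * h).sum(axis=0)   # [H]
    ```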

  6. BiLSTMTextRelation:
    Implementation based on the Dual LSTM Encoder model from The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems

    Structure:
    Embedding --> Bi-directional LSTM --> Dropout --> Concat output --> LSTM --> Dropout --> FC layer --> Softmax

  7. Seq2SeqAttn:
    Implementation of seq2seq with attention, derived from Neural Machine Translation by Jointly Learning to Align and Translate

    Structure:
    Embedding --> Bi-directional GRU --> Decoder with attention

    Input Data:
    There are three kinds of inputs: 1) encoder inputs (a sentence); 2) decoder inputs (a list of labels with a fixed length); 3) target labels (also a list of labels).
    For example, if the labels are "L1 L2 L3 L4", the decoder inputs will be [_GO, L1, L2, L3, L4, _PAD] and the target labels will be [L1, L2, L3, L4, _END, _PAD]. The length is fixed to 6: any labels beyond that are truncated, and the lists are padded if there are not enough labels to fill them.
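    A small helper illustrating this construction (a hypothetical function, not the repository's preprocessing code; the _GO/_END/_PAD tokens and the fixed length of 6 follow the description above):

    ```python
    def build_decoder_io(labels, max_len=6):
        """Build fixed-length decoder inputs and target labels from a label list."""
        labels = labels[:max_len - 1]                 # truncate labels that exceed the budget
        dec_in = ["_GO"] + labels
        target = labels + ["_END"]
        dec_in += ["_PAD"] * (max_len - len(dec_in))  # pad up to the fixed length
        target += ["_PAD"] * (max_len - len(target))
        return dec_in, target

    print(build_decoder_io("L1 L2 L3 L4".split()))
    # (['_GO', 'L1', 'L2', 'L3', 'L4', '_PAD'], ['L1', 'L2', 'L3', 'L4', '_END', '_PAD'])
    ```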
    Seq2SeqAttn Model

    Attention Mechanism: i) Take the list of encoder outputs and the hidden state of the decoder.
    ii) Calculate the similarity of the hidden state with each encoder output to get a probability distribution over the encoder positions.
    iii) Compute the weighted sum of the encoder outputs based on this probability distribution.
    iv) Go through the RNN cell using this weighted sum together with the decoder input to get the new hidden state.
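    A NumPy sketch of steps ii)-iii), using dot-product similarity as the scoring function (the referenced paper learns a small alignment network instead; all sizes are placeholders):

    ```python
    import numpy as np

    T, H = 7, 16                              # encoder timesteps, hidden size
    rng = np.random.RandomState(0)
    encoder_outputs = rng.randn(T, H)         # one output vector per source position
    decoder_state = rng.randn(H)              # current decoder hidden state

    # ii) similarity of the decoder state with each encoder output -> distribution
    scores = encoder_outputs.dot(decoder_state)          # [T]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                              # probability over positions

    # iii) weighted sum of encoder outputs = context vector for this decoder step
    context = weights.dot(encoder_outputs)                # [H]
    ```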

    How the Vanilla Encoder-Decoder Works: The source sentence is encoded by an RNN into a fixed-size vector (the "thought vector").

    During training, another RNN (the decoder) tries to produce a token at each timestep, using this "thought vector" as its initial state and taking the decoder input at each timestep. Decoding starts from the special token _GO. After each step, a new hidden state is obtained and, together with the next input, the process continues until the special token _END is reached. The loss is the cross entropy between the logits and the target labels, where the logits are obtained by applying a projection layer to the decoder's hidden state at each step (for a GRU, the hidden state can be used directly as the decoder output).

    During testing, there are no labels, so the output from the previous timestep is fed back as the next input, and the process continues until the _END token is reached.
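    A schematic greedy-decoding loop capturing this difference: the previous prediction is fed back until _END is produced. Here decoder_step is a stand-in for the real RNN step with attention:

    ```python
    def greedy_decode(decoder_step, init_state, max_len=6):
        """Feed each predicted token back as the next input until _END or max_len."""
        token, state, outputs = "_GO", init_state, []
        for _ in range(max_len):
            token, state = decoder_step(token, state)   # one RNN step (with attention)
            if token == "_END":
                break
            outputs.append(token)
        return outputs

    # Toy stand-in for the decoder step: emits L1, L2, L3, then _END.
    def toy_step(token, state):
        nxt = {"_GO": "L1", "L1": "L2", "L2": "L3", "L3": "_END"}[token]
        return nxt, state

    print(greedy_decode(toy_step, init_state=None))      # ['L1', 'L2', 'L3']
    ```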

  8. CNNWithAttn:
    Implementation based on Neural Relation Extraction with Selective Attention over Instances

Usage:

  1. Neural network models can be found in the models folder.

Environment:

Python 2.7
TensorFlow 1.4.1
NumPy

Reference:

  1. Bag of Tricks for Efficient Text Classification

  2. Convolutional Neural Networks for Sentence Classification

  3. A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification

  4. Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow

  5. Recurrent Convolutional Neural Network for Text Classification

  6. Hierarchical Attention Networks for Document Classification

  7. Relation Classification via Convolutional Deep Neural Network

  8. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs

  9. Neural Relation Extraction with Selective Attention over Instances
