philo2vec

A TensorFlow implementation of word2vec applied to the [Stanford Encyclopedia of Philosophy](http://plato.stanford.edu/). The implementation supports both CBOW and skip-gram.

For more background, please have a look at the original word2vec papers.

After training, the model returns some interesting results; see the interesting results section below.

Evaluating hume - empiricist + rationalist:

descartes
malebranche
spinoza
hobbes
herder


Some interesting results

Similarities

Similar words to death:

untimely
ravages
grief
torment

Similar words to god:

divine
De Providentia
christ
Hesiod

Similar words to love:

friendship
affection
christ
reverence

Similar words to life:

career
live
lifetime
community
society

Similar words to brain:

neurological
senile
nerve
nervous

Operations

Evaluating hume - empiricist + rationalist:

descartes
malebranche
spinoza
hobbes
herder

Evaluating ethics - rational:

hiroshima

Evaluating ethic - reason:

inegalitarian
anti-naturalist
austere

Evaluating moral - rational:

commonsense

Evaluating life - death + love:

self-positing
friendship
care
harmony

Evaluating death + choice:

regret
agony
misfortune
impending

Evaluating god + human:

divine
inviolable
yahweh
god-like
man

Evaluating god + religion:

amida
torah
scripture
buddha
sokushinbutsu

Evaluating politic + moral:

rights-oriented
normative
ethics
integrity

The repo contains:

  • an object to crawl data from the philosophy encyclopedia; PlatoData
  • an object to build the vocabulary based on the crawled data; VocabBuilder
  • the model that computes the continuous distributed representations of words; Philo2Vec

Installation

The dependencies used for this module can be easily installed with pip:

> pip install -r requirements.txt

The params for the VocabBuilder:

  • min_frequency: the minimum frequency of the words to be used in the model.
  • size: the size of the vocabulary; the model then uses the top size most frequent words.

The hyperparams of the model:

  • optimizer: an instance of a TensorFlow Optimizer, such as GradientDescentOptimizer, AdagradOptimizer, or MomentumOptimizer (see the sketch after this list).
  • model: the model used to create the vectorized representation, possible values: CBOW, SKIP_GRAM.
  • loss_fct: the loss function used to calculate the error, possible values: SOFTMAX, NCE.
  • embedding_size: the dimensionality of the word embeddings.
  • neg_sample_size: the number of negative samples for each positive sample.
  • num_skips: the number of skips for a SKIP_GRAM model.
  • context_window: the window size; this window is used to create the context for calculating the vector representations [ window target window ].
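
For illustration, a params dictionary that also passes an explicit optimizer could look like the sketch below; the optimizer choice and the numeric values here are arbitrary examples, not defaults of the repo:

import tensorflow as tf

# Hypothetical hyperparameter set; any tf.train optimizer instance can be
# passed, and the values below are placeholder examples only.
params = {
    'model': Philo2Vec.CBOW,
    'loss_fct': Philo2Vec.NCE,
    'optimizer': tf.train.AdagradOptimizer(learning_rate=0.1),
    'embedding_size': 128,
    'neg_sample_size': 64,
    'context_window': 5,
}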

Quick usage:

# Train a CBOW model with the NCE loss.
params = {
    'model': Philo2Vec.CBOW,
    'loss_fct': Philo2Vec.NCE,
    'context_window': 5,
}
x_train = get_data()
validation_words = ['kant', 'descartes', 'human', 'natural']
x_validation = [StemmingLookup.stem(w) for w in validation_words]
vb = VocabBuilder(x_train, min_frequency=5)
pv = Philo2Vec(vb, **params)
pv.fit(epochs=30, validation_data=x_validation)

# Train a SKIP_GRAM model with the softmax loss.
params = {
    'model': Philo2Vec.SKIP_GRAM,
    'loss_fct': Philo2Vec.SOFTMAX,
    'context_window': 2,
    'num_skips': 4,
    'neg_sample_size': 2,
}
x_train = get_data()
validation_words = ['kant', 'descartes', 'human', 'natural']
x_validation = [StemmingLookup.stem(w) for w in validation_words]
vb = VocabBuilder(x_train, min_frequency=5)
pv = Philo2Vec(vb, **params)
pv.fit(epochs=30, validation_data=x_validation)

about stemming

Since the words are stemmed as part of the preprocessing, some operations are sometimes necessary:

StemmingLookup.stem('religious')  # returns "religi"

StemmingLookup.original_form('religi')  # returns "religion"
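
One way such a lookup can be implemented (an illustrative sketch only, not the repo's actual StemmingLookup) is to remember, for every stem, the most frequently observed surface form:

import collections

from nltk.stem import PorterStemmer

# Illustrative stand-in for StemmingLookup: stems words and maps each stem
# back to the most frequently seen original word.
class SimpleStemmingLookup:
    _stemmer = PorterStemmer()
    _originals = collections.defaultdict(collections.Counter)

    @classmethod
    def stem(cls, word):
        stemmed = cls._stemmer.stem(word)
        cls._originals[stemmed][word] += 1  # remember the surface form
        return stemmed

    @classmethod
    def original_form(cls, stemmed):
        counts = cls._originals.get(stemmed)
        return counts.most_common(1)[0][0] if counts else stemmed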

Getting similarities

pv.get_similar_words(['rationalist', 'empirist'])

Evaluating operations

pv.evaluate_operation('moral - rational')
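
Conceptually, an operation such as 'moral - rational' combines the stemmed word vectors with the given signs and returns the nearest words by cosine similarity. A minimal numpy sketch of that idea, assuming hypothetical embeddings, word_to_id and id_to_word structures (not the repo's actual internals):

import numpy as np

def evaluate_operation(expression, embeddings, word_to_id, id_to_word, top_k=5):
    # Parse e.g. "moral - rational" into signed terms.
    tokens = expression.split()
    signs, words = [1.0], [tokens[0]]
    for op, word in zip(tokens[1::2], tokens[2::2]):
        signs.append(1.0 if op == '+' else -1.0)
        words.append(word)

    # Combine the word vectors according to their signs.
    vectors = np.array([embeddings[word_to_id[w]] for w in words])
    target = np.sum(vectors * np.array(signs)[:, None], axis=0)

    # Rank every word in the vocabulary by cosine similarity to the result.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed.dot(target / np.linalg.norm(target))
    ranked = np.argsort(-scores)

    # Leave out the words used in the expression itself.
    exclude = {word_to_id[w] for w in words}
    return [id_to_word[i] for i in ranked if i not in exclude][:top_k]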

plotting vectorized words

pv.plot(['hume', 'empiricist', 'descart', 'rationalist'])
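
A common way to produce this kind of plot (shown here as an illustrative sketch, not the repo's implementation) is to project the selected word vectors to two dimensions with t-SNE and scatter them with matplotlib:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

def plot_words(words, embeddings, word_to_id):
    # Project the selected word vectors down to two dimensions.
    vectors = np.array([embeddings[word_to_id[w]] for w in words])
    points = TSNE(n_components=2, perplexity=min(30, len(words) - 1)).fit_transform(vectors)

    # Scatter the points and label each one with its word.
    plt.figure(figsize=(8, 8))
    plt.scatter(points[:, 0], points[:, 1])
    for word, (x, y) in zip(words, points):
        plt.annotate(word, xy=(x, y), xytext=(5, 2), textcoords='offset points')
    plt.show()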

Training details

skip_gram: plots of the training loss, the embeddings, the weights, and the biases.

cbow: plots of the training loss, the embeddings, the weights, and the biases.