Understanding link prediction for DistMult/ComplEx #2015

davidshumway · 2021-12-27T18:43:53Z

Given a train set and a test set of (subject, relation, object) triples, e.g.

train:

observation-1 hasTemp temp_32
observation-1 hasLatInteger lat_42
observation-1 hasLonInteger lon_88
...

test:

observation-300000 hasTemp temp_22
observation-300000 hasLatInteger lat_42
observation-300000 hasLonInteger lon_88

and training:

Epoch 1/10
2128/2128 [==============================] - 65s 30ms/step - loss: 0.7432 - binary_accuracy: 0.5002 - val_loss: 0.7492 - val_binary_accuracy: 0.5004
Epoch 2/10
2128/2128 [==============================] - 63s 30ms/step - loss: 0.7212 - binary_accuracy: 0.5002 - val_loss: 0.7081 - val_binary_accuracy: 0.5002
Epoch 3/10
2128/2128 [==============================] - 63s 30ms/step - loss: 0.7009 - binary_accuracy: 0.5003 - val_loss: 0.6977 - val_binary_accuracy: 0.4999
Epoch 4/10
2128/2128 [==============================] - 63s 30ms/step - loss: 0.7013 - binary_accuracy: 0.5001 - val_loss: 0.7012 - val_binary_accuracy: 0.5002
Epoch 5/10
2128/2128 [==============================] - 64s 30ms/step - loss: 0.6942 - binary_accuracy: 0.5002 - val_loss: 0.6931 - val_binary_accuracy: 0.5005
Epoch 6/10
2128/2128 [==============================] - 64s 30ms/step - loss: 0.6932 - binary_accuracy: 0.5013 - val_loss: 0.6931 - val_binary_accuracy: 0.5041
Epoch 7/10
2128/2128 [==============================] - 63s 30ms/step - loss: 0.6932 - binary_accuracy: 0.5464 - val_loss: 0.6931 - val_binary_accuracy: 0.6383
Epoch 8/10
2128/2128 [==============================] - 63s 30ms/step - loss: 0.6932 - binary_accuracy: 0.6630 - val_loss: 0.6931 - val_binary_accuracy: 0.6667
Epoch 9/10
2128/2128 [==============================] - 63s 30ms/step - loss: 0.6932 - binary_accuracy: 0.6667 - val_loss: 0.6931 - val_binary_accuracy: 0.6667
Epoch 10/10
2128/2128 [==============================] - 63s 30ms/step - loss: 0.6932 - binary_accuracy: 0.6667 - val_loss: 0.6931 - val_binary_accuracy: 0.6667

with node embedding:

(n_components=2, perplexity=50, learning_rate='auto', n_iter=250, n_iter_without_progress=50, random_state=99, angle=0.6, n_jobs=-1)

then the DistMult evaluation output is MRR and hits@10 for the raw and filtered ranks, e.g.

          mrr         hits at 10
filtered  0.04731     0.19375

There is also an array of raw ranks and an array of filtered ranks, with one row per SRO (subject, relation, object) triple in test, both returned by rank_edges_against_all_nodes, e.g. the filtered ranks:

[[  91781    321]
 [     57    14402]
 [      7    249]
 [     80    579]
 [     11    3640]
 [     48    47587] ... ]

Is each row [object_rank subject_rank] for the corresponding SRO in test?

Something like this:

[[  91781    321] -> embedding's prediction for test[0]'s O/S is 91781 out of (___) and 321 out of (___)?
 [     57    14402] -> embedding's prediction for test[1]'s O/S is 57 out of (___) and 14402 out of (___)?

For the filtered ranks, is this ranking computed against negatively generated triples?
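To make the question concrete, here is a minimal sketch of how I currently picture raw versus filtered object ranks. Everything in it is an assumption for illustration: the embeddings are random, the node and relation ids are made up, ties count in favour of the true triple, and it is not StellarGraph's actual implementation. In this picture the blanks above would be the number of nodes in the graph, and the filtered rank skips candidates that are already known to be true rather than using sampled negatives.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_rels, dim = 6, 2, 4
node_emb = rng.normal(size=(num_nodes, dim))  # toy table: one row per node
rel_emb = rng.normal(size=(num_rels, dim))    # toy table: one row per relation


def distmult_score(s, r, o):
    # DistMult scores a triple as the sum over dimensions of e_s * w_r * e_o.
    return float(np.sum(node_emb[s] * rel_emb[r] * node_emb[o]))


# Hypothetical set of all known true triples (train + test), as integer ids.
known = {(0, 0, 1), (0, 0, 2), (0, 0, 3), (4, 1, 5)}


def object_ranks(s, r, o):
    """Rank the true object o against every node n in (s, r, n)."""
    true_score = distmult_score(s, r, o)
    scores = np.array([distmult_score(s, r, n) for n in range(num_nodes)])
    better = scores > true_score
    # Raw rank: every other node is a competitor.
    raw = 1 + int(better.sum())
    # Filtered rank: ignore competitors that form another known true triple.
    known_other = np.array(
        [(s, r, n) in known and n != o for n in range(num_nodes)]
    )
    filtered = 1 + int((better & ~known_other).sum())
    return raw, filtered


raw_rank, filtered_rank = object_ranks(0, 0, 1)
print(f"raw rank {raw_rank} of {num_nodes}, filtered rank {filtered_rank}")
```

The subject rank would be the same computation with the subject replaced instead of the object. Is that the right mental model?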

I'd like to understand further why some relations appear to be predicted very well while others are predicted very poorly, and why in some cases objects appear to be better predicted than subjects, and vice versa in others. I'm a little unsure where or how to start exploring the generated embedding. Any tips for doing so?
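One starting point I can think of is breaking the ranks down per relation and per column. The sketch below uses the first six filtered rank pairs shown above. The relation labels are hypothetical (I am assuming the test rows cycle through the three relations in order), and the column meaning (object rank first, subject rank second) is the same unconfirmed guess as above.

```python
import numpy as np

# First six filtered rank pairs from the output above.
ranks = np.array(
    [[91781, 321], [57, 14402], [7, 249], [80, 579], [11, 3640], [48, 47587]]
)
# HYPOTHETICAL relation label for each row, assumed for illustration only.
relations = np.array(["hasTemp", "hasLatInteger", "hasLonInteger"] * 2)


def mrr(r):
    # Mean reciprocal rank.
    return float(np.mean(1.0 / r))


def hits_at(r, k=10):
    # Fraction of ranks that fall within the top k.
    return float(np.mean(r <= k))


per_relation = {}
for rel in np.unique(relations):
    rows = ranks[relations == rel]
    per_relation[str(rel)] = {
        "object_mrr": mrr(rows[:, 0]),       # assumed: column 0 = object rank
        "subject_mrr": mrr(rows[:, 1]),      # assumed: column 1 = subject rank
        "object_hits10": hits_at(rows[:, 0]),
        "subject_hits10": hits_at(rows[:, 1]),
    }

for rel, stats in per_relation.items():
    print(rel, stats)
```

With the full rank arrays and the real relation column from test, a table like this should at least show which relation and which side drive the low overall MRR.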

Going further, it appears there is an embedding for every triple in the dataset. Why is this? I would have expected one embedding for every node in the graph instead.

Using the above example, assuming there are 300,000 nodes in the graph, and example triples for each node are as follows:

observation-300000 hasTemp temp_22
observation-300000 hasLatInteger lat_42
observation-300000 hasLonInteger lon_88

then generating the node embedding:

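from stellargraph.mapper import KGTripleGenerator

embedding_model = wn18_model  # trained Keras model from the steps above (defined elsewhere)
# g is the StellarGraph graph; samples_test holds the test triples
node_gen = KGTripleGenerator(g, batch_size=10000).flow(samples_test)
node_embeddings = embedding_model.predict(node_gen, workers=4, verbose=1)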

then len(node_embeddings) is 900,000 (300,000 nodes x 3 relations per node) rather than 300,000 (one embedding per node).
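One possible explanation I am considering, not confirmed against the StellarGraph source: predicting over a triple generator returns one output per input triple (a score), while the node embeddings live in a separate table with one row per node. A toy NumPy sketch of that distinction, with made-up sizes and random values:

```python
import numpy as np

rng = np.random.default_rng(0)
num_observations = 4  # toy stand-in for the 300,000 observations
relations = ["hasTemp", "hasLatInteger", "hasLonInteger"]
values = ["temp_22", "lat_42", "lon_88"]

nodes = [f"observation-{i}" for i in range(num_observations)] + values
node_index = {name: i for i, name in enumerate(nodes)}

# Three triples per observation, as in the example above.
triples = [
    (f"observation-{i}", rel, val)
    for i in range(num_observations)
    for rel, val in zip(relations, values)
]

dim = 8
node_emb = rng.normal(size=(len(nodes), dim))     # one row per node
rel_emb = rng.normal(size=(len(relations), dim))  # one row per relation

s = np.array([node_index[t[0]] for t in triples])
r = np.array([relations.index(t[1]) for t in triples])
o = np.array([node_index[t[2]] for t in triples])

# DistMult-style score for each triple: one number per triple, not per node.
triple_scores = np.sum(node_emb[s] * rel_emb[r] * node_emb[o], axis=1)

print(len(triple_scores), "outputs for", len(triples), "triples")
print(node_emb.shape[0], "rows in the node embedding table")
```

If that is what is happening, the 900,000 values would be per-triple outputs, and the per-node embeddings would have to be read from the model's embedding weights instead.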

Replies: 0 comments