Node2Vec algorithm results are not reliable #2076

Open
cuneyttyler opened this issue Jan 19, 2023 · 0 comments
Labels
bug Something isn't working sg-library


cuneyttyler commented Jan 19, 2023

I am testing how node2vec behaves on a small graph, shown here: https://ibb.co/Y7JWVTV

I calculated the cosine similarity between nodes 0, 1 and 10. I expected the similarity between 0 and 1 to be high and the other two to be low. However, the similarities are -0.0070848754, 0.062368274 and -0.13235472 for the pairs (0, 1), (0, 10) and (1, 10) respectively. Is it not reasonable to expect higher cosine similarity for nodes that are close and interconnected? If it is not, how can we measure similarity and test the embeddings?

Thanks.

Here is my code:

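import numpy as np
import pandas as pd
from gensim.models import Word2Vec
from stellargraph import IndexedArray, StellarGraph
from stellargraph.data import BiasedRandomWalk

graph_embedding_size = 100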

def get_embedding(graph):
    edges_ = pd.DataFrame({
            'source': [e['source'] for e in graph['edges']],
            'target': [e['target'] for e in graph['edges']],
            'type': graph['edge_types']
        })

    G = StellarGraph(IndexedArray(index=graph['nodes']), edges_, edge_type_column="type")

    walk_length = 10
    rw = BiasedRandomWalk(G)
    walks = rw.run(
        nodes=G.nodes(),  # root nodes
        length=walk_length,  # maximum length of a random walk
        n=2,  # number of random walks per root node
        p=0.5,  # defines (unnormalised) probability, 1/p, of returning to source node
        q=2.0,  # defines (unnormalised) probability, 1/q, of moving away from source node
        weighted=False,  # for weighted random walks
        seed=42,  # random seed fixed for reproducibility
    )

    model = Word2Vec(
        walks, vector_size=graph_embedding_size, window=5, min_count=0, sg=1, workers=1
    )

    return model.wv.vectors

graph_ex = {'nodes': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            'edges': [
                {'source': 0, 'target': 1},
                {'source': 0, 'target': 2},
                {'source': 0, 'target': 3},
                {'source': 1, 'target': 2},
                {'source': 1, 'target': 3},
                {'source': 2, 'target': 3},
                {'source': 2, 'target': 4},
                {'source': 4, 'target': 5},
                {'source': 4, 'target': 6},
                {'source': 5, 'target': 6},
                {'source': 6, 'target': 7},
                {'source': 5, 'target': 8},
                {'source': 5, 'target': 9},
                {'source': 9, 'target': 10}
            ], 'edge_types': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

embeddings_ex = get_embedding(graph_ex)

from numpy.linalg import norm

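def cosine_sim(A, B):
    # Cosine of the angle between vectors A and B
    return np.dot(A, B) / (norm(A) * norm(B))

# Rows 0, 1 and 10 of the embedding matrix
A = embeddings_ex[0]
B = embeddings_ex[1]
C = embeddings_ex[10]
print(cosine_sim(A, B), cosine_sim(A, C), cosine_sim(B, C))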
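
One thing that may be worth checking in the code above, offered as an assumption rather than a confirmed diagnosis: gensim stores the rows of `model.wv.vectors` in vocabulary order (by default most frequent token first, exposed as `model.wv.index_to_key`), not in node-id order, so `embeddings_ex[0]` is not necessarily the vector of node 0. The sketch below uses made-up vectors and a hypothetical helper, `vectors_by_node`, to show the re-keying step. It does not run node2vec itself.

```python
import math


def vectors_by_node(index_to_key, vectors):
    """Map each node id to its embedding row.

    index_to_key[i] is the node whose vector is stored in vectors[i],
    mirroring how gensim pairs model.wv.index_to_key with model.wv.vectors.
    (Hypothetical helper, not part of gensim or StellarGraph.)
    """
    return {node: row for node, row in zip(index_to_key, vectors)}


def cosine_sim(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up data: the vocabulary order (2, 0, 1) differs from the node-id order.
index_to_key = [2, 0, 1]
vectors = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.1]]

by_node = vectors_by_node(index_to_key, vectors)

# Positional indexing compares rows 0 and 1, which here belong to nodes 2 and 0.
positional = cosine_sim(vectors[0], vectors[1])
# Keyed lookup compares the vectors that actually belong to nodes 0 and 1.
keyed = cosine_sim(by_node[0], by_node[1])

print(positional, keyed)
```

With the trained model, the equivalent call would presumably be `vectors_by_node(model.wv.index_to_key, model.wv.vectors)`, after which the similarities can be recomputed from `by_node[0]`, `by_node[1]` and `by_node[10]`.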