Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[paper] definition of M vs Fig. 2. #51

Open
olszewskip opened this issue Apr 12, 2022 · 1 comment
Open

[paper] definition of M vs Fig. 2. #51

olszewskip opened this issue Apr 12, 2022 · 1 comment

Comments

@olszewskip
Copy link

Hi!
Apologies if this is not the best place to ask this but. I've been reading Your paper and I hope You could clarify this little thing which is confusing to me:

  1. Section III.B offers the definition of matrix M, M_ab = e_ab/deg(v_a).
  2. Figure 2. presents toy example that includes explicit M matrices.

Are those latter matrices in 2. meant to follow the definition from 1.?
If so, I must be misinterpreting something about this definition. Based on it I was expecting the rows of Ms in Fig. 2. to be normalized. Can I ask for example what is the value of deg(v_a) for the third row (i.e. v_a = p2_hash, if I understand correctly) of M_3 in Fig. 2. It seems that the entries in this row were constructed as follows:

  • taking v_b=u1_hash, we have e_ab=2 and deg(v_a)=4, and the entry in M_3 reads 1/2;
  • and taking v_b=u2_hash, we have e_ab=1 and deg(v_a)=3, and the entry in M_3 reads 1/3;
    so it seems that deg(v_a) is not just a function of v_a, which - for me - is not captured in the presentation in Section III.

What am I missing? Thanks in advance for any hints! :)

@barbara3430
Copy link

Dear @olszewskip !
First of all, sorry for the delay. If the issue is urgent and we forget to reply, please feel free to ping us :)

As to your question:
Just to be sure, I'll mention that Algorithm 1 in Section III and Figure 2 refer to different cases. Algorithm 1 refers to the case where we chunk the graph (that is, we cut the graph into N parts and then merge the resulting embeddings), but we only have one node type in the whole graph. This is not the same case as displayed in Figure 2 - there we handle multiple node (relation) types - users, products, brands - within one graph. Indeed, multiple matrices M are created but they are not graph chunks. They exist because technically Cleora handles 2D matrices with pairs of relations, and with more relations multiple such matrices must be created.

Now, to the issue with normalization - in Figure 2 each M matrix represents 2 node types, in such a case we decide to calculate transition matrix for transitions from node type 1 to node type 2 only. In case M_3 matrix, let's consider graph traversal always starting from the user nodes (u1_hash, u2_hash) towards product nodes (p1_hash, p2_hash, p3_hash). You will notice that values are indeed normalized by the number of edges running from each user node (user node being the "a" node in in M_ab = e_ab/deg(v_a) formula). If we had only 1 node type, all would be just like in Algorithm 1 and each row would be normalized. Figure 2 was in fact meant to show some complex non-standard use case were we could better display technical issues such as parallelism in computation. However, probably we should make this more clear and thanks for noticing.

Hope this helps!
Barbara

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants