[BUG] [GAE] The triangles algorithm results cannot be consistent for the same data set #3779

Open
dzhiwei opened this issue May 9, 2024 · 1 comment
@dzhiwei
Contributor

dzhiwei commented May 9, 2024

Describe the bug
The triangles algorithm does not return consistent results for the same dataset when the number of workers changes.
I ran the triangles algorithm with num_workers = 1, then increased num_workers to 2, and the results differ.
Does the triangles algorithm work incorrectly in a multi-worker environment?

To Reproduce
Steps to reproduce the behavior:
code:

import graphscope
from graphscope.framework.loader import Loader
import os

def execute_triangles(num_workers=1):
    sess = None
    curr_dir = os.path.abspath('.')
    dataset_name = "LDBC_SNB_REGRESSION"
    try:
        sess = graphscope.session(cluster_type="hosts", num_workers=num_workers, vineyard_shared_mem='10Gi')
        graph = sess.g()
        graph = graph.add_vertices(
            Loader(os.path.join(curr_dir, dataset_name, "person.csv"), delimiter=","),
            label="person",
            properties=["_rank"],
        )
        graph = graph.add_edges(
            Loader(os.path.join(curr_dir, dataset_name, "person_knows_person.csv"), delimiter=","),
            label="knows", src_label="person", dst_label="person",
            properties=["_rank"],
        )
        ctx = graphscope.triangles(graph)
        dataframe = ctx.to_dataframe(selector={"v":"v.id","r":"r"}).sort_values(by='r', ascending=False)
        print(dataframe)
        return dataframe
    finally:
        if sess:
            sess.close()
            
if __name__ == "__main__":
    execute_triangles(num_workers=1).to_csv("triangles1.csv")
    execute_triangles(num_workers=2).to_csv("triangles2.csv")

result:

# num_workers=1
                   v    r
451   32985348834375  720
1443   2199023256816  642
468    6597069767242  543
184             1564  239
# num_workers=2
                   v    r
986   32985348834375  719
346    2199023256816  642
617    6597069767242  540
479             1564  239
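
To list every vertex whose count changes between the two runs, the two result sets can be compared key by key. The following is a minimal sketch using only the standard library. The helper names `load_counts` and `diff_counts` are hypothetical, and `load_counts` assumes the CSV files have the `v` and `r` columns produced by the `to_csv` calls in the script above. The demo values are the four rows shown in this report.

```python
import csv

def load_counts(path):
    """Read a CSV with 'v' (vertex id) and 'r' (triangle count) columns.

    Assumption: the file was written by DataFrame.to_csv as in the repro script.
    """
    with open(path, newline="") as f:
        # int(float(...)) tolerates counts serialized as "720" or "720.0".
        return {int(row["v"]): int(float(row["r"])) for row in csv.DictReader(f)}

def diff_counts(a, b):
    """Return {vertex: (count_a, count_b)} for vertices whose counts differ."""
    diffs = {}
    for v in sorted(a.keys() | b.keys()):
        if a.get(v) != b.get(v):
            # A vertex missing from one run shows up with None on that side.
            diffs[v] = (a.get(v), b.get(v))
    return diffs

# Demo with the rows quoted in this issue.
workers_1 = {32985348834375: 720, 2199023256816: 642, 6597069767242: 543, 1564: 239}
workers_2 = {32985348834375: 719, 2199023256816: 642, 6597069767242: 540, 1564: 239}
for v, (r1, r2) in diff_counts(workers_1, workers_2).items():
    print(v, r1, r2)
```

For the full data, `diff_counts(load_counts("triangles1.csv"), load_counts("triangles2.csv"))` would give the complete list of affected vertices.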

Expected behavior
The triangle count for each vertex should be the same regardless of num_workers. Please help clarify whether this behavior is expected.
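
To tell which of the two runs (if either) is correct, a single-process reference count can serve as a baseline. The sketch below is not GraphScope's implementation. It assumes that triangles are counted on the undirected simple graph, meaning edge direction, duplicate edges and self-loops are ignored; if the built-in algorithm uses different semantics, the baseline will not match. The function name `triangle_counts` is hypothetical, and the edge pairs would come from the source and destination columns of person_knows_person.csv.

```python
from collections import defaultdict

def triangle_counts(edges):
    """Per-vertex triangle counts for an undirected simple graph.

    `edges` is an iterable of (u, v) pairs. Direction, duplicates and
    self-loops are ignored (an assumption about the intended semantics).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    counts = {}
    for u, nbrs in adj.items():
        # For each neighbour w, count the neighbours of u that are also
        # adjacent to w. Every triangle through u is found twice this way.
        links = sum(len(nbrs & adj[w]) for w in nbrs)
        counts[u] = links // 2
    return counts

# Small example: one triangle (1, 2, 3) plus a pendant vertex 4.
example = [(1, 2), (2, 3), (3, 1), (3, 4), (2, 1)]
print(triangle_counts(example))
```

Comparing this baseline against both CSV files would show whether the num_workers=1 result, the num_workers=2 result, or neither agrees with a plain undirected count.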

Environment (please complete the following information):
(gs20_py10) (base) $ python3 --version
Python 3.10.11
(gs20_py10) (base) $ pip3 list|grep graphscope
graphscope 0.26.0
graphscope-client 0.26.0
(gs20_py10) (base) $ uname -a
Darwin f4d488816bd7 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:20 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6000 arm64

Additional context
LDBC_SNB_REGRESSION.zip

@yecol
Collaborator

yecol commented Jun 7, 2024

Thanks for reporting! We will investigate this and keep you updated!
