Connected components produce incorrect results #428

Open
i100van opened this issue Mar 7, 2023 · 2 comments

Comments

@i100van

i100van commented Mar 7, 2023

Hello, good evening,

I am running into the same problem: after computing the connected components, one component groups together many nodes that are not actually connected. I have tried the options mentioned in the thread (collapsing the Spark graph with a round trip, running a count beforehand, repartitioning), but none of them works. The output still contains a single component that lumps together millions of nodes; its component number is usually 0, but it varies.

I have run out of ideas. If anyone has a suggestion, it would be a great help.

Thank you very much for your help.

@mattjamison

Fwiw, I've noticed this behavior when AQE is enabled. If I disable it directly before calling connected components, it works as expected in my case ...

spark.conf.set('spark.sql.adaptive.enabled', 'false')
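For anyone who wants to scope that toggle to the one call, here is a minimal sketch. The helper name `connected_components_without_aqe` is hypothetical, `spark` is assumed to be the active SparkSession and `graph` a GraphFrame. Restoring the previous value afterwards is my own addition, not something confirmed in this thread.

```python
def connected_components_without_aqe(spark, graph):
    """Run graph.connectedComponents() with AQE switched off (hypothetical helper).

    Assumes `spark` is the active SparkSession and `graph` is a GraphFrame.
    """
    key = "spark.sql.adaptive.enabled"
    # Remember the current setting so it can be put back afterwards.
    previous = spark.conf.get(key, "true")
    spark.conf.set(key, "false")
    try:
        return graph.connectedComponents()
    finally:
        # Assumption: restoring here is safe for your job. See the note below.
        spark.conf.set(key, previous)
```

If the returned DataFrame is only evaluated later, restoring the setting this early may bring the problem back; in that case it is probably safer to leave AQE off until the result has been cached or written out.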

@pgrandjean

pgrandjean commented Aug 11, 2023

Same issue. Writing the vertices and edges to disk and reading them back worked in my case.
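Roughly, the disk round trip could look like the sketch below. The helper names and the `base_path` layout are hypothetical, and Parquet is an assumed format (the comment above does not say which one was used). `spark` is assumed to be the active SparkSession, `vertices` and `edges` the DataFrames used to build the graph.

```python
def round_trip_through_disk(spark, df, path):
    """Write df to Parquet (assumed format) and read it back."""
    df.write.mode("overwrite").parquet(path)
    # The reloaded DataFrame is backed by the files just written,
    # not by the original query plan.
    return spark.read.parquet(path)


def materialized_graph_inputs(spark, vertices, edges, base_path):
    """Round-trip both graph inputs; returns (vertices, edges). Hypothetical helper."""
    v = round_trip_through_disk(spark, vertices, base_path + "/vertices")
    e = round_trip_through_disk(spark, edges, base_path + "/edges")
    return v, e
```

The reloaded `v` and `e` would then be passed to the GraphFrame constructor in place of the original DataFrames.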

Another workaround: replace the call to monotonically_increasing_id() with zipWithUniqueId(). This means the DataFrame has to be converted to an RDD and then back to a DataFrame. It seems to work as well.
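A minimal sketch of that conversion, in case it helps. The helper name `with_unique_id` and the default column name `id` are hypothetical; `spark` is assumed to be the active SparkSession and `df` a DataFrame that does not already have a column with that name.

```python
def with_unique_id(spark, df, id_col="id"):
    """Append a unique ID column via RDD.zipWithUniqueId() (hypothetical helper)."""
    # zipWithUniqueId yields (Row, unique_long) pairs; the IDs are unique
    # but not necessarily consecutive.
    rows_with_id = df.rdd.zipWithUniqueId().map(
        lambda pair: tuple(pair[0]) + (pair[1],)
    )
    # Passing only column names lets Spark infer the types from the data;
    # an explicit schema may be needed if columns can contain nulls.
    return spark.createDataFrame(rows_with_id, df.columns + [id_col])
```

The resulting DataFrame would replace the one that previously got its IDs from monotonically_increasing_id().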
