I know this is a long shot, but I've been stuck on this problem for a while now and still haven't found an answer. Let's see if any of you can figure it out.
Problem
I have a GraphFrames graph, from which I've obtained the connected components. Now I would like to find the distance from a source node to a target node, both belonging to the same component.
| id_src | id_dst | component |
|--------|--------|-----------|
| 123    | 657    | 1         |
| 234    | 876    | 2         |
| 876    | 567    | 2         |
I would like to calculate the distance from id_src to id_dst for each row in this DataFrame, so the result would look like:
| id_src | id_dst | component | distance |
|--------|--------|-----------|----------|
| 123    | 657    | 1         | 4        |
| 234    | 876    | 2         | 2        |
| 876    | 567    | 2         | 2        |
I know I need to use the BFS function from GraphFrames, but I can't find a way to run it in parallel while providing the source and destination id for each row.
What I've tried
This results in the following exception. I understand it's because I can't parallelize an already parallel function:
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 476, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 72, in dumps
    cp.dump(obj)
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 540, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.RLock' object
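For context on the traceback above: Spark serializes a UDF together with every object its closure captures, so a UDF that references a driver-only object fails at pickling time. The attempted code is not shown here, so which object was captured is an assumption; a GraphFrame or SparkSession referenced inside the UDF is a likely candidate. The sketch below reproduces the same class of error with the standard library only. `DriverSideHandle` and `try_pickle` are hypothetical names used purely for illustration.

```python
import pickle
import threading


class DriverSideHandle:
    """Hypothetical stand-in for a driver-only object (e.g. a GraphFrame)."""

    def __init__(self):
        # Driver-side objects often hold locks internally; locks cannot be pickled.
        self._lock = threading.RLock()


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


handle = DriverSideHandle()
error_message = try_pickle(handle)
print(error_message)
```

If this is indeed the cause, the practical consequence is that a GraphFrames call such as BFS has to be issued from the driver, not from inside a UDF.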
I also thought about using a non-parallel library like networkx or igraph, creating a single graph from each connected component. The problem is that I don't know how to generate these single graphs and then reference them from the UDF.
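The per-component idea can be sketched without any graph library, assuming the edges of a single component fit in memory on one worker. This is a minimal illustration in plain Python, not a GraphFrames API: the function names are hypothetical, the graph is treated as undirected, and the edge lists are invented so that the distances match the expected table above (the intermediate node ids 10, 11, 12, 20 and 21 are made up).

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (src, dst) edge tuples."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return adjacency


def bfs_distance(adjacency, source, target):
    """Return the number of hops from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def distances_by_component(edges_by_component, pairs):
    """Compute the distance for each (id_src, id_dst, component) row."""
    graphs = {comp: build_adjacency(edges) for comp, edges in edges_by_component.items()}
    return [
        (src, dst, comp, bfs_distance(graphs.get(comp, {}), src, dst))
        for src, dst, comp in pairs
    ]


# Hypothetical edge lists, one per connected component.
edges_by_component = {
    1: [(123, 10), (10, 11), (11, 12), (12, 657)],
    2: [(234, 20), (20, 876), (876, 21), (21, 567)],
}
pairs = [(123, 657, 1), (234, 876, 2), (876, 567, 2)]

result = distances_by_component(edges_by_component, pairs)
for row in result:
    print(row)
```

In Spark, the same per-component function could presumably be applied by grouping the edges and the query pairs by `component` and using a cogrouped `applyInPandas` (PySpark 3.0+), so that each component's graph is built on a worker from plain data and nothing from the driver is captured.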
Any ideas are appreciated, thank you.