Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why the definition of dependencies is different from RDD paper? #78

Open
endersuu opened this issue Oct 9, 2022 · 0 comments
Open

Why the definition of dependencies is different from RDD paper? #78

endersuu opened this issue Oct 9, 2022 · 0 comments

Comments

@endersuu
Copy link

endersuu commented Oct 9, 2022

From the paper Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

  • narrow dependencies, where each partition of the parent RDD is used by at most one partition of the child RDD
  • wide dependencies, where multiple child partitions may depend on it

However, The definition of dependencies from the chapter JobLogicalPlan is different :

  • NarrowDependency, Each partition of the child RDD fully depends on a small number of partitions of its parent RDD. Fully depends (i.e., FullDependency) means that a child partition depends the entire parent partition.

  • ShuffleDependency, Multiple child partitions partially depends on a parent partition. Partially depends (i.e., PartialDependency) means that each child partition depends a part of the parent partition.

This makes me really confused. Are ShuffleDependency and wide dependency the same thing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant