Question about partitioners #51

Open
leo-987 opened this issue Jun 23, 2016 · 0 comments

leo-987 commented Jun 23, 2016

I came across this passage in Learning Spark:

Finally, for binary operations, which partitioner is set on the output depends on the parent RDDs’ partitioners. By default, it is a hash partitioner, with the number of partitions set to the level of parallelism of the operation. However, if one of the parents has a partitioner set, it will be that partitioner; and if both parents have a partitioner set, it will be the partitioner of the first parent.

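So a child RDD's partitioner should be determined by the partitioners of its parent RDDs. But in Chapter 2 of SparkInternals, the parent and child RDDs all have different partitioners. Why is that? If one of the two parent RDDs uses a hash partitioner, shouldn't the child RDD use a hash partitioner as well?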
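
To make the quoted rule concrete, here is a toy model of it in plain Python. It does not use Spark at all, and it only encodes the rule as worded in the passage above, not whatever a particular Spark version actually does. The names `Partitioner`, `output_partitioner`, and `parallelism` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Partitioner:
    """Hypothetical stand-in for a partitioner: a kind plus a partition count."""
    kind: str            # e.g. "hash" or "range"
    num_partitions: int


def output_partitioner(first: Optional[Partitioner],
                       second: Optional[Partitioner],
                       parallelism: int) -> Partitioner:
    """Pick the output partitioner of a binary operation, following the quoted rule."""
    if first is not None:
        # Covers "both parents set" (first parent wins) and "only the first set".
        return first
    if second is not None:
        # Only the second parent has a partitioner, so it is used.
        return second
    # Neither parent has one: default to hash, sized by the level of parallelism.
    return Partitioner("hash", parallelism)


# Usage: two hypothetical parent partitioners and a parallelism of 8.
hash4 = Partitioner("hash", 4)
range3 = Partitioner("range", 3)

print(output_partitioner(None, None, 8))     # neither parent set
print(output_partitioner(None, hash4, 8))    # only the second parent set
print(output_partitioner(range3, hash4, 8))  # both set
```

Under this reading, a child would only end up with a hash partitioner from a hash-partitioned parent if that parent is the first one, or the only one with a partitioner set. That is the expectation the question above compares against the SparkInternals chapter.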
