Support fast merge in the storage plugin interface. #60

Open
jealous opened this issue Oct 9, 2019 · 0 comments

jealous commented Oct 9, 2019

Allow the plugin developer to implement a fast merge which will be invoked when:

  • Encryption is disabled.
  • Compression is disabled or the compression codec supports concatenation of serialized streams.
  • The spark.shuffle.unsafe.fastMergeEnabled option is true.
  • The plugin supports fast merge.
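
The four conditions above can be read as a single predicate. The following is a minimal sketch of that check, not the plugin's actual code: the class name `FastMergeGate`, the method name, and every parameter name are hypothetical and chosen only for illustration.

```java
// Hypothetical helper: none of these names come from the plugin interface.
final class FastMergeGate {
    private FastMergeGate() {}

    /**
     * Returns true only when all four conditions listed in the issue hold.
     *
     * @param encryptionEnabled          whether shuffle encryption is on
     * @param compressionEnabled         whether shuffle compression is on
     * @param codecSupportsConcatenation whether the codec allows concatenating serialized streams
     * @param fastMergeEnabledOption     value of spark.shuffle.unsafe.fastMergeEnabled
     * @param pluginSupportsFastMerge    whether the storage plugin provides a fast merge
     */
    static boolean shouldUseFastMerge(
            boolean encryptionEnabled,
            boolean compressionEnabled,
            boolean codecSupportsConcatenation,
            boolean fastMergeEnabledOption,
            boolean pluginSupportsFastMerge) {
        // Compression is acceptable if it is off, or if the codec tolerates concatenation.
        boolean compressionOk = !compressionEnabled || codecSupportsConcatenation;
        return !encryptionEnabled
                && compressionOk
                && fastMergeEnabledOption
                && pluginSupportsFastMerge;
    }
}
```

If any one of the inputs fails its condition, the sketch returns false and the caller would presumably fall back to the regular merge path.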

Note that the performance of this fast merge function could seriously affect the performance of Spark SQL joins with multiple spills.

@jealous jealous added the enhancement New feature or request label Oct 9, 2019
@jealous jealous self-assigned this Oct 9, 2019
jealous added a commit that referenced this issue Oct 9, 2019
Allow the plugin developer to implement a fast merge which will be
invoked when:
* Encryption is disabled.
* Compression is disabled or the compression codec supports
  concatenation of serialized streams.
* The `spark.shuffle.unsafe.fastMergeEnabled` option is true.
* The plugin supports fast merge.

Note that the performance of this fast merge function could seriously
affect the performance of Spark SQL joins with multiple spills.

The sample implementation of fast merge used in the shared file system
plugin is ported from Spark's original
`UnsafeShuffleWriter.mergeSpillsWithTransfer` function.  Other storage
plugins can implement their own fast merge.

The plugin disables the new fast merge API by default.
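
To illustrate the general idea behind a transfer-based merge, here is a minimal sketch using `FileChannel.transferTo` from Java NIO. It is not the plugin's code and not Spark's `mergeSpillsWithTransfer`; the class `TransferMergeSketch`, the method `mergeSpills`, and the `partitionLengths[spill][partition]` layout are assumptions made for this example. It assumes each spill file stores its partitions back to back in partition order, and it copies raw bytes without decoding them, which is why encryption and non-concatenable codecs rule this path out.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and data layout are assumptions, not the plugin API.
final class TransferMergeSketch {
    private TransferMergeSketch() {}

    /**
     * Merges spill files partition by partition into one output file.
     * partitionLengths[s][p] is the byte length of partition p in spill s.
     * Returns the byte length of each merged partition.
     */
    static long[] mergeSpills(List<Path> spills, long[][] partitionLengths, Path output)
            throws IOException {
        int numPartitions = partitionLengths.length == 0 ? 0 : partitionLengths[0].length;
        long[] merged = new long[numPartitions];
        long[] readOffsets = new long[spills.size()]; // next unread byte in each spill
        List<FileChannel> inputs = new ArrayList<>();
        try (FileChannel out = FileChannel.open(output,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path spill : spills) {
                inputs.add(FileChannel.open(spill, StandardOpenOption.READ));
            }
            for (int p = 0; p < numPartitions; p++) {
                for (int s = 0; s < inputs.size(); s++) {
                    long remaining = partitionLengths[s][p];
                    // transferTo may move fewer bytes than requested, so loop until done.
                    while (remaining > 0) {
                        long moved = inputs.get(s).transferTo(readOffsets[s], remaining, out);
                        if (moved <= 0) {
                            throw new IOException("Unexpected end of spill file " + spills.get(s));
                        }
                        readOffsets[s] += moved;
                        remaining -= moved;
                    }
                    merged[p] += partitionLengths[s][p];
                }
            }
        } finally {
            for (FileChannel in : inputs) {
                in.close();
            }
        }
        return merged;
    }
}
```

A storage plugin backed by something other than a local or shared file system would likely need a different primitive than `transferTo`, but the partition-major copy order would stay the same.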
jealous added a commit that referenced this issue Oct 9, 2019
jealous added a commit that referenced this issue Oct 11, 2019
jealous added a commit that referenced this issue Nov 11, 2019