Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-2736] Adds output batch compression for Data Transforms #18514

Draft
wants to merge 13 commits into
base: dev
Choose a base branch
from

Conversation

oleiman
Copy link
Member

@oleiman oleiman commented May 15, 2024

This PR adds the ability to configure output batch compression on WASM transforms, either at deploy time or post facto, via metadata patch request. Includes rpk experience for same

TODO:

  • rpk bits and tests for deploying w/ compression turned on

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Add output batch compression for Data Transforms (configurable per deployed transform)

@oleiman oleiman self-assigned this May 15, 2024
@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch 2 times, most recently from 4adf296 to 58df36c Compare May 16, 2024 05:45
@oleiman
Copy link
Member Author

oleiman commented May 16, 2024

/ci-repeat 1

@oleiman
Copy link
Member Author

oleiman commented May 16, 2024

/ci-repeat 1
release
skip-units
skip-redpanda-build

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented May 16, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/49248#018f82f5-d2fc-4bd5-b204-a1774afd79bf:

"rptest.tests.connection_virtualizing_test.TestVirtualConnections.test_handling_invalid_ids"

new failures in https://buildkite.com/redpanda/redpanda/builds/49248#018f82f5-d2ff-46d2-8840-d9e063a709e2:

"rptest.tests.connection_virtualizing_test.TestVirtualConnections.test_no_head_of_line_blocking.different_clusters=False.different_connections=False"

"zstd"
],
"required": false,
"description": "Output batch compression mode (TODO(oren): might want to limit which?)"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we want/need to make all of these available.

@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch 2 times, most recently from d9f6691 to f13207f Compare May 17, 2024 18:31
@oleiman oleiman marked this pull request as ready for review May 17, 2024 18:32
@oleiman oleiman requested review from a team and michael-redpanda and removed request for a team May 17, 2024 22:55
@@ -149,6 +153,7 @@ The --var flag can be repeated to specify multiple variables like so:
cmd.Flags().StringSliceVarP(&fc.outputTopics, "output-topic", "o", []string{}, "The output topic to write the transform results to (repeatable)")
cmd.Flags().StringVar(&fc.functionName, "name", "", "The name of the transform")
cmd.Flags().Var(&fc.env, "var", "Specify an environment variable in the form of KEY=VALUE")
cmd.Flags().StringVar(&fc.compression, "compression", "", "Output batch compression type")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be an explicit "none"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah fair point

NAME INPUT TOPIC OUTPUT TOPIC RUNNING LAG
foo2bar foo bar 2 / 3 7
scrubber pii cleaned, munged 0 / 1 99
NAME INPUT TOPIC OUTPUT TOPIC RUNNING LAG COMPRESSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this useful? We don't print env vars and this feels like that configuration. Maybe in the detail view? Doesn't feel like there is a user story where someone wants this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, yeah I suppose not.

@@ -81,3 +81,40 @@ Subsequent 'rpk transform list' operations will show transform processors as
}
return cmd
}

func newSetCompressionCommand(fs afero.Fs, p *config.Params) *cobra.Command {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably confusing for users that this is not sticky across deploys

Comment on lines +102 to +103
at a cost. So, while it does occur asynchronously with respect to transform
execution, compression may introduce latency on the output topic.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
at a cost. So, while it does occur asynchronously with respect to transform
execution, compression may introduce latency on the output topic.
at a cost. So, while it occurs asynchronously with respect to transform
execution, compression may introduce latency on the output topic.

@oleiman oleiman marked this pull request as draft May 31, 2024 19:56
boost::lexical_cast<compression>("bogus) should raise bad_lexical cast.

Previous behavior: runtime_error("Fell off the end of a string-switch")

Failure to match the source string to one of the compression lexical cases
should set the fail bit on the istream instead of allowing the runtime_error
to escape.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Also adds bool transform_metadata_patch::empty()

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Respecting transform_metadata::compression_mode

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
- tranform_from_json
- deploy
- set_compression

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch from f13207f to 1e48f71 Compare May 31, 2024 20:07
@oleiman
Copy link
Member Author

oleiman commented May 31, 2024

force push to sync w/ dev after a couple weeks of inactivity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants