Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

Summingbird batch doesn't work with the latest Scalding develop #760

Open
fwbrasil opened this issue Feb 16, 2018 · 10 comments
Open

Summingbird batch doesn't work with the latest Scalding develop #760

fwbrasil opened this issue Feb 16, 2018 · 10 comments

Comments

@fwbrasil
Copy link

ScaldingPlatform.run doesn't trigger execution anymore since the flow creation just mutates FlowStateMap to be later used by the specific backend. CascadingBackend isn't even called and summingbird produces following warning message:

18/02/16 18:09:41 WARN scalding.Scalding: No Sinks were planned into flows. Waiting state is probably out of sync with stores. Proceeding with NO-OP.

Even though twitter/scalding#1794 is a separate bug, it was surfaced because of the summingbird job doesn't produce data.

@fwbrasil
Copy link
Author

cc/ @johnynek @ttim

@fwbrasil
Copy link
Author

I wonder what's the best way to address this issue. Maybe we should reimplement ScaldingPlatform.run based on Execution?

@johnynek
Copy link
Collaborator

comment made on gitter.

I didn't see the thing you mentioned yesterday @fwbrasil -- you need to make a change, not just bump the dep.
We made that change at Stripe when we tried, but I can't do that on the public repo until we publish a public milestone.
I'll publish 0.18.0-RC1 soon and then make a branch of summingbird to use it.

@fwbrasil
Copy link
Author

@johnynek could you share the patch so I can unblock our tests?

@ttim
Copy link
Collaborator

ttim commented Feb 16, 2018

I think it's impossible to change API of scalding in a way to not break this portion of Summingbird?

In this case is it possible to fix Summingbird in a way compatible with previous version of scalding too?
What I would like to achieve is to isolate this change in Summingbird and therefore make it possible to firstly release Summingbird in our repo and then release Scalding without touching Summingbird?

@johnynek
Copy link
Collaborator

@ttim you could do some terrible reflection hack, but since summingbird does not use the Execution API natively, it reaches pretty into the cascading part. There are some subtleties here.

Currently we check in the flowDef in summingbird if the flowDef is empty, but in scalding 0.18 we won't modify the flowDef actually until we plan, so that check is invalid.

On the other hand, cascading for some reason throws if you plan an empty Flow, which is valid in summingbird if there is no work to do.

@johnynek
Copy link
Collaborator

here is a real diff:

stripe-archive@d0e156e

tests pass on that branch

@fwbrasil
Copy link
Author

@johnynek thank you for sharing the patch!

since summingbird does not use the Execution API natively

Is there a specific reason for this design decision? Or is it just tech debt? It seems that migrating this code to use Execution would be a better solution.

@johnynek
Copy link
Collaborator

Techdebt.

I tried the migration internally. It is basically a total rewrite. We spent a few days on it and couldn't get the tests to pass yet so we set it aside.

I hope twitter does have the time to tackle this. Happy to code review it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants