New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO NOT MERGE] Trying cascading3 #1446
Conversation
@@ -101,15 +103,21 @@ trait Config extends Serializable { | |||
def setMapSpillThreshold(count: Int): Config = | |||
this + (SpillableProps.MAP_THRESHOLD -> count.toString) | |||
|
|||
@deprecated("deprecated in Cascading 2.7 and dropped in Cascading 3.0, use setMapSideAggregationCapacity", "cascading 2.7") | |||
def setMapSideAggregationThreshold(count: Int): Config = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really correct for scalding, the comment mentions the method is realted to the associated operations which wires down to https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/Operations.scala#L126 which uses the same key as cascading. Not sure we shouldn't just be keeping the method/key as it was just defining the constant in scalding instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to what @ianoc said. I'd rather us take the max value of the two keys in Operations.scala, and here set the non-deprecated internally, but not deprecate our API (instead explain what it means to the user, not necessarily in cascading terms).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not call the capacity variant?
… Right now, busy updating quite a few dependencies to Cascading3 (cascading-parquet, cascading-avro, elephant-bird-cascading3, etc. |
@ianoc @johnynek Done for now. Trouble is, this requires that code from a few dependencies' pull requests be available at compile time, and I'm not quite sure how best to convince Travis to do this:
For that reason, the code currently fails to run on Travis. |
Now working off binaries dumped onto conjars.org for patched dependencies. There is an outstanding problem at test stage: some Test Jobs seem to deadlock, apparently within the Cascading test driver. Investigating this week (hopefully). Already tried to run the tests single core, no behaviour change. |
df99027
to
f733ef0
Compare
synced with 0.15.1-RC1+ at 55f03f7 |
The deadlock in SkewJoinPipeTest should hopefully be fixed by cwensel/cascading#47 (unless that one breaks more things elsewhere ☺). |
f733ef0
to
9e77eab
Compare
Update: some tests fixed with respect to .WithDescription("hi there") cc @simondumas |
|
||
val printDependencyClasspath = taskKey[Unit]("Prints location of the dependencies") | ||
|
||
val sharedSettings = Project.defaultSettings ++ assemblySettings ++ scalariformSettings ++ Seq( | ||
organization := "com.twitter", | ||
|
||
scalaVersion := "2.10.6", | ||
scalaVersion := "2.10.5", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reasons for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, oversight. reverting.
fc006e0
to
d33b76d
Compare
…teSource. (hoping that's right)
dfs-datastores' semantics have shifted since 1.3.4, and now it is of the consumer's responsibility to create the empty version directory which will contain the version's data. In this whitebox test, nobody was creating the empty directory anymore, so the test failed. Added a "hint" and an explanation (not sure this tests much now, but keeps the original dance mostly in)
…operties but responds in the current
+ cancel accidental downgrade of scala 2.10
… Alternative tests to resolve this
… to the old or new API. Work around that.
…assumption (which is not upheld in Tez)
… cascading-hadoop)
6a5a264
to
eb7546f
Compare
I think #1586 may actually be getting close to merge... I'm closing this in favor of that. If that is not right. Reopen. Thanks a ton @cchepelov for kicking this all off. I'm sorry it has taken so long. |
You're most welcome. I'm relieved and excited this moves forwards.
Will check what's in #1586 and what might be missing and'll forward port
what I need to switch. Glad to be soon out of the corner I'm painted in ;)
Le 25 janvier 2018 7:52:04 PM "P. Oscar Boykin" <notifications@github.com>
a écrit :
… I think #1586 may actually be getting close to merge... I'm closing this in
favor of that.
If that is not right. Reopen. Thanks a ton @cchepelov for kicking this all
off. I'm sorry it has taken so long.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#1446 (comment)
|
A quick-n-dirty attempt at using Cascading3 in scalding, at the source level.
At this point, this will choke quickly based on a few API that either disappeared or moved within Cascading; specifically, FlowStep#getSources and FlowStep#getGraph.
Really only a quick RFC at this point, will need to investigate how to best to support the "missing" APIs
(suspecting this is really a matter of adapting to the new API made just to help building ExecutionContext.getDesc in a nicer way)