Spark 4.0 / DBR 14.2+ - bleeding edge changes #787
Hmm, a swappable jar? I'm super open to adding any necessary compat layers; shoot a PR and we'll get it in if you have any nice ideas!
fyi - I've created a shim to handle the abstraction
fyi - With the first shim snapshot push, compiling against 3.5 OSS works when running against DBR 14.3; only StaticInvoke needed doing. (So the same frameless jar can be run against both 14.0/14.1 and 14.2/14.3 by swapping in the shim for the right DBR version, or indeed users can stay with the OSS version as per a normal dependency.) The code for StaticInvoke handling, shims etc. is on a branch here and the diff is here.

Next I'll target the major pain points impacted in each OSS major/minor release (i.e. TypedEncoder, TypedExpressionEncoder and RecordEncoder) to have each internal API usage pulled out (e.g. [Un]WrapOption, Invoke, NewInstance, GetStructField, ifisnull, GetColumnByOrdinal, MapObjects and probably TypedExpressionEncoder itself). It's probably worth doing them in advance of any pull request.

What I'll attempt with this is to see how much of the encoding logic can be re-used from the current frameless codebase when targeting major versions on older DBRs (e.g. can we get a 3.5 OSS frameless jar running on a 3.1.2 Databricks runtime?).

If you'd like me to add FramelessInternals.objectTypeFor, ScalaReflection.dataTypeFor etc. as well, I think that'd make sense, but Reflection had been fairly stable code before they ripped it out :)
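The shim idea can be illustrated without any Spark internals. This is a minimal, hypothetical sketch (all names and signatures below are invented for illustration, not the actual frameless shim API): frameless code calls a stable facade, and per-runtime shim jars supply the version-specific construction, so the same frameless jar runs on either runtime once the right shim is on the classpath.

```scala
// Hypothetical sketch of the shim pattern: frameless compiles against a
// stable facade; each runtime (OSS 3.5, DBR 14.2+, ...) ships a jar with
// the matching implementation of the version-specific pieces.
// StaticInvoke is modelled as an opaque String purely for illustration.
trait StaticInvokeShim {
  def staticInvoke(cls: String, method: String, args: Seq[String]): String
}

// What an OSS 3.5 shim jar might provide.
object Oss35Shim extends StaticInvokeShim {
  def staticInvoke(cls: String, method: String, args: Seq[String]): String =
    s"StaticInvoke($cls.$method(${args.mkString(", ")}))"
}

// What a DBR 14.2+ / SPARK-44913 shim might provide (the real change adds
// an extra parameter to the StaticInvoke constructor).
object Dbr142Shim extends StaticInvokeShim {
  def staticInvoke(cls: String, method: String, args: Seq[String]): String =
    s"StaticInvoke($cls.$method(${args.mkString(", ")}), isDeterministic = true)"
}

// Frameless-side code only ever talks to the facade; in a real build the
// implementation would be resolved from whichever shim jar is swapped in.
object Shims {
  var current: StaticInvokeShim = Oss35Shim
}
```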
…ifisnull, GetColumnByOrdinal, MapObjects and TypedExpressionEncoder shimmed
…ifisnull, GetColumnByOrdinal, MapObjects and TypedExpressionEncoder shimmed - attempt build
@pomadchin - I'd not want to advertise that it's possible to jump versions so much (there are other issues like kmeans and join interface changes, of course), but it proves the approach works at least and may ease 4.x support. The pre-reformatting functional change diff is here. The key MiMa change is the removal of frameless.MapGroups; it could of course be kept and simply forward to the shimmed version if needed.
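The "kept and forwarding" option for frameless.MapGroups could look something like the sketch below. Everything here is hypothetical (ShimMapGroups is an invented stand-in for the shim-provided entry point, and the signature is simplified); the point is only that the old name can stay MiMa-visible while delegating.

```scala
// Hypothetical stand-in for the shim-provided MapGroups entry point.
object ShimMapGroups {
  def apply(group: String): String = s"shimmed:$group"
}

// Keep the old frameless name to avoid a MiMa-visible removal; it simply
// forwards to the shimmed implementation.
@deprecated("forwarded to the shim-provided MapGroups", "0.x")
object MapGroups {
  def apply(group: String): String = ShimMapGroups(group)
}
```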
…teStruct, and allow shims for the deprecated functions
…se rc1, so1 not a default repo it seems
…oxy - deeply nested also possible
Per b880261, #803 and #804 are confirmed as working on all LTS versions of Databricks, Spark 4 and the latest 15.0 runtime - the test combinations are documented here.
A number of test issues appear when running on a cluster; these do not appear on a single-node server (e.g. GitHub runners, a dev box or even Databricks Community Edition).
Doubles lose precision on serialisation: the very last digit didn't match, so all double gens have to be serializable. The same occurs for BigDecimals in other tests (like AggregateFunctionsTest first/last), but this is likely due to the package arbitraries not being correct in the testless shade (they are correct when used via TestlessSingle in the IDE).

For the order by:

```scala
import frameless.{X2, X3}
import spark.implicits._

val v = Vector(
  X3(-1, false, X2(586394193, 6313416569807298536L)),
  X3(2147483647, false, X2(1, -1L)),
  X3(729528245, false, X2(1, -1L)))

v.toDS.orderBy("c").collect().toVector
```

The error that can occur: the two rows with c = (1, -1) can come back in either order, and both are acceptable results. The test needs to be re-written to account for this and just compare the c's.
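A tie-tolerant version of that check might look like the sketch below (plain Scala, no Spark; the simplified X2/X3 case classes and the OrderByCheck helper are invented for illustration). Since rows tying on the sort key may appear in any order, the assertion only inspects the c column.

```scala
// Simplified stand-ins for frameless's test fixtures.
case class X2(a: Int, b: Long)
case class X3(a: Int, b: Boolean, c: X2)

object OrderByCheck {
  // Ordering on c only, mirroring orderBy("c"): lexicographic on (a, b).
  implicit val x2Ordering: Ordering[X2] = Ordering.by((x: X2) => (x.a, x.b))

  // Project out only the sort keys, sorted; rows that tie on c compare
  // equal here regardless of the order the cluster returned them in.
  def sortedKeys(rows: Vector[X3]): Vector[X2] = rows.map(_.c).sorted
}
```

A test would then assert `sortedKeys(collected) == sortedKeys(expected)` instead of comparing whole rows, which accepts either interleaving of the tied (1, -1) rows.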
…clusters - inifinity protection
…0 databricks doesn't process them on ordered dataset
Per my comment on #755, DBR 14.2, 4.0 and likely all later versions include the SPARK-44913 StaticInvoke changes.
Whilst this hasn't yet been backported to the 3.5 branch, it could well end up there.
I'm happy to fork and publish a bleeding-edge / non-standard frameless if needed, but I also wonder if a compat layer as a separate swappable jar is the best route, similar to #300 for example.
What is the collective preferred route to fixing / working around this?