Add TypedEncoder for shapeless Record. #777

Open
tribbloid opened this issue Nov 21, 2023 · 3 comments

@tribbloid

tribbloid commented Nov 21, 2023

Since RecordEncoder is already converting any product type into shapeless Record:

class RecordEncoder[F, G <: HList, H <: HList](
    implicit
    i0: LabelledGeneric.Aux[F, G],
    i1: DropUnitValues.Aux[G, H],
    i2: IsHCons[H],
    fields: Lazy[RecordEncoderFields[H]],
    newInstanceExprs: Lazy[NewInstanceExprs[G]],
    classTag: ClassTag[F])
    extends TypedEncoder[F] {
...

The only thing required is to break it into 2 stages, so that the intermediate HList/Record representation can serve as a more flexible type-level schema. It could even approximate the capability of the abandoned TypedDataFrame.

I also realised that i0 through i2 are not used in the body. i2 matters because it rejects HNil, but are i0 and i1 necessary?
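To make the proposal concrete, here is a minimal, dependency-free sketch of a two-stage derivation. It assumes Scala 2.13 or later (for literal types and `ValueOf`) and does not use shapeless or frameless: `HCons`, `Field`, `ToRecord`, `RecordSchema`, `ColumnType` and `schemaOf` are all hypothetical stand-ins invented for illustration, and the stage-1 instance is written by hand where shapeless would derive it via `LabelledGeneric`. Note that `stage1` is never referenced in the method body, yet it is what lets the compiler work out the record type `R` from `F`, which mirrors the role of the unused implicits asked about above.

```scala
object TwoStageSketch {
  // Hypothetical stand-ins for an HList; NOT the shapeless types.
  sealed trait HList
  final case class HCons[+H, +T <: HList](head: H, tail: T) extends HList
  sealed trait HNil extends HList
  case object HNil extends HNil

  // A field whose label K lives at the type level as a literal String type.
  final case class Field[K <: String, V](value: V)

  // Stage 1: product type F -> labelled record Repr.
  trait ToRecord[F] {
    type Repr <: HList
    def to(f: F): Repr
  }
  object ToRecord {
    type Aux[F, R <: HList] = ToRecord[F] { type Repr = R }
  }

  // Hypothetical mapping from a Scala type to a column type name.
  trait ColumnType[V] { def sqlName: String }
  object ColumnType {
    implicit val intColumn: ColumnType[Int] =
      new ColumnType[Int] { def sqlName: String = "int" }
    implicit val stringColumn: ColumnType[String] =
      new ColumnType[String] { def sqlName: String = "string" }
  }

  // Stage 2: labelled record -> schema, derived only from the record type.
  trait RecordSchema[R <: HList] { def columns: List[(String, String)] }
  object RecordSchema {
    implicit val empty: RecordSchema[HNil] =
      new RecordSchema[HNil] { def columns: List[(String, String)] = Nil }

    implicit def cons[K <: String with Singleton, V, T <: HList](
        implicit
        label: ValueOf[K],
        column: ColumnType[V],
        tail: RecordSchema[T]
    ): RecordSchema[HCons[Field[K, V], T]] =
      new RecordSchema[HCons[Field[K, V], T]] {
        def columns: List[(String, String)] =
          (label.value: String, column.sqlName) :: tail.columns
      }
  }

  // Both stages chained. stage1 is unused in the body; it only fixes R.
  final class SchemaOf[F] {
    def columns[R <: HList](
        implicit
        stage1: ToRecord.Aux[F, R],
        stage2: RecordSchema[R]
    ): List[(String, String)] = stage2.columns
  }
  def schemaOf[F]: SchemaOf[F] = new SchemaOf[F]

  // Example product type with a hand-written stage-1 instance.
  type PersonRecord = HCons[Field["name", String], HCons[Field["age", Int], HNil]]
  final case class Person(name: String, age: Int)
  object Person {
    implicit val toRecord: ToRecord.Aux[Person, PersonRecord] =
      new ToRecord[Person] {
        type Repr = PersonRecord
        def to(p: Person): PersonRecord =
          HCons(Field["name", String](p.name), HCons(Field["age", Int](p.age), HNil))
      }
  }

  // Schema reached through the case class (stage 1 + stage 2).
  val viaCaseClass: List[(String, String)] = schemaOf[Person].columns
  // Schema reached from the record type alone (stage 2 only).
  val viaRecord: List[(String, String)] = implicitly[RecordSchema[PersonRecord]].columns
}
```

Because stage 2 depends only on the record type, a record produced by some other route (for example, a schema-changing transformation) could reuse it without a backing case class, which is the flexibility being asked for here.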

@pomadchin
Member

pomadchin commented Nov 22, 2023

Hey there; indeed in RecordEncoder those are not required. However these implicits are necessary for the TypedEncoder.usingDerivation function.

I don't think that these are bad here; at the very least they work as a sanity check for us. But any improvement PRs are very much welcome.

I didn't quite follow the part about shapeless.Record and the two stages; usually the idea is to hide shapeless inside and not let it leak into the user API. But shoot a PR; I'd be happy to help you get it merged 👍

@tribbloid
Author

@pomadchin voilà, I've added an experimental PR that adopts a 2-stage RecordEncoder derivation.

The 1st stage is now also used for TypedRow[T <: HList], which can be seen as a successor of the abandoned TypedDataFrame. See the new test for an example usage:

0baf604#diff-dd83f3b1d1a249804b5620473177ce6034efbc5f36b45a9b1ef01283cafd50f9R93

@tribbloid
Author

It is only an experiment and will need some serious cleanup (particularly the scalafmt part) and an API revision before it becomes a feature.

Ideally, I would like to see schema-changing transformations like withColumnReplaced and withColumn yield a:

TypedDataFrame = TypedDataset[TypedRow[H]], which preserves both the generics & labels of the source case class.

Instead of the old:

TypedDataset[TupleX], which degrades all column names into _1, _2, etc.
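The loss of labels described above can be seen in plain Scala, without Spark or frameless. The sketch below assumes Scala 2.13 or later (for `productElementNames`), and `Person` is a hypothetical example type: once a case class is flattened into a tuple, the only field names left are the positional ones.

```scala
object ColumnNamesSketch {
  // Hypothetical source case class, used only for illustration.
  final case class Person(name: String, age: Int)

  val person: Person = Person("Ada", 36)

  // A case class keeps its field labels.
  val fromCaseClass: List[String] = person.productElementNames.toList

  // The same data as a tuple only has positional names.
  val asTuple: (String, Int) = (person.name, person.age)
  val fromTuple: List[String] = asTuple.productElementNames.toList
}
```

Here `fromCaseClass` holds `name` and `age`, while `fromTuple` holds `_1` and `_2`. A labelled record type would keep the former even after columns are added or replaced.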
