Add TypedEncoder for shapeless Record. #777

tribbloid · 2023-11-21T20:28:14Z

Since RecordEncoder is already converting any product type into shapeless Record:

class RecordEncoder[F, G <: HList, H <: HList](
    implicit
    i0: LabelledGeneric.Aux[F, G],
    i1: DropUnitValues.Aux[G, H],
    i2: IsHCons[H],
    fields: Lazy[RecordEncoderFields[H]],
    newInstanceExprs: Lazy[NewInstanceExprs[G]],
    classTag: ClassTag[F])
    extends TypedEncoder[F] {
...

The only thing required is to break it into two stages, so that the intermediate HList/Record representation can serve as a more flexible type-level schema. It could even approximate the capability of the abandoned TypedDataFrame.
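To make the two-stage idea concrete, here is a minimal sketch using plain shapeless (assuming shapeless 2.3.x), not the frameless code itself. `Person`, `TwoStageSketch`, and `fieldNames` are hypothetical names invented for illustration. Stage 1 turns the case class into a labelled record; stage 2 derives something (here just the field names) from the record type alone, without knowing the original case class.

```scala
import shapeless.{HList, LabelledGeneric}
import shapeless.ops.hlist.ToTraversable
import shapeless.ops.record.Keys

// Hypothetical example type, not part of frameless.
final case class Person(name: String, age: Int)

object TwoStageSketch {
  // Stage 1: product type -> labelled record (an HList of tagged fields).
  val gen = LabelledGeneric[Person]
  val record = gen.to(Person("Ada", 36))

  // Stage 2: works on any record type R; it never sees the case class.
  def fieldNames[R <: HList, K <: HList](r: R)(
      implicit
      keys: Keys.Aux[R, K],
      toList: ToTraversable.Aux[K, List, Symbol]
  ): List[String] =
    toList(keys()).map(_.name)

  val names: List[String] = fieldNames(record)
}
```

An encoder split along the same line would presumably let the second stage be reused for records that never came from a case class, which is what the proposal is after.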

I also realised that i0 through i2 are not used in the class body. i2 is needed so that HNil is rejected, but are i0 and i1 necessary?
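As a small illustration of why i2 matters, the sketch below (again plain shapeless 2.3.x, not frameless code) shows that an `IsHCons` constraint only resolves for non-empty HLists. `NonEmptyCheck` and `headOf` are hypothetical names; `illTyped` is the shapeless test macro that asserts a snippet does not compile.

```scala
import shapeless.{HList, HNil}
import shapeless.ops.hlist.IsHCons
import shapeless.test.illTyped

object NonEmptyCheck {
  // Hypothetical helper: compiles only when L has at least one element.
  def headOf[L <: HList](l: L)(implicit ev: IsHCons[L]): ev.H = ev.head(l)

  // Resolves, because Int :: String :: HNil is an HCons.
  val first: Int = headOf(1 :: "a" :: HNil)

  // No IsHCons instance exists for HNil, so this must fail to compile.
  illTyped("headOf(HNil)")
}
```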


pomadchin · 2023-11-22T02:41:49Z

Hey there; indeed, those are not required inside RecordEncoder itself. However, these implicits are necessary for the TypedEncoder.usingDerivation function.

I don't think they are harmful here; at the very least they work as a sanity check for us. But any improvement PRs are very much welcome.

I didn't quite follow the part about shapeless.Record and the two stages; usually the idea is to hide shapeless inside and not let it leak into the user API. But shoot a PR and I'd be happy to help you get it merged 👍

tribbloid · 2023-11-26T02:19:49Z

@pomadchin voila, adding an experimental PR that adopts 2-stage RecordEncoder derivation.

The 1st stage is now also used for TypedRow[T <: HList], which can be seen as a successor of the abandoned TypedDataFrame. See the new test for an example usage:

0baf604#diff-dd83f3b1d1a249804b5620473177ce6034efbc5f36b45a9b1ef01283cafd50f9R93

tribbloid · 2023-11-26T02:26:57Z

It is only an experiment and will need some serious cleanup (particularly the scalafmt part) and an API revision before it becomes a feature.

Ideally, I would like to see schema-changing transformations like withColumnReplaced and withColumn yield a:

TypedDataFrame = TypedDataset[TypedRow[H]], which preserves both the generics and the labels of the source case class.

Instead of the old:

TypedDataset[TupleX], which degrades all column names into _1, _2, etc.
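The loss of labels can be seen without Spark at all. The sketch below is only an analogy, assuming Scala 2.13 or later (for `productElementNames`); `Reading` and `ColumnNameSketch` are hypothetical names. A case class carries its field names, while a tuple holding the same values only carries positional ones.

```scala
// Hypothetical example type, not part of frameless.
final case class Reading(sensor: String, value: Double)

object ColumnNameSketch {
  // Field names of the case class survive.
  val labelled: List[String] =
    Reading("s1", 0.5).productElementNames.toList

  // The same data as a tuple only has positional names.
  val positional: List[String] =
    ("s1", 0.5).productElementNames.toList
}
```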
