Schema derivation yields "Unknown datum class" w/ nested classes / scalapb enums as defaults #826

Open
chollinger93 opened this issue Mar 15, 2024 · 0 comments


Error

I'm getting an Unknown datum class: ExampleEnumEvent$Action$Undefined$ error when trying to derive a schema for a scalapb-generated enum.

More generally, this can be reproduced with any nested structure (see below).

Similar to #677?

Minimal Protobuf Example

// pb/types.proto
syntax = "proto3";

package pb;

message ID {
  string id = 1;
}

// example.proto
syntax = "proto3";

import "pb/types.proto";

message ExampleEnumEvent {
  pb.ID id = 1;
  Action action = 2;
  enum Action {
    Undefined = 0;
    Allow = 1;
    Deny = 2;
  }
}

Package settings: preserve_unknown_fields: false, lenses: false.

Yields

@SerialVersionUID(0L)
final case class ExampleEnumEvent(
    id: _root_.scala.Option[pb.types.ID] = _root_.scala.None,
    action: ExampleEnumEvent.Action = ExampleEnumEvent.Action.Undefined
    ) extends scalapb.GeneratedMessage {

Action is a

sealed abstract class Action(val value: _root_.scala.Int) extends _root_.scalapb.GeneratedEnum 

Test

import com.sksamuel.avro4s.{AvroSchema, SchemaFor, ToRecord, Encoder as AvroEncoder}
import pb.types.ID

// action is not set, so it falls back to its default, Action.Undefined
val e = ExampleEnumEvent(id = Some(ID("1")))
type T = ExampleEnumEvent
val schema = AvroSchema[T]
println(schema.toString(true))
val enc = AvroEncoder[T]
val toRecord: ToRecord[T] = ToRecord[T](schema)(using enc)
val gen = toRecord.to(e)
println(gen)

Which gets us

Unknown datum class: class ExampleEnumEvent$Action$Undefined$
org.apache.avro.AvroRuntimeException: Unknown datum class: class ExampleEnumEvent$Action$Undefined$
	at org.apache.avro.util.internal.JacksonUtils.toJson(JacksonUtils.java:96)
	at org.apache.avro.util.internal.JacksonUtils.toJsonNode(JacksonUtils.java:53)
	at org.apache.avro.Schema$Field.<init>(Schema.java:598)
	at com.sksamuel.avro4s.schemas.Records$.buildSchemaField(records.scala:89)
	at com.sksamuel.avro4s.schemas.Records$.$anonfun$1(records.scala:31)
	at scala.collection.immutable.List.flatMap(List.scala:293)
	at com.sksamuel.avro4s.schemas.Records$.schema(records.scala:32)
	at com.sksamuel.avro4s.schemas.MagnoliaDerivedSchemas.join(magnolia.scala:14)
	at com.sksamuel.avro4s.schemas.MagnoliaDerivedSchemas.join$(magnolia.scala:10)
	at com.sksamuel.avro4s.SchemaFor$.join(SchemaFor.scala:55)
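
Judging by the top frames of the trace (Schema$Field.<init> calling JacksonUtils.toJsonNode), the exception seems to come from Avro converting the field's default value to JSON rather than from the enum type itself. The following is a minimal sketch with plain Avro and no avro4s, assuming Avro 1.9+ where Schema.Field has a (name, schema, doc, defaultValue: Object) constructor. Undefined, actionSchema, and describeDefault are hypothetical names made up for this illustration, not scalapb or avro4s code.

```scala
import org.apache.avro.{AvroRuntimeException, Schema}
import java.util.Arrays

// Hypothetical stand-in for a generated enum case object (not real scalapb output).
case object Undefined

// An Avro enum schema with the same symbols as the Action enum in this report.
val actionSchema: Schema =
  Schema.createEnum("Action", null, "demo", Arrays.asList("Undefined", "Allow", "Deny"))

// Try to build a field with the given default and report how Avro reacts.
def describeDefault(default: AnyRef): String =
  try
    // The Schema.Field constructor converts the default value to a JSON node.
    val field = new Schema.Field("action", actionSchema, null, default)
    s"accepted as default for ${field.name()}"
  catch
    case e: AvroRuntimeException => s"rejected: ${e.getMessage}"

@main def defaultDemo(): Unit =
  println(describeDefault(Undefined))   // arbitrary Scala object as the default
  println(describeDefault("Undefined")) // enum symbol name as the default
```

If this matches what avro4s does internally, passing the symbol name (a String) instead of the case object when building the field default would avoid the error. That is an assumption about a possible fix, not something confirmed against the avro4s source.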

Without proto

This has functionally the same effect:

final case class Nested(s: String = "foo", n: Nested.Nest = Nested.Undefined())
object Nested {
  sealed abstract class Nest(i: Int)
  final case class Undefined() extends Nest(-1)
  final case class N(i: Int) extends Nest(i)
}

Validation / Workaround

If we set no_default_values_in_constructor (or remove the default), it works and yields:

{
  "type" : "record",
  "name" : "ExampleEnumEvent",
  "namespace" : "test",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "string" ]
  }, {
    "name" : "action",
    "type" : [ {
      "type" : "record",
      "name" : "Allow",
      "namespace" : "ExampleEnumEvent.Action",
      "fields" : [ ]
    }, {
      "type" : "record",
      "name" : "Deny",
      "namespace" : "ExampleEnumEvent.Action",
      "fields" : [ ]
    }, {
      "type" : "enum",
      "name" : "Recognized",
      "namespace" : "ExampleEnumEvent.Action",
      "symbols" : [ "Undefined", "Allow", "Deny" ]
    }, {
      "type" : "record",
      "name" : "Undefined",
      "namespace" : "ExampleEnumEvent.Action",
      "fields" : [ ]
    }, {
      "type" : "record",
      "name" : "Unrecognized",
      "namespace" : "ExampleEnumEvent.Action",
      "fields" : [ {
        "name" : "unrecognizedValue",
        "type" : "int"
      } ]
    } ]
  } ]
}

Alternatively, explicitly setting

val e = ExampleEnumEvent(
      id = Some(ID("1")),
      action = ExampleEnumEvent.Action.Undefined
    )

has the same effect (which, of course, isn't viable for events that arrive from another service).

I can't define a given SchemaFor[ExampleEnumEvent.Action] = SchemaFor[ExampleEnumEvent.Action]: that throws a StackOverflowError, presumably because the given resolves to itself and recurses infinitely at runtime.

I've also tried tricking avro4s into treating the scalapb.GeneratedEnum as an Enumeration type by defining a trait that extends Enumeration, but to no avail.

Other

On a side note, compilation times for scalapb-generated objects that include a val holding AvroSchema[A] are very long (up to ~10s per class), presumably because the generated scalapb classes are rather large. As far as I'm aware, Scala 3 doesn't have any good compiler profilers, so I'm not 100% sure where exactly the time goes.

But I figured I'd mention that here, since I'm not sure if that's expected.

Environment

22.04.1-Ubuntu, avro4s 5.0.9, Scala 3.4.0
