
bijection-avro sometimes deserializes objects to GenericData.Record instead of the requested type #265

Open
rabejens opened this issue Jun 12, 2017 · 1 comment


@rabejens

I defined an Avro schema and used SBT Avrohugger to generate the Scala code. Serialization and deserialization work so far on my local machine. I am doing something like this:

val x: Array[Byte] = ... // get the serialized data
// Injection#invert returns a scala.util.Try, so this is a Try[MyAvroThing]
val myThing = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$).invert(x)

When I run this locally, it works perfectly. I then built a Spark job, packaged into a fat JAR with the SBT Assembly plugin. When I submit this job locally (using spark-submit --master local[*]), the deserialization works. However, when I submit it to a "real" Spark installation, I get a ClassCastException:

Exception in thread "main" java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.example.avro.MyAvroThing

So, the deserializer does not recognize the format and deserializes it to a generic Avro type. I double checked that all necessary Avro libraries and Twitter's Bijection-Avro are correctly embedded in my resulting JAR.

As a next investigation step, I analyzed the GenericData.Record I get by doing:

import scala.collection.JavaConverters._
import scala.util.Try
import org.apache.avro.generic.GenericData

val mystery = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$).invert(x).asInstanceOf[Try[Any]]
mystery.get match {
  case _: MyAvroThing => println("ok!")
  case r: GenericData.Record =>
    // getFields returns a java.util.List, so convert it before mapping
    println("Got a generic record with schema: " + r.getSchema.getFields.asScala.map(_.name()).mkString(", "))
  case _ => println("Got something completely different")
}

When I run this locally, it prints out ok! as it correctly gets the MyAvroThing. When I run this on the Spark cluster, I get:

Got a generic record with schema: foo, bar, quux

This means my schema IS honored by the deserializer and the data is decoded correctly; only the conversion to the generated specific class somehow does not happen.

When I query the record's fields by name, I get the correct data I expect in my MyAvroThing.
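Reading the fields by name suggests a possible stopgap until the root cause is found: convert the generic record to the specific class by hand. A minimal sketch of that idea, assuming MyAvroThing has the fields foo, bar, and quux reported above (the field types here are invented for illustration; on the cluster, `get` would be the real `GenericData.Record#get(name)`):

```scala
// Hypothetical field types; the real MyAvroThing comes from Avrohugger.
case class MyAvroThing(foo: String, bar: Int, quux: Boolean)

object GenericToSpecific {
  // `get` stands in for GenericData.Record#get(name) on the cluster.
  def fromFields(get: String => Any): MyAvroThing =
    MyAvroThing(
      foo  = get("foo").toString,            // Avro may decode strings as Utf8; toString normalizes
      bar  = get("bar").asInstanceOf[Int],
      quux = get("quux").asInstanceOf[Boolean]
    )

  def main(args: Array[String]): Unit = {
    // A Map is a String => Any, so it can play the record here.
    val record = Map[String, Any]("foo" -> "hello", "bar" -> 42, "quux" -> true)
    println(fromFields(record))
  }
}
```

This only papers over the symptom, of course; the generic record should not be produced in the first place.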

What is going wrong here?

@johnynek
Collaborator

I wonder if this could be a classpath issue: locally you have one version of Avro, on the cluster you have another, and the mismatch shows up as a runtime error.
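One way to check this hypothesis is to print, from inside the Spark job, which JAR each suspect class was actually loaded from. A small self-contained sketch (here demonstrated with a stdlib class; on the cluster you would pass classOf[org.apache.avro.Schema] and your generated class):

```scala
object WhichJar {
  // Returns the code-source location (usually a JAR path) a class was
  // loaded from, or a placeholder for bootstrap-loaded classes.
  def locationOf(cls: Class[_]): String =
    Option(cls.getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)
      .getOrElse("<bootstrap or no code source>")

  def main(args: Array[String]): Unit = {
    // On the cluster, substitute e.g. classOf[org.apache.avro.Schema]
    // and classOf[com.example.avro.MyAvroThing].
    println(locationOf(classOf[scala.util.Try[_]]))
  }
}
```

If the location points at a Spark-provided Avro JAR rather than your assembly, Spark's experimental `spark.driver.userClassPathFirst` / `spark.executor.userClassPathFirst` settings (or shading Avro in the assembly) may be worth trying.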
