Parquet TODO #276

Open
4 of 6 tasks
nevillelyh opened this issue Feb 19, 2021 · 2 comments


nevillelyh commented Feb 19, 2021

  • Avro array support in AvroWriteSupport: old TwoLevelListWriter vs. new ThreeLevelListWriter
  • Avro nullable arrays and arrays of nullables
  • Fix parquet.avro.data.supplier with generic records in test (improve Parquet-Avro compatibility #278)
  • Schema compatibility check in ReadSupport (2aea4e8)
  • Schema evolution for enums (Add UnsafeEnum #290)
  • Schema evolution for arrays (6c00ecb)

nevillelyh changed the title from "Parquet TODO:" to "Parquet TODO" on Feb 20, 2021

nevillelyh commented

Turns out the new 3-level list is more complex.

With the default 2-level list, myField: List[T] is written as:

required group myField (LIST) {
  repeated T array;
}

But the Avro counterpart is still: "name": "myField", "type": "array", "items": T

With the 3-level list, the Parquet schema becomes:

required group myField (LIST) {
  repeated group list {
    required T element;
  }
}

And the Avro record becomes [{"element": t1}, {"element": t2}, ...]
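
To make the difference concrete, here is a minimal Python sketch of the two layouts described above. It only models the shapes (schema text and Avro-side values); the helper names parquet_list_schema and avro_view are hypothetical and are not part of parquet-avro or magnolify.

```python
def parquet_list_schema(name, element_type, three_level=False, repetition="required"):
    """Render the Parquet schema text for a list field in either layout."""
    if three_level:
        # 3-level layout: LIST group -> repeated group "list" -> "element"
        return (
            f"{repetition} group {name} (LIST) {{\n"
            f"  repeated group list {{\n"
            f"    required {element_type} element;\n"
            f"  }}\n"
            f"}}"
        )
    # 2-level layout: LIST group -> repeated field "array"
    return (
        f"{repetition} group {name} (LIST) {{\n"
        f"  repeated {element_type} array;\n"
        f"}}"
    )


def avro_view(values, three_level=False):
    """Shape of the list values as they appear on the Avro side."""
    if three_level:
        # Each value is wrapped in a record with a single "element" field.
        return [{"element": v} for v in values]
    # The 2-level layout keeps a plain Avro array.
    return list(values)


if __name__ == "__main__":
    print(parquet_list_schema("myField", "T"))
    print(parquet_list_schema("myField", "T", three_level=True))
    print(avro_view(["t1", "t2"], three_level=True))
```

The extra wrapping in avro_view is what makes the 3-level case more complex: the element type is no longer T itself but a record around T.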

WIP in https://github.com/spotify/magnolify/tree/neville/pq-avro

nevillelyh commented

More on Avro array mapping. The following Avro fields

{"name": "field1", "type": {"type": "array", "items": "string"}, "default": [] } // required array field that defaults to empty array
{"name": "field2", "type": ["null", {"type": "array", "items": "string"}], "default": null } // nullable array field that defaults to null

map to:

required group field1 (LIST) {
  repeated binary array (STRING);
}
optional group field2 (LIST) {
  repeated binary array (STRING);
}
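
The rule above (plain array becomes a required group, a ["null", array] union becomes an optional group) can be sketched in Python as follows. This is an illustration of the mapping only, assuming the 2-level layout; avro_array_field_to_parquet and the small PRIMITIVES table are hypothetical helpers, not the parquet-avro converter.

```python
# Avro primitive -> Parquet primitive text (only a few types, for illustration).
PRIMITIVES = {
    "string": ("binary", " (STRING)"),
    "int": ("int32", ""),
    "long": ("int64", ""),
    "boolean": ("boolean", ""),
}


def avro_array_field_to_parquet(field):
    """Map an Avro array field (dict) to 2-level Parquet LIST schema text."""
    avro_type = field["type"]
    repetition = "required"
    if isinstance(avro_type, list):
        # A union is only handled here when it is exactly null + one other type.
        non_null = [t for t in avro_type if t != "null"]
        if "null" not in avro_type or len(non_null) != 1:
            raise ValueError("only [null, array] unions are supported in this sketch")
        avro_type = non_null[0]
        repetition = "optional"
    if not isinstance(avro_type, dict) or avro_type.get("type") != "array":
        raise ValueError("field is not an array")
    physical, annotation = PRIMITIVES[avro_type["items"]]
    return (
        f"{repetition} group {field['name']} (LIST) {{\n"
        f"  repeated {physical} array{annotation};\n"
        f"}}"
    )


if __name__ == "__main__":
    field1 = {"name": "field1", "type": {"type": "array", "items": "string"}, "default": []}
    field2 = {"name": "field2", "type": ["null", {"type": "array", "items": "string"}], "default": None}
    print(avro_array_field_to_parquet(field1))
    print(avro_array_field_to_parquet(field2))
```

Note that the Avro default ([] or null) does not show up in the Parquet schema; only the nullability of the field changes the repetition.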
