Nested array #285

Open
turbomaicol opened this issue Apr 12, 2023 · 1 comment

Comments

@turbomaicol

Trying to convert JSON to Parquet

Sample JSON:
{
  "Stype":"BaseDecorator",
  "Decorators":[
    {"Stype":"FiscalInformationDecorator","FiscalInformation":{"Stype":"FiscalInformation","UUID":"02d0c973-727e-449e-bb4e-45dddbd7dbeb", etc...}},
    {"Stype":"DocumentInformationDecorator","DocumentInformation":{"Stype":"DocumentInformation","DocumentModelID":"7ec7b1d4-f94f-42b5-ba36-77701cdf1db4", etc...}},
    {"Stype":"IssuingInformationDecorator","IssuingInformation":{"Stype":"IssuingInformation","RFC":"PRR890126QC2", etc...}}
  ],
  "InstanceID":"78091f6e-e458-4a23-abfe-fe286b24b59a",
  "company":"d6038f2d-787c-427b-8eaf-4d9eea44a24a"
}

Decorators is an array

Using:
var stringJson = JArray.FromObject(deserialized_jsons).ToString();
using (var r = ChoJSONReader.LoadText(stringJson).ErrorMode(ChoErrorMode.IgnoreAndContinue))
{
    using (var w = new ChoParquetWriter(stream, new ChoParquetRecordConfiguration { CompressionMethod = Parquet.CompressionMethod.Snappy })
        .ThrowAndStopOnMissingField(false)
        .ErrorMode(ChoErrorMode.IgnoreAndContinue))
    {
        w.Write(r);
    }
}

Can I have it be represented as:
stype string
decorators array<struct<Stype:string,FiscalInformation:struct<Stype:string,UUID:string,CFDIUse:string, etc...
InstanceID string
company string

instead of
type string
decorators_0_stype string
decorators_0_fiscalinformation_stype string
decorators_0_fiscalinformation_uuid string, etc...

I don't want a column for each property of each nested array, all of them separated by numbers. I want one column that contains all the elements of the nested array.

Is there a way to have the column be an array for search purposes? (e.g. when using Amazon Athena to query the file as a parquet file)

If I generate the Parquet file with AWS Glue, it gives me the column as an array.
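To make the difference between the two layouts concrete, here is a minimal sketch of what index-based flattening does to a nested array, compared with keeping the array as a single value. It is only an illustration: it uses System.Text.Json rather than ChoETL or Newtonsoft.Json, it does not write Parquet, and `FlattenSketch` and `Flatten` are hypothetical names, not part of any library. The real writer may build its column names differently (for example, in another letter case).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, for illustration only; not part of ChoETL.
public static class FlattenSketch
{
    // Walks a JSON element and emits one key per leaf value,
    // joining property names and array indexes with '_'.
    public static void Flatten(JsonElement element, string prefix, IDictionary<string, string> output)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
            {
                foreach (JsonProperty property in element.EnumerateObject())
                {
                    string key = prefix.Length == 0 ? property.Name : prefix + "_" + property.Name;
                    Flatten(property.Value, key, output);
                }
                break;
            }
            case JsonValueKind.Array:
            {
                int index = 0;
                foreach (JsonElement item in element.EnumerateArray())
                {
                    // Each array element gets its own numbered group of keys.
                    Flatten(item, prefix + "_" + index, output);
                    index++;
                }
                break;
            }
            default:
                output[prefix] = element.ToString();
                break;
        }
    }

    public static void Main()
    {
        // Shortened version of the sample document above.
        const string json =
            "{\"Stype\":\"BaseDecorator\",\"Decorators\":[" +
            "{\"Stype\":\"FiscalInformationDecorator\"}," +
            "{\"Stype\":\"DocumentInformationDecorator\"}]}";

        using JsonDocument document = JsonDocument.Parse(json);

        // Flattened layout: one entry per leaf, numbered per array element.
        var columns = new Dictionary<string, string>();
        Flatten(document.RootElement, "", columns);
        foreach (KeyValuePair<string, string> pair in columns)
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }

        // Nested layout: the whole array remains one value under one name.
        JsonElement decorators = document.RootElement.GetProperty("Decorators");
        Console.WriteLine("Decorators holds " + decorators.GetArrayLength() + " elements as one value");
    }
}
```

With the flattened layout, the number of keys grows with the longest array in the data (`Decorators_0_Stype`, `Decorators_1_Stype`, and so on), whereas the nested layout keeps a single `Decorators` entry regardless of how many elements each record has. That single-entry shape is what an `array<struct<...>>` column corresponds to.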

@Cinchoo
Owner

Cinchoo commented Apr 15, 2023

I'm afraid it can't do that. Can you spell out the expected Parquet file layout with possible values? I'll take a look and give you some input. Thanks.
