Add export in json lines format in V5 #671

Open
gstraymond opened this issue Sep 4, 2020 · 2 comments
Labels
More Information Needed (Further information is requested), MTGJSON v5 (Issue affects MTGJSON 5)

Comments

@gstraymond

Thanks for this great tool, it is really valuable!

I'm currently migrating to the V5 API and working on integrating prices.
The prices JSON is big (around 400 MB on disk), and it is cumbersome to process because it takes a lot of RAM just to load it into memory.

Have you considered adding an export in the JSON Lines format (jsonl): http://jsonlines.org/ ?
The format is one JSON object per line, so you can process a file by iterating over it line by line, like a stream, without having to load everything into memory.
This could also be applied to other exports.
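
To illustrate the line-by-line processing described above, here is a minimal Python sketch. The record layout (`uuid`, `price`) and the `iter_jsonl` helper are assumptions made for the example; they are not the actual MTGJSON price schema or part of any MTGJSON tooling.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed object per non-empty line.

    Only the current line is held in memory, so memory use stays flat
    regardless of file size.
    """
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample data standing in for a large .jsonl export.
sample = io.StringIO(
    '{"uuid": "card-a", "price": 1.5}\n'
    '{"uuid": "card-b", "price": 0.25}\n'
)

# Aggregate over the stream without ever building a full list.
total = sum(record["price"] for record in iter_jsonl(sample))
print(total)  # 1.75
```

With a real file, `sample` would be replaced by `open(path, encoding="utf-8")`; the loop itself would not change.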

@ZeldaZach
Member

Hey @gstraymond thanks for your kind words!

I looked into jsonlines, but it doesn't seem to be a standardized tool. Is this really the best approach to helping folks who can't load the full data set at once? Is there a feature of your language that allows lazy loading, which you might be able to take advantage of?

It also looks like JSONLines wants constant structs, but that would mean the loss of several data points, like metadata. If you have suggestions, I'm all ears!

@ZeldaZach ZeldaZach added MTGJSON v5 Issue affects MTGJSON 5 More Information Needed Further information is requested labels Sep 10, 2020
@gstraymond
Author

gstraymond commented Sep 13, 2020

I agree that it's more a convention than a real standard, but it doesn't require reinventing the wheel, since it's 99% based on JSON.
As for the metadata, nothing forbids putting it on the first line, or it could be provided as a separate file.
On my side, I had to preprocess the file using the streaming API of jq (https://stedolan.github.io/jq/manual/#Streaming). It required more work, but it worked.
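
The "metadata on the first line" idea could look roughly like the Python sketch below. The `write_jsonl` and `read_jsonl` helpers, the `version` field, and the record contents are hypothetical, chosen only to show the convention; nothing here reflects an existing MTGJSON export.

```python
import io
import json
from typing import Iterable, Iterator, TextIO, Tuple


def write_jsonl(stream: TextIO, meta: dict, records: Iterable[dict]) -> None:
    """Write the metadata object as line 1, then one record per line."""
    stream.write(json.dumps(meta) + "\n")
    for record in records:
        stream.write(json.dumps(record) + "\n")


def read_jsonl(stream: TextIO) -> Tuple[dict, Iterator[dict]]:
    """Parse the metadata line eagerly and return the records lazily."""
    meta = json.loads(stream.readline())
    records = (json.loads(line) for line in stream if line.strip())
    return meta, records


# Round trip through an in-memory buffer (placeholder values throughout).
buffer = io.StringIO()
write_jsonl(buffer, {"version": "x.y.z"}, [{"uuid": "card-a"}, {"uuid": "card-b"}])
buffer.seek(0)

meta, records = read_jsonl(buffer)
uuids = [record["uuid"] for record in records]
```

The metadata line has a different shape from the record lines, so a consumer has to know about the convention; shipping the metadata as a separate file avoids that at the cost of a second download.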


2 participants