You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
2 hours ago
I'm struggling with unnesting some arbitrary JSON-like data in a way that makes sense for us. What I want to avoid is having tables created for nested arrays / lists, but I want to expand all nested objects into columns otherwise. Hence using the max_table_nesting doesn't work.
Is there a simple way to do this that I'm missing?
9 replies
dltHub bot
APP [2 hours ago](https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1714461924910179)
Thank you for your question! It helps us improve our docs and product.
@adrian
will respond to you during GMT+1 working hours :raised_hands:
Aksel Stokseth
2 hours ago
For some added background context:
We're working on a stream of objects with unknown incoming schema and new schemas can arrive at any time. We split these objects into separate tables based on some properties og the objects, but where these nested arrays might arrive is unknown.
adrian
[2 hours ago](https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1714461929379589?thread_ts=1714461924.910179&cid=C04DQA7JJN6)
maybe something like this before yielding? https://www.geeksforgeeks.org/python-pandas-flatten-nested-json/
GeeksforGeeksGeeksforGeeks
Python Pandas - Flatten nested JSON - GeeksforGeeks
A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.
Mar 16th, 2022
:eyes:
1
Aksel Stokseth
1 hour ago
Thanks for the answer! I'm testing out this approach, but it breaks all of my other assumptions about the data since these transformations happen pre-normalization.
adrian
1 hour ago
that makes sense. unfortunately dlt does not support anything for this case directly so some kind of pre-processing is needed. Is the problem now with just pre processing in general or particularly with the way pandas handles it? for the former i guess you would need to adjust the assumptions, for the latter we could look into some other way to preprocess depending on what the exact issue is?
Aksel Stokseth
[1 hour ago](https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1714465622706489?thread_ts=1714461924.910179&cid=C04DQA7JJN6)
I wrote my own simple unnester that flattens the JSON, so now I'm just fighting the framework to figure out what I'm doing wrong :sweat_smile:
In general though, having a simple setting for always treating lists as complex would also have solved my use case.
:bulb:
1
Aksel Stokseth
[1 hour ago](https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1714465822984669?thread_ts=1714461924.910179&cid=C04DQA7JJN6)
Initially thought I'd be able to use preferred_types , but this only matches on exact column names / regexes. (edited)
Aksel Stokseth
1 hour ago
I've gotten it to work "as expected" now, but of course it's a bit cumbersome to now have essentially two "normalization engines" in play at the same time.
First normalize the JSON with my homegrown normalizer, then dlt goes to work on the objects again afterwards.
Aksel Stokseth
45 minutes ago
I guess I can opt out of the dlt one, but ideally I would like dlt to handle all of the normalization :blush:
Proposed solution
No response
Related issues
No response
The text was updated successfully, but these errors were encountered:
Feature description
A user would like some kind of normaliser mode that keeps every list as complex.
https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1714461924910179
There have been different requests for other modes (infer type from string), perhaps we can do something general here.
Are you a dlt user?
None
Use case
https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1714461924910179
Proposed solution
No response
Related issues
No response
The text was updated successfully, but these errors were encountered: