
How can I use glom to pick out zero, one, or more, array elements with a key that must match a value, and how to load this spec from file #225

Open
davesargrad opened this issue Aug 25, 2021 · 4 comments

Comments

@davesargrad

davesargrad commented Aug 25, 2021

Part 1

My target looks like this:

{
   "stuff":"is_cool",
   "yay":1,
   "a_list":[
      {
         "SOURCE TYPE":"Something",
         "ID":"https://blizzard.com"
      },
      {
         "SOURCE TYPE":"Something Else",
         "ARBITRARY":"Milk",
         "notes":[
            {
               "label":"chocolate",
               "comments":[
                  "yummy",
                  "yummiest"
               ]
            },
            {
               "label":"strawberry",
               "comments":[
                  "pink stuff",
                  "yay"
               ]
            }
         ]
      },
      {
         "SOURCE TYPE":"Something Else Else",
         "ALGORITHM":"Cool"
      }
   ]
}

I want to form a spec that will output just the element(s) of "a_list" that contain
"SOURCE TYPE": "Something Else"

So the output would be this:

[
   {
      "SOURCE TYPE":"Something Else",
      "ARBITRARY":"Milk",
      "notes":[
         {
            "label":"chocolate",
            "comments":[
               "yummy",
               "yummiest"
            ]
         },
         {
            "label":"strawberry",
            "comments":[
               "pink stuff",
               "yay"
            ]
         }
      ]
   }
]

Keep in mind that this is also a valid target. It has two matching "a_list" elements, so the output would be a list of length 2:

{
   "stuff":"is_cool",
   "yay":1,
   "a_list":[
      {
         "SOURCE TYPE":"Something",
         "ID":"https://blizzard.com"
      },
      {
         "SOURCE TYPE":"Something Else",
         "ARBITRARY":"Milk",
         "notes":[
            {
               "label":"chocolate",
               "comments":[
                  "yummy",
                  "yummiest"
               ]
            },
            {
               "label":"strawberry",
               "comments":[
                  "pink stuff",
                  "yay"
               ]
            }
         ]
      },
      {
         "SOURCE TYPE":"Something Else",
         "ARBITRARY":"Soda",
         "notes":[
            {
               "label":"berry",
               "comments":[
                  "gross",
                  "yummierest"
               ]
            },
            {
               "label":"cherry",
               "comments":[
                  "soda is good stuff",
                  "cherry is like berry, but with a c"
               ]
            }
         ]
      },
      {
         "SOURCE TYPE":"Something Else Else",
         "ALGORITHM":"Cool"
      }
   ]
}

Is there a spec that can do this?

Part 2

Here, I want to preserve a portion of the higher-level object as well.

The target is the same, but I want the following output:

{
   "stuff":"is_cool",
   "yay":1,
   "a_list": [
   {
      "SOURCE TYPE":"Something Else",
      "ARBITRARY":"Milk",
      "notes":[
         {
            "label":"chocolate",
            "comments":[
               "yummy",
               "yummiest"
            ]
         },
         {
            "label":"strawberry",
            "comments":[
               "pink stuff",
               "yay"
            ]
         }
      ]
   }
]}

How would I achieve that?

Part 3

Here, I want to preserve a portion of the higher-level object as well. I also want to leave out some fields, both in the higher-level object and in the matching array elements.

The target is the same, but I want the following output:

{
   "stuff":"is_cool",
   "a_list": [
   {
      "ARBITRARY":"Milk",
      "notes":[
         {
            "label":"chocolate",
            "comments":[
               "yummy",
               "yummiest"
            ]
         },
         {
            "label":"strawberry",
            "comments":[
               "pink stuff",
               "yay"
            ]
         }
      ]
   }
]}

How would I achieve that?

Part 4

I don't want to hard-code the spec. Rather, I want to load it as configuration.
I could wrap the spec in a string and then eval it, but I think that is bad practice.

As described in the documentation, I don't want to use code injection, nor do I want to use the command line interface for this. I'd rather load the spec from a configuration file, and still have access to the full power of the spec (so that I can use things like lambda functions, and methods such as Coalesce)


How do I load the glom spec from a configuration file, or from a string?

@davesargrad davesargrad changed the title Hi. How can I use glom to pick out an array element that meets a convention How can I use glom to pick out zero, one, or more, array elements with a key that must match a value Aug 25, 2021
@davesargrad
Author

davesargrad commented Aug 25, 2021

Ah, wow. As shown in the glom snippets, this seems to do the trick for the first part of the question:
glom([1, 2, 3, 4, 5, 6], [lambda i: i if i % 2 else SKIP])

I simply replace the target with my own, and replace the integer modulo check inside the spec with
if i['SOURCE TYPE'] == "Something Else"

spec = ('a_list', [lambda i: i if i['SOURCE TYPE'] == "Something Else" else SKIP])

I'm starting to love glom, and I only just met glom today!

I think I am good for Part 1. Could you please help with a spec for Part 2, Part 3, and Part 4.

@davesargrad
Author

Looks like Parts 2 and 3 are also easy.

Something like this does the job.

{'time': 'time', 'a_param': 'size', 'another_param': 'shape', 'sublist': ('a_list', [lambda i: i if i['SOURCE TYPE'] == 'Something Else' else SKIP])}

So at this point I just need an answer for Part 4.

@davesargrad davesargrad changed the title How can I use glom to pick out zero, one, or more, array elements with a key that must match a value How can I use glom to pick out zero, one, or more, array elements with a key that must match a value, and how to load this spec from file Aug 25, 2021
@kurtbrose
Collaborator

That's a great question :-) You're knocking on the door of very universal computer science issues.

Can we load an arbitrary spec WITHOUT eval? A spec can embed arbitrary Python objects and functions, so it cannot be represented without "full power" Python.

My practical recommendation would be to have a config.py, transformers.py, or similar file where you store the spec. Then it's up to you, by convention, to keep the code "simple". This is how e.g. gunicorn and django handle configuration, and I've found it to work well.

Could you have a LIMITED spec and load parts of it from JSON or similar? Yes, absolutely. I think it will end up being less readable than using a python-syntax config file, but it could be done.

@vineetsingh065

@davesargrad I also ran into this issue. Did you find a solution? My problem is similar to the Part 2 scenario.
