Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behaviour between attributes/keys after release 22.1.0 #233

Open
suvayu opened this issue Jan 27, 2022 · 5 comments
Open

Inconsistent behaviour between attributes/keys after release 22.1.0 #233

suvayu opened this issue Jan 27, 2022 · 5 comments

Comments

@suvayu
Copy link

suvayu commented Jan 27, 2022

I use glom with a library called frictionless. It's a library that manages metadata for datasets based on the frictionless specification.

When using release 22.1.0, I see inconsistent behaviour between attributes/keys. Most (all?) of the metadata classes in frictionless are subclasses of dict. I have a hunch, the inconsistent behaviour is because of that.

Here's a minimal example using the Schema class from frictionless. I have put comments next to indicate expected behaviour, or to comment on the error.

from frictionless import Schema
from glom import glom, Assign

fields = [
    {"name": "id", "type": "number"},
    {"name": "foo", "type": "datetime"},
    {"name": "bar", "type": "string"},
    {"name": "baz", "type": "boolean"},
]
schema = Schema(fields=fields, missing_values="NA")

# access
assert schema.primary_key == []
assert glom(schema, "primary_key") == []  # fails with KeyError

# expected
try:
    assert schema["primaryKey"]
except KeyError:
    pass

# expected
try:
    assert glom(schema, "primaryKey")
except KeyError:
    pass

# assign
glom(schema, Assign("primary_key", "foo"))
assert schema.primary_key == ["foo"]
assert schema["primaryKey"] == "foo"

glom(schema, Assign("primaryKey", "id"))  # fails with AttributeError
assert schema["primaryKey"] == "id"

Edit: I simplified the examples

@kurtbrose
Copy link
Collaborator

Thanks for finding that, and sorry we broke your library!

This corner case wasn't covered in our tests (Path behavior on an object where both [] and . are valid).

There isn't any obvious reason for the change in behavior in the changes for this release.

I think your hunch is correct -- the behavior of strings is that they will try ["val"], .val, [int("val")] depending on the type. Before, . probably had higher precedence than [].

There's two fixes if you want to keep using the new version without waiting for us:

1- Use T.val to disambiguate in the spec.
2- Use register() to disambiguate globally that frictionless types use . not [] https://glom.readthedocs.io/en/latest/api.html#glom.register

Just for additional information, do you have any of the stack traces handy? (If the underlying error is KeyError "primarykey" that confirms the root cause is [] vs .)

@suvayu
Copy link
Author

suvayu commented Jan 29, 2022

Hi @kurtbrose, sorry about my delayed response. I was trying to get more details. Thanks to your comments, I have simplified my example above. You are correct, when accessing the error is KeyError:

Traceback (most recent call last):
  File "/home/user/testing/glom-frictionless-bug/test_bug.py", line 14, in <module>
    assert glom(schema, "primary_key") == []  # fails with KeyError
  File "/home/user/.virtualenvs/tmp-33f0ead521b8cd1/lib/python3.9/site-packages/glom/core.py", line 2294, in glom
    raise err
glom.core.PathAccessError: error raised while processing, details below.
 Target-spec trace (most recent last):
 - Target: {'fields': [{'name': 'id', 'type': 'number'}, {'name': 'foo', 'type': 'datetime'}, {'n... (len=2)
 - Spec: 'primary_key'
glom.core.PathAccessError: could not access 'primary_key', part 0 of Path('primary_key'), got error: KeyError('primary_key')

However I don't think that is the problem. My code started failing because the precedence between key and attribute has changed for assign (as shown by the following traceback).

Traceback (most recent call last):
  File "/home/user/testing/glom-frictionless-bug/test_bug.py", line 34, in <module>
    glom(schema, Assign("primaryKey", "id"))  # fails with AttributeError
  File "/home/user/.virtualenvs/tmp-33f0ead521b8cd1/lib/python3.9/site-packages/glom/core.py", line 2294, in glom
    raise err
glom.core.PathAssignError: error raised while processing, details below.
 Target-spec trace (most recent last):
 - Target: {'fields': [{'name': 'id', 'type': 'number'}, {'name': 'foo', 'type': 'datetime'}, {'n... (len=3)
 - Spec: Assign(Path('primaryKey'), 'id')
 - Spec: Path()
glom.core.PathAssignError: could not assign 'primaryKey' on object at Path(), got error: AttributeError("'Schema' object has no attribute 'primaryKey'")

In my code, I was consistently using key based access (so ["primaryKey"] instead of .primary_key). As Schema is a subclass of dictionary, this seemed natural to me. The code broke (only my assigns failed), because the precedence changed to favour attributes.

And thank you so much for suggesting the two possible fixes in case it takes a while to resolve this in glom. Much appreciated :)

@kurtbrose
Copy link
Collaborator

kurtbrose commented Jan 30, 2022

Thanks, I missed that it was the assignment target that was failing.

That makes a lot more sense because we DID touch that code this release.

(we added * and ** to paths, and you can assign to a glob/multi-path: Assign('*.primary_key', None) instead of [Assign('primary_key', None)] or (T.values(), [Assign('primary_key', None)])

@kurtbrose
Copy link
Collaborator

Thanks for the simplified reproduction, I was able to grab it pretty directly into a test.

I've tried running the test against different releases, all the way back to 18.4.0 and they all fail in the same place.

        # access
        assert schema.primary_key == []
>       assert glom(schema, "primary_key") == []  # fails with KeyError

glom\test\test_regressions.py:26:
# ... snip out a bunch of glom internal lines ...
              except Exception as e:
>                   raise PathAccessError(e, Path(_t), i // 2)
E                   glom.core.PathAccessError: could not access 'primary_key', part 0 of Path('primary_key'), got error: KeyError('primary_key')

glom\core.py:894: PathAccessError

Maybe the version you were using was older than that?

@suvayu
Copy link
Author

suvayu commented Jan 30, 2022

I think my description of the issue hasn't been very clear, my fault. I'll try to be clearer.

Since a dict subclass is primarily meant for key-based lookup, I have been using string paths with glom only for keys.

from frictionless import Schema
from glom import glom, Assign

fields = [
    {"name": "id", "type": "number"},
    {"name": "foo", "type": "datetime"},
    {"name": "bar", "type": "string"},
    {"name": "baz", "type": "boolean"},
]
schema = Schema(fields=fields, missing_values="NA")

# expected
try:
    assert schema["primaryKey"]
except KeyError:
    pass

# expected
try:
    assert glom(schema, "primaryKey")
except KeyError:
    pass

The above works as expected (older versions, and latest release).

However assignment used to work with 20.11, but doesn't work with 22.1.

schema["primaryKey"] = "foo"
assert schema["primaryKey"] == "foo"

glom(schema, Assign("primaryKey", "id"))  # AttributeError with 22.1
assert schema["primaryKey"] == "id"

While investigating the above, I realised the following also fails (for all versions) and included it in my minimal example.

glom(schema, "primary_key")  # KeyError for both versions

However thanks to your first comment I see it is unsurprising, and maybe even expected, as Schema is a subclass of dict. I don't know the glom internals, but I would guess while it might be possible to make string paths work for attribute access for a dict subclass, it isn't possible to implement assignment. Either way, I wouldn't consider support for this important, as there are alternatives (as suggested in your first response, thanks!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants