[BUG] cannot pickle 'mappingproxy' object when using TabularFeatures create_categorical #727

Open
denadai2 opened this issue Jun 27, 2023 · 3 comments
Labels: bug (Something isn't working), status/needs-triage


denadai2 commented Jun 27, 2023

Bug description

I hit a bug just by creating a schema programmatically. Can you help me with this?

Thanks

Steps/Code to reproduce bug

import merlin_standard_lib as msl
from merlin_standard_lib import Schema
from transformers4rec.torch.features.tabular import TabularFeatures

features_schema = Schema([msl.ColumnSchema.create_categorical("language", num_items=149)])
a = TabularFeatures.from_schema(features_schema)

I get TypeError: cannot pickle 'mappingproxy' object, coming from:

/home/mdenadai/miniconda3/envs/gnn/lib/python3.9/site-packages/transformers4rec/torch/features/tabular.py:175 in from_schema

  172                     **kwargs,
  173                 )
  174             else:
❱ 175                 maybe_continuous_module = cls.CONTINUOUS_MODULE_CLASS.from_schema(
  176                     schema, tags=continuous_tags, **kwargs
  177                 )
  178         if categorical_tags:

/home/mdenadai/miniconda3/envs/gnn/lib/python3.9/site-packages/transformers4rec/torch/tabular/base.py:190 in from_schema

  187         -------
  188         Optional[TabularModule]
  189         """
❱ 190         schema_copy = deepcopy(schema)
  191         if tags:
  192             schema_copy = schema_copy.select_by_tag(tags)

This happens even when I just do:

from copy import deepcopy

import merlin_standard_lib as msl
from merlin_standard_lib import Schema

deepcopy(Schema([msl.ColumnSchema.create_categorical("language", num_items=149)]))
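The same error can be reproduced with the standard library alone. In this minimal sketch the dictionary contents are made up for illustration: copy.deepcopy has no dedicated copier for types.MappingProxyType, so it falls back to the pickle protocol (__reduce_ex__), which mappingproxy does not support. The report does not show where the mappingproxy sits inside the schema object; this only illustrates the failing call.

```python
import copy
import types

# A read-only view of a dict: the type named in the error message.
proxy = types.MappingProxyType({"min": 0, "max": 149})

try:
    copy.deepcopy(proxy)
except TypeError as exc:
    # deepcopy found no copier for mappingproxy and fell back to
    # __reduce_ex__, which refuses to pickle it.
    error_message = str(exc)
else:
    error_message = ""

print(error_message)  # cannot pickle 'mappingproxy' object
```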

Environment details

  • Transformers4Rec version: 23.6.0
  • Platform: unix
  • Python version: 3.9
denadai2 commented Jun 27, 2023

It seems that if I remove int_domain from ColumnSchema, everything can be copied:

class ColumnSchema(Feature):
    @classmethod
    def create_categorical(
        cls,
        name: str,
        num_items: int,
        shape: Optional[Union[Tuple[int, ...], List[int]]] = None,
        value_count: Optional[Union[ValueCount, ValueCountList]] = None,
        min_index: int = 0,
        tags: Optional[TagsType] = None,
        **kwargs,
    ) -> "ColumnSchema":
        _tags: List[str] = [t.value for t in TagSet(tags or [])]

        extra = _parse_shape_and_value_count(shape, value_count)
        # Per the observation above, omitting int_domain makes the schema deep-copyable.
        int_domain = IntDomain(name=name, min=min_index, max=num_items, is_categorical=True)
        _tags = list(set(_tags + [Tags.CATEGORICAL.value]))
        extra["type"] = FeatureType.INT

        return cls(name=name, int_domain=int_domain, **extra, **kwargs).with_tags(_tags)
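Narrowing the failure down to int_domain can also be done without editing library code, by deep-copying each field of an object separately. A minimal sketch, assuming the schema objects are dataclasses (as betterproto-generated messages normally are); find_uncopyable_fields and DemoColumn are hypothetical names, and the demo uses a stand-in class rather than a real ColumnSchema.

```python
import copy
import dataclasses
import types
from typing import Any, List


def find_uncopyable_fields(obj: Any) -> List[str]:
    """Return the names of obj's dataclass fields whose values fail copy.deepcopy."""
    failing = []
    for field in dataclasses.fields(obj):
        try:
            copy.deepcopy(getattr(obj, field.name))
        except TypeError:
            # e.g. "cannot pickle 'mappingproxy' object"
            failing.append(field.name)
    return failing


# Hypothetical stand-in for a schema message (not part of merlin_standard_lib):
# the domain field holds a mappingproxy, the type named in the error.
@dataclasses.dataclass
class DemoColumn:
    name: str
    domain: Any


demo = DemoColumn(name="language", domain=types.MappingProxyType({"max": 149}))
print(find_uncopyable_fields(demo))  # ['domain']
```

Calling the helper again on the value of a failing field narrows the search one level further.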

denadai2 commented

It gets solved with betterproto 2.0 (2.0.0b6), possibly because of danielgtaylor/python-betterproto#339. However, this creates a dependency clash:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. merlin-core 23.6.0 requires betterproto<2.0.0, but you have betterproto 2.0.0b6 which is incompatible.
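One stopgap that avoids upgrading betterproto (and so avoids the merlin-core pin) would be to teach the copy module how to copy a mappingproxy. This is not proposed in the thread and is untested against merlin_standard_lib; it is a sketch assuming the mappingproxy is the only uncopyable object in the schema. It relies on copyreg.pickle registering a reducer in copyreg.dispatch_table, which copy.copy and copy.deepcopy consult; _reduce_mappingproxy is a hypothetical helper name.

```python
import copy
import copyreg
import types


def _reduce_mappingproxy(proxy):
    # Describe a mappingproxy as "call MappingProxyType(<plain dict copy>)".
    # copy.deepcopy deep-copies this argument tuple before making the call.
    return (types.MappingProxyType, (dict(proxy),))


# Process-wide side effect: every mappingproxy copied with copy.copy or
# copy.deepcopy after this line goes through _reduce_mappingproxy.
copyreg.pickle(types.MappingProxyType, _reduce_mappingproxy)

backing = {"min": 0, "max": 149}
original = types.MappingProxyType(backing)
clone = copy.deepcopy(original)

backing["max"] = 10  # the clone wraps its own dict, so it does not see this
print(dict(clone))  # {'min': 0, 'max': 149}
```

The trade-off is that a copied proxy is no longer a live view of the original mapping, and the registration affects the whole process, so it belongs in application setup code rather than inside a library.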

EvenOldridge (Member) commented

Thanks for the detailed bug report and the fix.

You can try updating the dependencies in requirements.txt; there's a reasonable chance that it'll work. Unfortunately, we're not able to update our containers at this time, but if you can confirm that it works, we'd love a PR with your solution.
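A PR along these lines could carry a regression test built from the reproducers in the report. A sketch, assuming unittest-style tests; the class and method names are hypothetical, and the tests skip themselves when merlin_standard_lib is not installed. Run it with python -m unittest.

```python
import copy
import unittest

try:
    import merlin_standard_lib as msl
    from merlin_standard_lib import Schema
except ImportError:
    msl = None  # Merlin not installed: every test below is skipped


@unittest.skipIf(msl is None, "merlin_standard_lib is not installed")
class CategoricalSchemaDeepcopyTest(unittest.TestCase):
    def _schema(self):
        return Schema([msl.ColumnSchema.create_categorical("language", num_items=149)])

    def test_deepcopy_categorical_schema(self):
        # In the report (Transformers4Rec 23.6.0) this raised
        # TypeError: cannot pickle 'mappingproxy' object.
        copy.deepcopy(self._schema())

    def test_tabular_features_from_categorical_schema(self):
        from transformers4rec.torch.features.tabular import TabularFeatures

        # from_schema deep-copies the schema internally (tabular/base.py:190).
        TabularFeatures.from_schema(self._schema())
```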
