
Fixes TypeError: torch.Size() takes an iterable of 'int' (item 1 is '… #729

Open
wants to merge 1 commit into base: main
Conversation

@Rajathbharadwaj Rajathbharadwaj commented Jul 2, 2023

Fixes the TypeError: torch.Size() takes an iterable of 'int' (item 1 is 'NoneType') error.

When using Transformers4Rec, creating the tabular inputs via `tr.TabularSequenceFeatures.from_schema` throws a TypeError. After a bit of inspection, the following changes solved the issue.

Fixes #728

Goals ⚽

Implementation Details 🚧

Testing Details 🔍

…NoneType') Error

When using Transformers4Rec, creating the `tabular_inputs` via `tr.TabularSequenceFeatures.from_schema` throws a TypeError. After a bit of inspection, the following changes solved the issue.
@rapids-bot

rapids-bot bot commented Jul 2, 2023

Pull requests from external contributors require approval from an NVIDIA-Merlin organization member with write permissions or greater before CI can begin.

@rnyak
Contributor

rnyak commented Jul 3, 2023

@Rajathbharadwaj hello, thanks for the PR. Can you please first provide a reproducible example of your error with a toy dataset?

@Rajathbharadwaj
Author

Hey @rnyak, definitely.

Following the Advanced NVTabular Workflow

import os
from merlin.datasets.entertainment import get_movielens

input_path = os.environ.get("INPUT_DATA_DIR", os.path.expanduser("~/merlin-framework/movielens/"))
get_movielens(variant="ml-1m", path=input_path); #noqa


from merlin.core.dispatch import get_lib

data = get_lib().read_parquet(f'{input_path}ml-1m/train.parquet').sample(frac=1)

train = data.iloc[:600_000]
valid = data.iloc[600_000:]

movies = get_lib().read_parquet(f'{input_path}ml-1m/movies_converted.parquet')



import nvtabular as nvt
from merlin.schema.tags import Tags

train_ds = nvt.Dataset(train, npartitions=2)
valid_ds = nvt.Dataset(valid)

# shuffle_by_keys returns a new Dataset, so assign the results
train_ds = train_ds.shuffle_by_keys('userId')
valid_ds = valid_ds.shuffle_by_keys('userId')

genres = ['movieId'] >> nvt.ops.JoinExternal(movies, on='movieId', columns_ext=['movieId', 'genres'])

genres = genres >> nvt.ops.Categorify(freq_threshold=10)

def rating_to_binary(col):
    return col > 3

binary_rating = ['rating'] >> nvt.ops.LambdaOp(rating_to_binary) >> nvt.ops.Rename(name='binary_rating')

userId = ['userId'] >> nvt.ops.Categorify() >> nvt.ops.AddTags(tags=[Tags.USER_ID, Tags.CATEGORICAL, Tags.USER])
movieId = ['movieId'] >> nvt.ops.Categorify() >> nvt.ops.AddTags(tags=[Tags.ITEM_ID, Tags.CATEGORICAL, Tags.ITEM])
binary_rating = binary_rating >> nvt.ops.AddTags(tags=[Tags.TARGET, Tags.BINARY_CLASSIFICATION])


workflow = nvt.Workflow(userId + movieId + genres + binary_rating)

train_transformed = workflow.fit_transform(train_ds)
valid_transformed = workflow.transform(valid_ds)
valid_transformed.compute().head()
train_transformed.schema

# Issue after running this code

from transformers4rec.torch import TabularSequenceFeatures
tabular_inputs = TabularSequenceFeatures.from_schema(
        train_transformed.schema,
        embedding_dim_default=128,
        max_sequence_length=20,
        d_output=100,
        aggregation="concat",
        masking="clm"
    )

It throws the following error:
`TypeError: torch.Size() takes an iterable of 'int' (item 1 is 'NoneType')`

After a bit of inspection, I found that the `max_sequence_length` parameter isn't passed through to `tabular.py`, so `max_sequence_length` ends up as `None` there. `torch.Size()` then fails with a `NoneType` at index 1, which is exactly where `max_sequence_length` is used.
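For illustration, here is a minimal, self-contained sketch of the failure mode (plain Python, no Transformers4Rec or torch required; the helper name `sequence_output_size` is hypothetical, not the actual function in `tabular.py`): `torch.Size()` requires every dimension to be an `int`, so a `max_sequence_length` that was never forwarded arrives as `None` and trips the same `TypeError`.

```python
def sequence_output_size(batch_size, max_sequence_length, embedding_dim):
    """Hypothetical stand-in for the output-shape computation.

    Mimics torch.Size()'s requirement that every element be an int:
    if max_sequence_length is never forwarded from from_schema, it
    stays None and the shape construction fails at index 1.
    """
    dims = (batch_size, max_sequence_length, embedding_dim)
    for i, d in enumerate(dims):
        if not isinstance(d, int):
            raise TypeError(
                f"torch.Size() takes an iterable of 'int' "
                f"(item {i} is '{type(d).__name__}')"
            )
    return dims

# max_sequence_length forwarded -> shape builds fine
print(sequence_output_size(32, 20, 128))  # (32, 20, 128)

# max_sequence_length dropped on the way to tabular.py -> the reported error
try:
    sequence_output_size(32, None, 128)
except TypeError as e:
    print(e)  # torch.Size() takes an iterable of 'int' (item 1 is 'NoneType')
```

Forwarding `max_sequence_length` through the call chain (as this PR does) keeps item 1 an `int`, so the error disappears.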

Development

Successfully merging this pull request may close these issues.

[QST] error related to schema
2 participants