[BUG] IndexError: list index out of range #1872

Open
Oussamakhammassi opened this issue Jan 29, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Oussamakhammassi

I'm working on a Transformers4Rec project, and I want to do the preprocessing and encoding of categorical features in my SQL framework before feeding the table to the model, without using the categorify() function. But I get this error when I debug the .from_schema() function:

[Screenshot: issue]

@Oussamakhammassi Oussamakhammassi added the bug Something isn't working label Jan 29, 2024
@rnyak
Contributor

rnyak commented Jan 29, 2024

@Oussamakhammassi Regarding "without using categorify() function":

TF4Rec models are designed to read from the schema file. The Categorify op is critical because it records the number of unique categories for each categorical column, and it automatically adds the Categorical tag to the schema.
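When the encoding is done outside NVTabular (for example in SQL), the per-column category counts that Categorify would have recorded have to be computed some other way. Here is a minimal sketch in plain Python, assuming the IDs are already non-negative integers and that list columns arrive as Python lists. The function name `column_cardinalities` and the sample rows are hypothetical and not part of Transformers4Rec or NVTabular.

```python
from typing import Any, Dict, Iterable, List


def column_cardinalities(
    rows: Iterable[Dict[str, Any]], columns: List[str]
) -> Dict[str, Dict[str, int]]:
    """Return min ID, max ID and distinct-ID count per encoded categorical column."""
    seen: Dict[str, set] = {name: set() for name in columns}
    for row in rows:
        for name in columns:
            value = row[name]
            # A list (session-style) column contributes every element it holds.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                if not isinstance(v, int) or v < 0:
                    raise ValueError(f"{name}: expected a non-negative integer ID, got {v!r}")
                seen[name].add(v)

    stats: Dict[str, Dict[str, int]] = {}
    for name, ids in seen.items():
        if not ids:
            raise ValueError(f"{name}: no values found")
        stats[name] = {"min": min(ids), "max": max(ids), "n_unique": len(ids)}
    return stats


# Hypothetical, already-encoded rows (one row per session).
rows = [
    {"item_id": [3, 7, 7], "category": [1, 2, 2]},
    {"item_id": [5], "category": [1]},
]
stats = column_cardinalities(rows, ["item_id", "category"])
print(stats)
```

An embedding table indexed directly by these IDs needs at least max + 1 rows, so when the IDs are not contiguous, the maximum ID matters more than the distinct count.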

If you don't use the Categorify op, you need to create a proper schema file yourself. The schema should have the proper tags for all categorical and continuous features: tags such as categorical, continuous, is_list, is_ragged, and so on.
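Before hand-writing the schema, it may help to list each column with the tags it should carry and check that nothing is missing. The sketch below uses plain Python dictionaries, not the actual schema file format. The column names, the `max_id` field, the `check_columns` helper, and the three rules it applies are all assumptions chosen for illustration.

```python
from typing import Any, Dict, List

# Tag names come from the comment above; the dict layout itself is hypothetical.
KIND_TAGS = {"categorical", "continuous"}


def check_columns(columns: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the spec looks complete."""
    problems = []
    for col in columns:
        name = col.get("name", "<unnamed>")
        tags = set(col.get("tags", ()))
        # Assumed rule: every feature is either categorical or continuous.
        if len(tags & KIND_TAGS) != 1:
            problems.append(f"{name}: needs exactly one of {sorted(KIND_TAGS)}")
        # Assumed rule: a categorical column must carry its category count.
        if "categorical" in tags and "max_id" not in col:
            problems.append(f"{name}: categorical column without a category count")
        # Assumed rule: a ragged column is a variable-length list column.
        if "is_ragged" in tags and "is_list" not in tags:
            problems.append(f"{name}: is_ragged only makes sense on a list column")
    return problems


spec = [
    {"name": "item_id", "tags": ["categorical", "is_list", "is_ragged"], "max_id": 7},
    {"name": "price", "tags": ["continuous", "is_list", "is_ragged"]},
    {"name": "category", "tags": ["categorical"]},  # deliberately incomplete
]
for problem in check_columns(spec):
    print(problem)
```

A check like this only catches omissions in your own column list; the schema file that the model reads still has to be written in the format the library expects.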

You can check out one of the NVTabular notebooks here to see what a schema file looks like.

@Oussamakhammassi
Author

Can you give me a link to a tutorial on how to create a proper schema file?

@rnyak
Contributor

rnyak commented Feb 1, 2024

@Oussamakhammassi Sorry, we don't have a tutorial on how to create a schema file without NVTabular. If you use NVTabular, it is created automatically. What you can do is run one of the toy examples, take the schema file it produces, and change the numbers, feature names, and types to match your dataset.

Run this example until cell 10; you will see that a schema file has been saved to disk. You can try modifying it.
