[BUG] Chapter 2 OneHotEncoder Shape Mismatch Issue + Solution #115

Open
MadinaKamolova opened this issue Dec 13, 2023 · 0 comments
@MadinaKamolova

Thanks for helping us improve this project!

Before you create this issue
Please make sure you are using the latest updated code and libraries: see https://github.com/ageron/handson-ml3/blob/main/INSTALL.md#update-this-project-and-its-libraries

Also please make sure to read the FAQ (https://github.com/ageron/handson-ml3#faq) and search for existing issues (both open and closed), as your question may already have been answered: https://github.com/ageron/handson-ml3/issues

Describe the bug
Edition 3, page 133/1457 (Kindle e-book): the data transformed by OneHotEncoder is never converted with .toarray() before being passed to pd.DataFrame(), which results in the error ValueError: Shape of passed values is (2, 1), indices imply (2, 5). Here cat_encoder.transform() returns a SciPy sparse matrix, not a dense NumPy array. With the code as printed in the book, Python sees df_test_unknown.shape as (2, 1).

To Reproduce
Please copy the code that fails here, using code blocks like this:

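cat_encoder.handle_unknown = "ignore"
cat_encoder.transform(df_test_unknown)  # returns a SciPy sparse matrix

# Raises ValueError: the sparse matrix is passed to pandas without being densified
df_output = pd.DataFrame(cat_encoder.transform(df_test_unknown),
                         columns=cat_encoder.get_feature_names_out(),
                         index=df_test_unknown.index)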

Solution

cat_encoder.handle_unknown = "ignore"
test = cat_encoder.transform(df_test_unknown)  # SciPy sparse matrix
# .toarray() converts it to a dense NumPy array that pandas can wrap
df_output = pd.DataFrame(test.toarray(),
                         columns=cat_encoder.get_feature_names_out(),
                         index=df_test_unknown.index)

Versions (please complete the following information):

  • OS: Windows 11
  • Python: [e.g. 3.11]

Additional context
Maybe add this to the FAQ or somewhere else readers are likely to notice it (buying the book again for just one fix is impractical).
