Sklearn adapter function tovw does not support unsigned integers in features #4609

Open
jackgerrits opened this issue Jun 7, 2023 · 3 comments
Labels
Bug Bug in learning semantics, critical by default Lang: Python

Comments

@jackgerrits
Member

jackgerrits commented Jun 7, 2023

Mitigation

Until this is fixed, pass features with signed integer dtypes (for example int32) rather than unsigned ones (for example uint32) to the sklearn adapter functions.
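A minimal sketch of that mitigation, assuming the features arrive as a pandas DataFrame with NumPy-backed dtypes. The helper name `to_signed` is hypothetical and is not part of vowpalwabbit or sklearn; it casts unsigned integer columns to int64 and refuses uint64 values that would not fit.

```python
import numpy as np
import pandas as pd


def to_signed(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X with unsigned integer columns cast to int64.

    Hypothetical helper for illustration; not part of vowpalwabbit or sklearn.
    """
    out = X.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Only NumPy unsigned integer dtypes (kind "u") are touched.
        if isinstance(dtype, np.dtype) and dtype.kind == "u":
            # uint8/16/32 always fit in int64; uint64 may not, so range-check it.
            if dtype.itemsize == 8 and (out[col] > np.iinfo(np.int64).max).any():
                raise OverflowError(f"column {col!r} has values too large for int64")
            out[col] = out[col].astype(np.int64)
    return out


X = pd.DataFrame({"a": [1]}, dtype="uint32")
X_signed = to_signed(X)
print(X_signed.dtypes["a"])  # int64
```

The converted frame can then be passed to the adapter in place of the original, e.g. `VWRegressor().fit(to_signed(X), y)`.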

Details

The tovw function uses dump_svmlight_file to convert the input to a format from which VW text examples can easily be constructed.

dump_svmlight_file does not accept unsigned integer input; it requires signed integers because of the pyx (Cython) code used internally in sklearn.

Fails:

from vowpalwabbit.sklearn import VWRegressor
import numpy as np
import pandas as pd

X = pd.DataFrame({'a': [1]}, dtype='uint32')
y = pd.Series(np.zeros(1))

VWRegressor().fit(X, y)

Succeeds:

from vowpalwabbit.sklearn import VWRegressor
import numpy as np
import pandas as pd

X = pd.DataFrame({'a': [1]}, dtype='int32') # <-----
y = pd.Series(np.zeros(1))

VWRegressor().fit(X, y)

The same unsigned input works when passed directly to sklearn itself:

from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd

X = pd.DataFrame({'a': [1]}, dtype='uint32')
y = pd.Series(np.zeros(1))

LinearRegression().fit(X, y)

One way to fix this is to avoid the dump_svmlight_file function altogether. It is currently used only as a convenient way to convert the dataframe to VW text format.
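As a rough illustration of what a direct conversion could look like, here is a sketch that builds VW text lines of the form `label | name:value` straight from the dataframe, so the integer dtype never reaches sklearn. The helper `df_to_vw_lines` is hypothetical and is not how tovw is implemented. It assumes column names contain no spaces, colons, or pipes, and it drops zero-valued features the way a sparse format would.

```python
import numpy as np
import pandas as pd


def df_to_vw_lines(X: pd.DataFrame, y: pd.Series) -> list:
    """Build VW text examples of the form 'label | name:value ...'.

    Hypothetical helper; the real tovw may format features differently.
    """
    names = [str(c) for c in X.columns]
    lines = []
    # tolist() yields plain Python scalars, so the NumPy dtype
    # (signed or unsigned) does not affect the formatting.
    for row, label in zip(X.to_numpy().tolist(), y.tolist()):
        # Zero-valued features are omitted, as in a sparse format.
        feats = " ".join(f"{n}:{v}" for n, v in zip(names, row) if v != 0)
        lines.append(f"{label} | {feats}")
    return lines


X = pd.DataFrame({"a": [1]}, dtype="uint32")
y = pd.Series(np.zeros(1))
print(df_to_vw_lines(X, y))  # ['0.0 | a:1']
```

A real replacement would also need to cover whatever else the adapter accepts today (sparse matrices, sample weights, multiple label types), which this sketch does not attempt.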

@jackgerrits jackgerrits added Lang: Python Bug Bug in learning semantics, critical by default labels Jun 7, 2023
@mahimairaja

Is this issue still open?

@jackgerrits
Member Author

Yep! Feel free to tackle it if you'd like

@manthanindane

Is this issue open? Can I work on it?
