sampler is not needed in pretrain mode for valid dataloader #499

Open
cyang31 opened this issue Jul 12, 2023 · 0 comments · May be fixed by #498
Labels: enhancement (New feature or request)

cyang31 commented Jul 12, 2023

Describe the bug

What is the current behavior?
The current validation dataloader in pretraining_utils.py reuses the sampler generated from X_train, which causes errors when X_train and the X_valid passed in eval_set do not have the same number of rows.
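
For context, the pattern looks roughly like the sketch below (a paraphrase of create_dataloaders in pretraining_utils.py, not the exact source; helper names such as create_sampler and PredictDataset follow the library's utils module, but the arguments are simplified). The sampler is built from weights, whose length matches X_train, and the same sampler is then reused for every set in eval_set, so it can emit indices larger than the eval set:

from torch.utils.data import DataLoader
from pytorch_tabnet.utils import PredictDataset, create_sampler

# need_shuffle is False whenever a weighted sampler is returned
need_shuffle, sampler = create_sampler(weights, X_train)

train_dataloader = DataLoader(
    PredictDataset(X_train),
    batch_size=batch_size,
    sampler=sampler,          # fine: weights has one entry per row of X_train
    shuffle=need_shuffle,
)

valid_dataloaders = []
for X in eval_set:
    valid_dataloaders.append(
        DataLoader(
            PredictDataset(X),
            batch_size=batch_size,
            sampler=sampler,  # bug: indices go up to len(X_train) - 1,
                              # which can exceed len(X) for a smaller eval set
            shuffle=need_shuffle,
        )
    )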

If the current behavior is a bug, please provide the steps to reproduce.
Here's a script to reproduce the issue, where weights is passed as an ndarray.

from pytorch_tabnet.pretraining import TabNetPretrainer
import numpy as np
import torch

# Set the random seed for reproducibility
np.random.seed(42)

# Generate random features
num_train_samples = 100000
num_valid_samples = 50000
num_features = 10

X_train = np.random.rand(num_train_samples, num_features)
X_valid = np.random.rand(num_valid_samples, num_features)

# Generate random binary labels
y_train = np.random.randint(2, size=num_train_samples)
y_valid = np.random.randint(2, size=num_valid_samples)

num_positive_samples = np.sum(y_train)
num_negative_samples = len(y_train) - num_positive_samples
class_weights = np.zeros(len(y_train))

class_weights[y_train == 0] = 1 / num_negative_samples
class_weights[y_train == 1] = 1 / num_positive_samples

# TabNetPretrainer
unsupervised_model = TabNetPretrainer(
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=2e-2),
    mask_type='entmax',  # or "sparsemax"
    device_name='cpu'
)

unsupervised_model.fit(
    X_train=X_train,
    eval_set=[X_valid],
    pretraining_ratio=0.5,
    weights=class_weights
)

Expected behavior
Currently, the above script raises IndexError: index 94028 is out of bounds for axis 0 with size 50000, which shows that the training weights are also applied to X_valid. The expected behavior is that the validation dataloader ignores the weights, since weighted sampling is only needed for training.
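
One way to fix it, shown only as an illustrative sketch (PR #498 may implement the change differently), is to build the validation dataloaders without the train-derived sampler and simply iterate each eval set in order:

from torch.utils.data import DataLoader
from pytorch_tabnet.utils import PredictDataset

# Validation does not need weighted sampling, so no sampler is passed;
# batch_size, num_workers and pin_memory mirror the existing arguments.
valid_dataloaders = []
for X in eval_set:
    valid_dataloaders.append(
        DataLoader(
            PredictDataset(X),
            batch_size=batch_size,
            shuffle=False,
            num_workers=num_workers,
            pin_memory=pin_memory,
        )
    )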

Screenshots

Other relevant information:
poetry version:
python version: 3.8
Operating System: linux, macos
Additional tools:

Additional context

@cyang31 added the "bug (Something isn't working)" label on Jul 12, 2023
@cyang31 linked a pull request on Jul 12, 2023 that will close this issue
@Optimox added the "enhancement (New feature or request)" label and removed the "bug (Something isn't working)" label on Jul 26, 2023