missing data #70

Open
limhasic opened this issue Apr 9, 2024 · 2 comments

Comments

@limhasic

limhasic commented Apr 9, 2024

The paper says:

"Missing values

No transformation is done for missing values present in the data.

We let the model learn the distribution of the missing values.

This strategy gives us the flexibility to let the model impute or generate missing values during the sampling process."

However, I get an error when my data contains missing values.

How should I handle this?

@avsolatorio
Member

Hello @limhasic, could you please share more details about the error you are getting?
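One way to gather that detail is a per-column summary of the missing values, posted together with the full traceback. The sketch below is only an illustration: `summarize_missing` is a hypothetical helper, not part of REaLTabFormer, and the small DataFrame stands in for the real training data.

```python
import numpy as np
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Report the dtype, missing count, and missing fraction for each column."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "frac_missing": df.isna().mean(),
        }
    )


# Stand-in data (assumption): replace with the DataFrame passed to model.fit().
df = pd.DataFrame(
    {
        "age": [31.0, np.nan, 45.0, 52.0],
        "city": ["Seoul", None, "Busan", None],
    }
)

report = summarize_missing(df)
print(report)
```

Columns that are entirely missing, or that mix dtypes, are worth calling out in the report, since they are common sources of preprocessing errors.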

The model should be able to handle NaN values in the raw dataset, and you also have the option to either impute or generate NaN values in the synthetic data.

To impute, pass the token id of NUMERIC_NA_TOKEN to the sample() method through suppress_tokens, so that the model cannot emit the NaN token:

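from realtabformer.data_utils import NUMERIC_NA_TOKEN

model = ...  # placeholder: your REaLTabFormer model instance
model.fit(...)

# Look up the id of the NaN token, then suppress it during sampling
# so that the model generates a value instead of NaN.
na_token_id = model.vocab["decoder"]["token2id"][NUMERIC_NA_TOKEN]
data = model.sample(..., suppress_tokens=[na_token_id])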
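For intuition about why this imputes, here is a minimal pure-Python sketch of what token suppression does, assuming sample() forwards suppress_tokens to Hugging Face generate(), which sets the scores of the listed token ids to negative infinity before sampling. The toy vocabulary, the logits, and the helper names are all hypothetical and are not taken from REaLTabFormer.

```python
import math
import random


def suppress(logits, suppressed_ids):
    """Return a copy of logits with the suppressed token ids set to -inf."""
    return [-math.inf if i in suppressed_ids else x for i, x in enumerate(logits)]


def softmax(logits):
    """Convert logits to probabilities; -inf logits get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary (hypothetical): "NA" plays the role of the NaN token.
token2id = {"18": 0, "35": 1, "NA": 2, "60": 3}
na_id = token2id["NA"]

# Without suppression, the NaN token would be the most likely choice here.
logits = [1.0, 0.5, 2.0, 0.1]
probs = softmax(suppress(logits, {na_id}))

# Sample many times: the suppressed token can never be drawn.
rng = random.Random(0)
samples = rng.choices(range(len(probs)), weights=probs, k=1000)
print(probs)
```

Because the NaN token has zero probability after masking, every sampled position is filled with one of the remaining value tokens, which is what makes suppression behave like imputation.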

@limhasic
Author

I understand that the model is supposed to learn NaN values as well.

The option you describe also seems to contradict that design.

In any case, the error already occurs at the model.fit stage because of the NaN values in the data.
