missing data #70

limhasic · 2024-04-09T07:58:46Z

The paper says:
[
Missing values

No transformation is done for missing values present in the data.

We let the model learn the distribution of the missing values.

This strategy gives us the flexibility to let the model impute or generate missing values during the sampling process.
]

However, I get an error when my data contains missing values.

What should I do?

avsolatorio · 2024-04-24T18:03:02Z

Hello @limhasic, could you please share more details about the error you are getting?

The model should be able to handle NaN values in the raw dataset, and you also have the option to either impute or generate NaN values in the synthetic data.

To impute, you just need to pass the token id of the NUMERIC_NA_TOKEN to the sample() method through its suppress_tokens argument.

from realtabformer.data_utils import NUMERIC_NA_TOKEN

model = <REaLTabFormer Model>
model.fit(...)

# Suppress the numeric NA token so that sampling imputes a value instead of emitting NaN.
data = model.sample(..., suppress_tokens=[model.vocab["decoder"]["token2id"][NUMERIC_NA_TOKEN]])
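For context, here is a minimal sketch of what suppressing a token does during sampling. It is a plain-Python illustration, not REaLTabFormer's actual code: the helper `sample_with_suppression`, the toy vocabulary, and the logits are all hypothetical. It assumes suppression works by forcing the token's logit to negative infinity, which is how the `suppress_tokens` option behaves in Hugging Face's generation utilities.

```python
import math
import random


def sample_with_suppression(logits, suppress_ids, rng):
    """Sample one token id from `logits`, never returning an id in `suppress_ids`."""
    # A logit of -inf gives the token zero probability after the softmax.
    masked = [-math.inf if i in suppress_ids else x for i, x in enumerate(logits)]
    top = max(masked)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in masked]  # exp(-inf) == 0.0
    return rng.choices(range(len(masked)), weights=weights, k=1)[0]


# Hypothetical toy vocabulary; real token strings and ids come from the fitted model.
token2id = {"<NA>": 0, "1": 1, "2": 2, "3": 3}
logits = [5.0, 1.0, 0.5, 0.2]  # the NA token is by far the most likely one

rng = random.Random(0)
draws = [
    sample_with_suppression(logits, {token2id["<NA>"]}, rng) for _ in range(1000)
]
```

Even though the NA token has the largest logit here, it never appears in `draws`, so every sampled position gets a concrete value. Leaving the NA token unsuppressed lets the sampler emit it, which corresponds to generating missing values in the synthetic data.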

limhasic · 2024-04-25T00:21:29Z

I understand that the model is supposed to learn NaN values as well.

The option you describe seems to contradict that.

Also, the error already occurs at the "model.fit" stage, because of the NaN values in the data.
