Handle numeric NA values #187

hneth · 2023-03-22T17:41:19Z

This PR improves the handling of NA values in numeric predictors (by adding the option of replacing NA by means) and revises corresponding tests.

pa-nathaniel · 2023-03-22T23:11:53Z

Hi Hans! Just lurking here; I've only read the description, not the full code. But is this PR really enabling missing value imputation logic within FFTrees? If so, is it enabled by default or not? For both training and test data or just one?

Sorry if I missed a description in NEWS or the README, but I didn't see one. Thanks!

hneth · 2023-03-23T10:08:41Z

Hi Nathaniel,

good points, of course. Some clarifications to answer your questions:

But is this PR really enabling missing value imputation logic within FFTrees?

Yes, it’s replacing NA values in numeric predictors by their mean (per predictor), as commonly done in simulation studies. (Other replacement policies could easily be implemented in the same way.)
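Per-predictor mean replacement can be sketched in a few lines of base R. The functions mean_impute_vec() and mean_impute_num() below are hypothetical stand-ins. They are not the replace_NA_vec() / replace_NA_num() utilities added in this PR, whose code is not shown here.

```r
# Hypothetical helpers, not the FFTrees internals.

# Replace NA values in a numeric vector by the mean of its observed values.
# (An all-NA vector would receive NaN, since there is nothing to average.)
mean_impute_vec <- function(v) {
  if (!is.numeric(v) || !anyNA(v)) return(v)  # skip non-numeric or complete vectors
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Apply mean_impute_vec() to every numeric column of a data frame,
# leaving categorical columns (and their NA values) untouched.
mean_impute_num <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], mean_impute_vec)
  df
}

df <- data.frame(age = c(20, NA, 40), sex = c("f", NA, "m"))
res <- mean_impute_num(df)
res$age  # c(20, 30, 40): the NA became mean(c(20, 40))
```

Each mean is computed per column from that column's observed values only, so predictors never borrow information from one another.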

If so is it enabled by default or not?

Yes, but several global constants allow enabling or disabling the functionality for handling NA values in data. The relevant ones here are:

- allow_NA_pred (to allow or disallow NA values in predictors)
- replace_NA_num_pred (to enable or disable replacing NA values in numeric predictors)

Both are currently set to TRUE by default, but several warnings are issued whenever NA values are actually encountered and replaced.
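Here is a sketch of how two switches like allow_NA_pred and replace_NA_num_pred could gate NA handling and its warnings. handle_num_NA() is hypothetical. The flags are passed as arguments rather than read as global constants. Stopping with an error when NA values are disallowed is an assumption, not documented FFTrees behavior.

```r
# Hypothetical sketch; the flag names mirror the global constants above.
handle_num_NA <- function(df, allow_NA_pred = TRUE, replace_NA_num_pred = TRUE) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  na_cols  <- num_cols[vapply(df[num_cols], anyNA, logical(1))]
  if (length(na_cols) == 0) return(df)  # no NA values in numeric predictors

  if (!allow_NA_pred) {  # assumption: disallowed NA values are an error
    stop("NA values in numeric predictors: ", paste(na_cols, collapse = ", "))
  }

  if (replace_NA_num_pred) {
    for (col in na_cols) {
      v <- df[[col]]
      m <- mean(v, na.rm = TRUE)
      warning("Replacing ", sum(is.na(v)), " NA value(s) in '", col,
              "' by its mean (", m, ")", call. = FALSE)
      v[is.na(v)] <- m
      df[[col]] <- v
    }
  } else {
    message("Keeping NA values in: ", paste(na_cols, collapse = ", "))
  }
  df
}

df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
res <- handle_num_NA(df)  # warns: Replacing 1 NA value(s) in 'x' by its mean (2)
```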

If allow_NA_pred = TRUE but replace_NA_num_pred = FALSE, NA values in numeric predictors are ignored during FFT generation (with corresponding messages), and cases with NA values are classified according to the current choice from fin_NA_options when they reach a final node. For the competing models, cases with NA values are removed in this setting (to avoid crashes).
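Removing incomplete cases before fitting a competing model can be sketched with base R's stats::complete.cases(). drop_NA_cases() is a hypothetical helper, and glm() merely stands in for one of the competing algorithms. The PR text does not show how the removal is actually implemented.

```r
# Hypothetical sketch: drop every case (row) with an NA value in any column.
drop_NA_cases <- function(df) {
  keep <- stats::complete.cases(df)
  if (!all(keep)) {
    message("Removing ", sum(!keep), " case(s) with NA values")
  }
  df[keep, , drop = FALSE]
}

train <- data.frame(crit = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
                    x    = c(1.2, NA, 3.1, 2.5, 0.7, 0.9))
complete_train <- drop_NA_cases(train)  # 5 of 6 cases remain
fit <- glm(crit ~ x, family = binomial, data = complete_train)
```

Unlike mean replacement, this shrinks the sample, so the competing models may be trained on fewer cases than the FFTs.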

For both training and test data or just one?

Both training and test data are handled in the same way, and detecting and replacing NA values issues corresponding warnings for both.

Overall, and especially in combination with the default handling of NA values in categorical predictors (as distinct categories), these changes should provide pretty comprehensive options for a variety of cases.
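For comparison, treating NA values in a categorical predictor as a distinct category can be illustrated with base R's addNA(), which adds NA as an explicit factor level. This is only an illustration. The PR does not state that FFTrees uses addNA() internally.

```r
# Illustration only: make NA an explicit level of a categorical predictor.
sex <- factor(c("f", NA, "m", "f"))
sex_NA <- addNA(sex, ifany = TRUE)  # appends an NA level only if NA values occur

levels(sex_NA)      # "f" "m" NA
as.integer(sex_NA)  # 1 3 2 1: the missing case now has its own level code
is.na(sex_NA)       # all FALSE: the NA level is an ordinary category now
```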

I’d be happy to discuss which of the options should become defaults and which should become user-controlled parameters (of the main FFTrees() function). Also, the new functionality still needs to be tested and documented properly before being ready for release. (If we wanted to be conservative, setting the global NA-related options to FALSE removes all of this immediately and restores the former functionality.)

Looking forward to hearing your thoughts,
Hans

hneth added 11 commits March 21, 2023 10:51

- 9377144: add replace_NA_vec(v) and replace_NA_num(df) utility functions (for replacing numeric NA values)
- d910106: update
- b91f36b: enable replacing NA values in numeric predictors by their mean value, using the utility function replace_NA_num()
- b119634: minor
- a53f2b9: set default of replace_num_NA to: replace_num_NA <- TRUE
- 5f3d786: update and increment version
- 04e182c: add test for NA in categorical predictors and rename file
- 536c104: add functions for replacing NA values in numeric predictors (and corresponding utility functions)
- 17f833b: bug fix: cue_i_class becomes c("matrix", "array") when replacing NA values in numeric predictors. Fixed by adding as.vector() when determining cue class.
- bb5d8c6: update tests
- 2c6246e: update and increment version
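The bug fix in 17f833b rests on a base R detail. Since R 4.0.0, class() of a matrix returns c("matrix", "array"). A check such as class(x) == "numeric" then yields two values, which if () rejects with an error since R 4.2.0. The following is a minimal illustration with made-up values. The commit does not say how the cue became a matrix inside FFTrees, so the one-column matrix here is an assumption.

```r
# Illustration (made-up values): a cue stored as a one-column matrix.
cue_i <- matrix(c(1.5, NA, 2.5), ncol = 1)
cue_i[is.na(cue_i)] <- mean(cue_i, na.rm = TRUE)  # replacement keeps the matrix shape

class(cue_i)             # "matrix" "array" (R >= 4.0.0)
class(as.vector(cue_i))  # "numeric"

# Dropping the dimensions first gives a single class string again:
cue_i_class <- class(as.vector(cue_i))
if (cue_i_class == "numeric") message("numeric cue")
```

Note that as.vector() also turns a factor into a character vector, so the resulting class reflects the underlying values rather than the factor wrapper.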

hneth merged commit 8d69cb1 into ndphillips:master Mar 22, 2023