Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chill out, Background #415

Open
goldingn opened this issue Mar 6, 2018 · 7 comments
Open

chill out, Background #415

goldingn opened this issue Mar 6, 2018 · 7 comments

Comments

@goldingn
Copy link
Member

goldingn commented Mar 6, 2018

Background has an na.omit() step before returning the dataframe, which removes rows if any column has NA values.

Ideally, this should only remove rows with NAs in the the covariate columns (those columns named in the covCols attribute), since otherwise it causes problems if there are user-defined columns (#414).

@timcdlucas
Copy link
Contributor

I'm not convinced we want to automatically na.omit() based on covariates either.

  • Some models can handle NAs fine.
  • There might be cases where NA is the most appropriate value (some grouping variables or something perhaps?)
  • Process modules that impute missing data may be written later and auto removing all NAs will make that difficult.

The two options that seem reasonable to me are

  1. Leave everything by default and have a RemoveNAs process module (which I've written and just waiting to find time to upload).
  2. Have a na.rm argument to workflow.

Personally I like the first.

@AugustT
Copy link
Member

AugustT commented Mar 6, 2018

I agree 1 is the ideal solution. This would also require a review of existing models to ensure they work with NA values and addition of NA values to the module checking routines

@timcdlucas
Copy link
Contributor

But the current models don't need to work with NA values. I guess it would be good if they could error in a useful way if they can't handle NAs. Maybe that's what you mean by "work with NA values". But the workflow I imagine is:

  • `worklflow(..., process = NoProcess, ...)
  • Oh, this model can't handle NAs and I have NAs.
  • `workflow(..., process = RemoveNAs, ...)
  • Ah good it works now. But I'll have a think about whether I should impute or use different covariates or whatever.

@AugustT
Copy link
Member

AugustT commented Mar 6, 2018

Yes, I mean throw a sensible error or remove NAs with warning

@timcdlucas
Copy link
Contributor

timcdlucas commented Mar 6, 2018

OK yes then I totally agree. Should throw a sensible error.

@goldingn
Copy link
Member Author

goldingn commented Mar 6, 2018

Yeah, I agree. Best thing is to make the existing models remove NAs (with a warning) so we don't break any existing workflows,and to remove na.omit here.

@timcdlucas
Copy link
Contributor

Just to go back to this the RemoveNAs module is now in the repo. I don't think Background has had it's na.omit line removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants