feature: bin by strata #11

Open
ronkeizer opened this issue Oct 30, 2014 · 4 comments

Comments

@ronkeizer
Owner

Add an option "bin_by_strata" to allow optimized bins per stratum, for both the automatic and manual binning approaches.

@billdenney
Contributor

Yes, please!

Looking at the code, it seems this could be done by making the output of add_stratification a grouped_df (the output of group_by) and then applying auto_bin or the manual binning to each grouping level. I'm sure that I'm over-simplifying, but hopefully this can move to implementation.

I'm happy to dive a bit deeper and help with some of the coding for it.

@ronkeizer
Owner Author

Sure, happy to get this moving. Let me look at the code again in the next few days to see what approach I would take. We can then compare notes and decide what the best way forward would be, and who will take the lead.

@billdenney
Contributor

I started looking at this again today after a long hiatus. To me, the simplest way to implement auto-binning by stratum would be to:

  • first nest the data (dplyr::group_by_at() on the strata, then nest() on that),
  • then operate on each nested dataset as though it were not stratified (since the stratification is outside of that),
  • when operating on the nested dataset, assign the stratum to it as a numeric value (or alternatively as the upper and lower bound where the bin applies),
  • generate summary stats on the nested dataset strata, and finally
  • expand back to an un-nested dataset for plotting.
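
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses base R `split()` as a stand-in for the `group_by_at()` plus `nest()` step so that it runs without extra packages, and quantile-based edges are a placeholder for whatever `auto_bin()` actually does. The names `bin_one_stratum`, `bin_by_strata`, and the columns `strat`, `idv`, and `dv` are hypothetical and not taken from the package.

```r
# Hypothetical helper: bin one dataset as though it were not stratified.
# Quantile-based edges are a placeholder, not the package's auto_bin() logic.
bin_one_stratum <- function(d, n_bins = 3) {
  edges <- unique(quantile(d$idv,
                           probs = seq(0, 1, length.out = n_bins + 1),
                           names = FALSE))
  # Assign each row a numeric bin index plus the bounds where the bin applies.
  d$bin <- cut(d$idv, breaks = edges, include.lowest = TRUE, labels = FALSE)
  d$bin_lower <- edges[d$bin]
  d$bin_upper <- edges[d$bin + 1]
  d
}

# Hypothetical driver: split by stratum, bin each piece, then recombine.
bin_by_strata <- function(data, strat_col = "strat", n_bins = 3) {
  pieces <- split(data, data[[strat_col]])  # stands in for group_by + nest
  binned <- lapply(pieces, bin_one_stratum, n_bins = n_bins)
  out <- do.call(rbind, binned)             # back to one flat data.frame
  rownames(out) <- NULL
  out
}

# Example data: two strata with very different independent-variable ranges.
set.seed(1)
obs <- data.frame(
  strat = rep(c("A", "B"), each = 30),
  idv   = c(runif(30, 0, 10), runif(30, 0, 100)),
  dv    = rnorm(60)
)
binned <- bin_by_strata(obs)

# Summary statistics per stratum and bin, ready for plotting.
stats <- aggregate(dv ~ strat + bin + bin_lower + bin_upper,
                   data = binned, FUN = median)
```

Because each stratum gets its own edges, stratum A ends up with bins inside 0 to 10 while stratum B spans a much wider range, which is the point of binning per stratum rather than globally.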

The first step for that, to me, is to make auto_bin() specific to the underlying data class that is being stratified rather than having it operate on a data.frame. I did that in #54.
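
As a rough illustration of what dispatching on the underlying data class could look like (this is not the code in #54), here is a small S3 sketch. The generic `bin_edges`, its arguments, and the `idv` column are hypothetical, and the quantile-based edges are again only a placeholder for the real binning logic.

```r
# Hypothetical generic; not the package's auto_bin() signature.
bin_edges <- function(x, ...) UseMethod("bin_edges")

# Numeric vectors: placeholder quantile-based edges.
bin_edges.numeric <- function(x, n_bins = 3, ...) {
  unique(quantile(x,
                  probs = seq(0, 1, length.out = n_bins + 1),
                  names = FALSE, na.rm = TRUE))
}

# Data frames: pull out the column and dispatch on its class.
bin_edges.data.frame <- function(x, column = "idv", ...) {
  bin_edges(x[[column]], ...)
}

edges_vec <- bin_edges(c(1, 2, 3, 4, 5, 6, 7), n_bins = 2)
edges_df  <- bin_edges(data.frame(idv = c(1, 2, 3, 4, 5, 6, 7)), n_bins = 2)
```

With the binning defined on the vector itself, the same function can be applied to each nested dataset without it needing to know anything about the stratification.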

@ronkeizer
Owner Author

Thanks Bill. I'll have a look in the next few days; I did receive a few more requests for this feature :)


2 participants