ckMeans return format is inconvenient #273

Open
stevage opened this issue Dec 19, 2017 · 3 comments

@stevage

stevage commented Dec 19, 2017

I've used the ckMeans function a couple of times now, and just wanted to report that the output format is pretty inconvenient for my (and, I'm guessing, others') general use case.

What I want:

  1. pass a bunch of values (eg ckmeans([-1, 2, -1, 2, 4, 5, 6, -1, 2, -1], 3))
  2. get an array of breaks, one fewer than groups requested (eg [0, 3])
  3. hence, color code any individual value by comparing it to each of the breaks in turn

ckMeans() instead returns an array of the groups themselves ([[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]]) which requires a bit of fiddly manipulation to get into a more useful format. I guess the current format would be useful when your visualisation method is "draw all the values in the first group, then the next group, etc", maybe?
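For anyone landing here, the "fiddly manipulation" can be sketched roughly as below. This is only one possible approach, not part of simple-statistics: `innerBreaks` and `groupIndex` are hypothetical helper names, the break is placed at the midpoint between adjacent clusters (an assumption; other placements are equally valid), and the clusters are assumed to be sorted ascending as in the example output above.

```javascript
// Clusters as quoted above for ckmeans([-1, 2, -1, 2, 4, 5, 6, -1, 2, -1], 3).
const clusters = [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]];

// Hypothetical helper: one break between each pair of adjacent clusters,
// placed halfway between the max of one cluster and the min of the next.
// Assumes clusters (and the values inside them) are sorted ascending.
function innerBreaks(clusters) {
  const breaks = [];
  for (let i = 1; i < clusters.length; i++) {
    const prev = clusters[i - 1];
    const upper = prev[prev.length - 1]; // max of previous cluster
    const lower = clusters[i][0];        // min of current cluster
    breaks.push((upper + lower) / 2);
  }
  return breaks;
}

// Hypothetical helper: compare a value to each break in turn and
// return the index of the group it falls into.
function groupIndex(value, breaks) {
  let i = 0;
  while (i < breaks.length && value >= breaks[i]) i++;
  return i;
}

console.log(innerBreaks(clusters)); // [0.5, 3]
```

With midpoints the breaks come out as `[0.5, 3]` rather than the rounded `[0, 3]` in the list above; either set separates the three groups.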

Another possible more convenient output format would be an object, mapping value to group:

{
  "-1": 0,
  2: 1,
  4: 2,
  5: 2,
  6: 2
}
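A rough sketch of how such a lookup could be built from the current output follows. `valueToGroup` is a hypothetical name, not an existing simple-statistics function, and it uses a `Map` instead of a plain object because object keys are coerced to strings.

```javascript
// Clusters as quoted above for the example input.
const clusters = [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]];

// Hypothetical helper: map every distinct value to the index of its cluster.
function valueToGroup(clusters) {
  const lookup = new Map();
  clusters.forEach((values, groupIndex) => {
    for (const v of values) lookup.set(v, groupIndex);
  });
  return lookup;
}

const lookup = valueToGroup(clusters);
console.log(lookup.get(-1)); // 0
console.log(lookup.get(5));  // 2
```

Note this only works for values that were in the original input; a value that was never clustered has no entry, which is where a breaks-based format is more flexible.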
@andrewharvey

Came here to say the same thing actually. I'd prefer the same output format as https://simplestatistics.org/docs/#equalintervalbreaks, an array of break points.

@tmcw
Member

tmcw commented Jun 1, 2018

Happy to consider an alternative format as a separate function, like ckmeansBreaks. Sounds like something like output.map((values, i) => values[i ? values.length - 1 : 0]) is all that's necessary to change the existing format to what's desired?
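For reference, here is that expression applied to the example clusters from the original report. The `output` array is hard-coded from the issue text rather than produced by calling the library.

```javascript
// Cluster output quoted in the original report.
const output = [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]];

// The expression from the comment above: the first cluster contributes
// its first (minimum) value, every later cluster its last (maximum) value.
const result = output.map((values, i) => values[i ? values.length - 1 : 0]);

console.log(result); // [-1, 2, 6]
```

So for three clusters this yields three values: the overall minimum, then the upper bound of each subsequent cluster.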

@stevage
Author

stevage commented Jun 7, 2018

Since raising this originally I now realise that there are several possible ideal formats depending on the exact use case. Let's take the case of dividing [1, 2, 3, 11, 12, 13] into two bins.

  • inner breakpoints: [7]
  • inner and outer breakpoints: [1, 7, 13]
  • bins with tightly bounded min and max: [[1, 3], [11, 13]]
  • contiguous bins: [[1, 7], [7, 13]] (where values are min <= val < max)

For instance, choropleths and their legends are probably easiest to generate from the contiguous bin format, since each bin corresponds to exactly one colour.

So maybe:

  • ckmeansBins(): contiguous bins
  • ckmeansBreaks(): internal and external breaks, like equalIntervalBreaks. (Personally, I don't really get why you want the min and max there but it seems to be the convention.)
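A minimal sketch of what these two could look like, working from the existing cluster output. Neither function exists in simple-statistics; the names `breaksFromClusters` and `binsFromClusters` are hypothetical, and placing each inner break at the midpoint between adjacent clusters is an assumption chosen to match the `[1, 7, 13]` example above.

```javascript
// Clusters for [1, 2, 3, 11, 12, 13] split into two groups.
const clusters = [[1, 2, 3], [11, 12, 13]];

// Hypothetical: outer and inner breaks, e.g. [min, ...midpoints, max].
// Assumes clusters and their values are sorted ascending.
function breaksFromClusters(clusters) {
  const first = clusters[0];
  const last = clusters[clusters.length - 1];
  const breaks = [first[0]];
  for (let i = 1; i < clusters.length; i++) {
    const prev = clusters[i - 1];
    breaks.push((prev[prev.length - 1] + clusters[i][0]) / 2);
  }
  breaks.push(last[last.length - 1]);
  return breaks;
}

// Hypothetical: contiguous [min, max] pairs built from consecutive breaks.
function binsFromClusters(clusters) {
  const breaks = breaksFromClusters(clusters);
  return breaks.slice(0, -1).map((min, i) => [min, breaks[i + 1]]);
}

console.log(breaksFromClusters(clusters)); // [1, 7, 13]
console.log(binsFromClusters(clusters));   // [[1, 7], [7, 13]]
```

One thing to watch with the `min <= val < max` convention: the overall maximum (13 here) would fall outside the last bin unless that bin is treated as closed at its upper end.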
