ckMeans return format is inconvenient #273

stevage · 2017-12-19T04:52:13Z

I've used the ckMeans function a couple of times now, and just wanted to report that the output format is pretty inconvenient for my (and, I'm guessing, others') general use case.

What I want:

1. Pass in a set of values (e.g. ckmeans([-1, 2, -1, 2, 4, 5, 6, -1, 2, -1], 3)).
2. Get back an array of breaks, one fewer than the number of groups requested (e.g. [0, 3]).
3. Colour-code any individual value by comparing it to each of the breaks in turn.

ckMeans() instead returns an array of the groups themselves ([[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]]), which requires a bit of fiddly manipulation to get into a more useful format. I guess the current format would be useful when your visualisation method is "draw all the values in the first group, then the next group, etc.", maybe?
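
The conversion described above can be done in a few lines. This is a minimal sketch, assuming the groups arrive sorted in ascending order (as in the example) and placing each break at the midpoint between neighbouring groups; that is only one choice, since any value in the gap separates them. `innerBreaks` and `classify` are hypothetical helpers, not library functions.

```javascript
// Hypothetical helper: derive inner breaks from ckmeans-style groups.
// Assumes the groups are in ascending order and each group is sorted.
function innerBreaks(groups) {
  const breaks = [];
  for (let i = 1; i < groups.length; i++) {
    const prevMax = groups[i - 1][groups[i - 1].length - 1];
    const nextMin = groups[i][0];
    // Midpoint of the gap between neighbouring groups (an assumption).
    breaks.push((prevMax + nextMin) / 2);
  }
  return breaks;
}

// Hypothetical helper: the group index is the number of breaks at or below the value.
function classify(value, breaks) {
  let group = 0;
  for (const b of breaks) {
    if (value >= b) group++;
  }
  return group;
}

// Groups as returned for ckmeans([-1, 2, -1, 2, 4, 5, 6, -1, 2, -1], 3)
const groups = [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]];
const breaks = innerBreaks(groups);
console.log(breaks); // [ 0.5, 3 ]
console.log([-1, 2, 5].map((v) => classify(v, breaks))); // [ 0, 1, 2 ]
```

Because the breaks fall in the gaps rather than on data values, new values that were not in the original input can be classified too.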

Another, possibly more convenient, output format would be an object mapping each value to its group:

{
  "-1": 0,
  "2": 1,
  "4": 2,
  "5": 2,
  "6": 2
}
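
A value-to-group lookup like this can be built in one pass over the groups. This sketch assumes the same array-of-groups input as in the example; `groupIndexByValue` is a hypothetical name. It returns a `Map` rather than a plain object so the keys stay numbers (plain object keys are always strings in JavaScript).

```javascript
// Hypothetical helper: map each distinct value to the index of its group.
// Input: ckmeans-style array of groups, e.g. [[-1, -1], [2], [4, 5, 6]].
function groupIndexByValue(groups) {
  const lookup = new Map();
  groups.forEach((group, index) => {
    for (const value of group) {
      // Repeated values are set again; the last write wins.
      lookup.set(value, index);
    }
  });
  return lookup;
}

const groups = [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]];
const lookup = groupIndexByValue(groups);
console.log(lookup.get(-1), lookup.get(2), lookup.get(5)); // 0 1 2
```

One limitation of this format: the lookup only answers for values that were in the input, so unlike a list of breaks it cannot classify new values.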

andrewharvey · 2018-06-01T22:40:17Z

Came here to say the same thing, actually. I'd prefer the same output format as equalIntervalBreaks (https://simplestatistics.org/docs/#equalintervalbreaks): an array of break points.

tmcw · 2018-06-01T23:32:35Z

Happy to consider an alternative format as a separate function, like ckmeansBreaks. It sounds like something like output.map((values, i) => values[i ? values.length - 1 : 0]) is all that's necessary to change the existing format to what's desired?
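
For comparison with the equalIntervalBreaks format (the minimum first, then one upper edge per class, so one more value than there are classes), here is a minimal sketch. It assumes the groups are sorted ascending and uses each group's maximum as its upper edge; `ckmeansBreaksSketch` is a hypothetical name, not an existing simple-statistics function. Note that the one-liner quoted above keeps the first value of the first group and the last value of each later group, so it yields one value per group; this sketch also keeps the first group's maximum.

```javascript
// Hypothetical helper: equalIntervalBreaks-style breaks from ckmeans-style groups.
// Result: [min of group 0, max of group 0, max of group 1, ..., max of last group],
// i.e. groups.length + 1 values.
function ckmeansBreaksSketch(groups) {
  const upperBounds = groups.map((group) => group[group.length - 1]);
  return [groups[0][0], ...upperBounds];
}

const groups = [[1, 2, 3], [11, 12, 13]];
console.log(ckmeansBreaksSketch(groups)); // [ 1, 3, 13 ]
```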

stevage · 2018-06-07T03:22:26Z

Since raising this, I've realised that the ideal format depends on the exact use case. Take the example of dividing [1, 2, 3, 11, 12, 13] into two bins:

- inner breakpoints: [7]
- inner and outer breakpoints: [1, 7, 13]
- bins with tightly bounded min and max: [[1, 3], [11, 13]]
- contiguous bins: [[1, 7], [7, 13]] (where min <= val < max, with the last bin also including its max)
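
All four formats can be derived from the groups ckmeans returns for this example, taken here to be [[1, 2, 3], [11, 12, 13]]. This is a minimal sketch, assuming sorted groups and midpoint breaks (which reproduces the 7 above); `binFormats` is a hypothetical name.

```javascript
// Hypothetical helper: derive the four candidate output formats from
// ckmeans-style groups (groups in ascending order, each group sorted).
function binFormats(groups) {
  const mins = groups.map((g) => g[0]);
  const maxes = groups.map((g) => g[g.length - 1]);

  // Inner breakpoints: midpoint of the gap between neighbouring groups.
  const inner = [];
  for (let i = 1; i < groups.length; i++) {
    inner.push((maxes[i - 1] + mins[i]) / 2);
  }

  // Inner and outer breakpoints: add the overall min and max.
  const innerAndOuter = [mins[0], ...inner, maxes[maxes.length - 1]];

  // Tightly bounded bins: each group's own [min, max].
  const tight = groups.map((_, i) => [mins[i], maxes[i]]);

  // Contiguous bins: consecutive pairs of the inner and outer breakpoints.
  const contiguous = [];
  for (let i = 1; i < innerAndOuter.length; i++) {
    contiguous.push([innerAndOuter[i - 1], innerAndOuter[i]]);
  }

  return { inner, innerAndOuter, tight, contiguous };
}

const formats = binFormats([[1, 2, 3], [11, 12, 13]]);
console.log(formats.inner); // [ 7 ]
console.log(formats.innerAndOuter); // [ 1, 7, 13 ]
console.log(formats.tight); // [ [ 1, 3 ], [ 11, 13 ] ]
console.log(formats.contiguous); // [ [ 1, 7 ], [ 7, 13 ] ]
```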

For instance, choropleths and their legends are probably easiest to generate from the contiguous bin format, since each bin corresponds to exactly one colour.
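
As an illustration of the choropleth case, the contiguous bins can drive both the colour lookup and the legend. This sketch assumes the half-open min <= val < max rule, with the last bin also including its max so the largest value is not dropped; the palette and the helper names `colourFor` and `legend` are hypothetical.

```javascript
// Hypothetical palette: one colour per bin.
const palette = ["#fee8c8", "#e34a33"];

// Contiguous bins for [1, 2, 3, 11, 12, 13] split into two groups.
const bins = [[1, 7], [7, 13]];

// Return the colour of the bin containing value, or null if it is outside all bins.
// Bins are half-open (min <= value < max) except the last, which also includes max.
function colourFor(value, bins, palette) {
  for (let i = 0; i < bins.length; i++) {
    const [min, max] = bins[i];
    const isLast = i === bins.length - 1;
    if (value >= min && (value < max || (isLast && value === max))) {
      return palette[i];
    }
  }
  return null;
}

// One legend entry per bin, since each bin maps to exactly one colour.
function legend(bins, palette) {
  return bins.map(([min, max], i) => ({ label: `${min} to ${max}`, colour: palette[i] }));
}

console.log(colourFor(3, bins, palette)); // #fee8c8
console.log(colourFor(13, bins, palette)); // #e34a33
console.log(legend(bins, palette));
```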

So maybe:

- ckmeansBins(): contiguous bins
- ckmeansBreaks(): inner and outer breaks, like equalIntervalBreaks. (Personally, I don't really see why you'd want the min and max in there, but it seems to be the convention.)
