Add an optional `cohort` block to science experiments #170

brasic · 2021-12-23T18:23:41Z

(This is the first of several improvements to scientist based on code extracted from the GitHub monolith.)

This adds the concept of a "cohort" to an experiment result, to enable and encourage bucketed result publishing.

Many experiments operate on data with a very long tail, and the high-frequency bulk of the distribution can completely wash out notable results in lower-frequency sub-groups. For example, experiment results derived from the data of very large customers often look quite different from the results for small customers, yet the small-customer results can be so much more common that they make the large-customer results statistically invisible. Even percentile metrics can't overcome this, because the relevant percentiles are often very high (above the 99th percentile).

To address this, this PR adds an optional block to `Scientist::Experiment` that should return a "cohort" when called. The block is passed the experiment's result, so it can determine the cohort from the context data, from whether the result is a mismatch, or from any of the observation data.

The determined cohort value is available as `Scientist::Result#cohort` and is intended to be used by the user-defined publication mechanism.
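To make the timing concrete, here is a toy model of the hook described above: the cohort block runs once the observations exist, receives the finished result, and its return value is stored on the result. `ToyExperiment`, `ToyResult`, and `Observation` are hypothetical stand-ins written for illustration; this is a sketch under those assumptions, not the scientist gem's code.

```ruby
# Hypothetical, simplified model of the proposed `cohort` hook. These classes
# are illustrative only; they are NOT the scientist gem's implementation.

Observation = Struct.new(:name, :value)

class ToyResult
  attr_reader :control, :candidates, :context, :cohort

  def initialize(control, candidates, context, cohort_block)
    @control = control
    @candidates = candidates
    @context = context
    # The optional block receives the finished result, so it can inspect
    # context data, mismatch status, or any observation's value.
    @cohort = cohort_block ? cohort_block.call(self) : nil
  end

  def mismatched?
    candidates.any? { |obs| obs.value != control.value }
  end
end

class ToyExperiment
  def initialize(context = {})
    @context = context
    @cohort_block = nil
  end

  def use(&block)
    @control_block = block
  end

  def try(&block)
    @candidate_block = block
  end

  # Optional: without it, ToyResult#cohort is nil.
  def cohort(&block)
    @cohort_block = block
  end

  # Unlike scientist's Experiment#run (which returns the control value),
  # this toy returns the result so the cohort can be inspected directly.
  def run
    control = Observation.new("control", @control_block.call)
    candidate = Observation.new("candidate", @candidate_block.call)
    ToyResult.new(control, [candidate], @context, @cohort_block)
  end
end

experiment = ToyExperiment.new({ user_id: 42 })
experiment.use { 250 }
experiment.try { 249 }
experiment.cohort { |res| res.control.value > 100 ? "large" : "small" }
result = experiment.run
puts result.cohort      # prints large
puts result.mismatched? # prints true
```

Because the block sees the whole result, the same hook could bucket on mismatch status or on context keys instead of the control value.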

Here's an example of how it might be used to segment the results of an experiment into "large" and "small" users:

```ruby
science "widget-count" do |experiment|
  experiment.use { user.count_widgets }
  experiment.try { user.fast_count_widgets }
  # Bucket each result by the control (existing) code path's widget count.
  experiment.cohort { |res| res.control.value > 100 ? "large" : "small" }
end
```
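On the publishing side, a custom experiment's `publish(result)` could then keep statistics per cohort instead of one global figure. The sketch below assumes a hypothetical `CohortTally` publisher and uses a plain `FakeResult` struct in place of `Scientist::Result`; the counts are made up purely to show how a small cohort's mismatch rate disappears into the overall rate.

```ruby
# Hypothetical publisher that buckets results by cohort. FakeResult stands in
# for Scientist::Result and exposes only what this sketch needs.

FakeResult = Struct.new(:cohort, :mismatched) do
  def mismatched?
    mismatched
  end
end

class CohortTally
  def initialize
    # cohort => [total results, mismatched results]
    @counts = Hash.new { |hash, key| hash[key] = [0, 0] }
  end

  def publish(result)
    bucket = @counts[result.cohort]
    bucket[0] += 1
    bucket[1] += 1 if result.mismatched?
  end

  # Mismatch rate within a single cohort (0.0 if it was never seen).
  def mismatch_rate(cohort)
    total, mismatches = @counts.fetch(cohort, [0, 0])
    rate(total, mismatches)
  end

  # Mismatch rate across every published result, ignoring cohorts.
  def overall_mismatch_rate
    total = @counts.values.sum { |t, _m| t }
    mismatches = @counts.values.sum { |_t, m| m }
    rate(total, mismatches)
  end

  private

  def rate(total, mismatches)
    total.zero? ? 0.0 : mismatches.fdiv(total)
  end
end

# Illustrative numbers only: many small users that all match, a few large
# users where half the results mismatch.
tally = CohortTally.new
990.times { tally.publish(FakeResult.new("small", false)) }
5.times { tally.publish(FakeResult.new("large", false)) }
5.times { tally.publish(FakeResult.new("large", true)) }

puts tally.overall_mismatch_rate  # prints 0.005
puts tally.mismatch_rate("large") # prints 0.5
```

With these numbers the overall rate is 0.5%, while the "large" cohort mismatches half the time, which is the kind of signal the cohort hook is meant to keep visible.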


zerowidth · 2022-01-04T22:43:18Z

`cohort` might be too specific. Since it's adding metadata to an observation, I wonder if `metadata` and a block that returns a value might allow for more flexible and generic use, e.g.:

```ruby
science "widget-count" do |experiment|
  experiment.use { user.count_widgets }
  experiment.try { user.fast_count_widgets }
  experiment.metadata { |res| { cohort: res.control.value > 100 ? "large" : "small" } }
end
```

I'm curious what other improvements you had in mind.
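A rough sketch of how the more generic `metadata` suggestion could behave: the block returns a Hash, and a cohort becomes just one key among others. `MetaResult` and `MetaObservation` are hypothetical names for illustration; neither the `metadata` hook nor these classes exist in the scientist gem as far as this thread shows.

```ruby
# Hypothetical sketch of the generic `metadata` idea. Not part of the
# scientist gem; MetaObservation and MetaResult are illustrative stand-ins.

MetaObservation = Struct.new(:name, :value)

class MetaResult
  attr_reader :control, :candidates, :metadata

  def initialize(control, candidates, metadata_block = nil)
    @control = control
    @candidates = candidates
    # Call the block with the finished result; default to an empty Hash.
    @metadata = metadata_block ? metadata_block.call(self) : {}
  end

  def mismatched?
    candidates.any? { |obs| obs.value != control.value }
  end
end

metadata_block = lambda do |res|
  {
    cohort: res.control.value > 100 ? "large" : "small",
    mismatched: res.mismatched?
  }
end

result = MetaResult.new(
  MetaObservation.new("control", 250),
  [MetaObservation.new("candidate", 250)],
  metadata_block
)
puts result.metadata[:cohort] # prints large
```

A Hash leaves room for new keys without further API changes, at the cost of a dedicated, documented `cohort` accessor that publishers can rely on.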

Watemlifts · 2022-01-19T01:20:02Z

Code changed without conflicts.

brasic force-pushed the cohorts branch from 07b9b6d to 1174575 on January 3, 2022 21:52
