Add an optional cohort block to science experiments #170

Open
wants to merge 1 commit into
base: main

brasic
Contributor

@brasic brasic commented Dec 23, 2021

(This is the first of several improvements to scientist based on extractions from the GitHub monolith)

This adds the concept of a "cohort" to an experiment result, to enable and encourage bucketed result publishing.

Many experiments operate on data with a very long tail, and the fat part of the distribution can completely wash out notable results in lower-frequency sub-groups. For example, experiment results derived from the data of very large customers often look quite different from the far more common results for small customers, yet the latter can be so numerous that the former become statistically invisible. Even percentile metrics can't overcome this effect, since the relevant percentiles are often very high (above the 99th percentile).
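To make the effect concrete, here is a small illustration with made-up numbers (not data from any real experiment): 9,950 observations from "small" customers and 50 from "large" ones. The `Observation` struct and both helper methods are hypothetical and exist only for this sketch.

```ruby
# Hypothetical observations; the counts and durations are invented for illustration.
Observation = Struct.new(:cohort, :duration_ms, :mismatched)

small = Array.new(9_950) { |i| Observation.new("small", 10, i < 10) }  # 10 mismatches
large = Array.new(50)    { |i| Observation.new("large", 500, i < 25) } # 25 mismatches
all = small + large

# Nearest-rank percentile over a list of numbers.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Fraction of observations whose candidate disagreed with the control.
def mismatch_rate(observations)
  observations.count(&:mismatched).fdiv(observations.size)
end

overall_p99  = percentile(all.map(&:duration_ms), 99)
overall_rate = mismatch_rate(all)
by_cohort    = all.group_by(&:cohort).transform_values { |obs| mismatch_rate(obs) }
```

With these numbers the large cohort makes up only 0.5% of the observations, so the overall p99 duration is still the small-customer value and the overall mismatch rate stays well under 1%, even though half of the large-customer runs mismatch. Grouping by cohort is what surfaces that.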

To address this issue, this PR adds an optional block to Scientist::Experiment which should return a "cohort" when called. The block is passed the result of the experiment so it can determine the cohort from the context data, whether the result is a mismatch, or any of the observation data.

The determined cohort value is available as Scientist::Result#cohort and is intended to be used by the user-defined publication mechanism.

Here's an example of how it might be used to segment the results of an experiment between "large" and "small" users:

science "widget-count" do |experiment|
  experiment.use { user.count_widgets }
  experiment.try { user.fast_count_widgets }
  experiment.cohort { |res| res.control.value > 100 ? "large" : "small" }
end
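On the publishing side, a cohort-aware `publish` might look roughly like the sketch below. It is only an illustration: `CohortResult` is a stand-in struct rather than the real `Scientist::Result`, and `CounterSink` and the metric key format are hypothetical. In a real experiment class, `publish` would be an instance method that receives the result from scientist.

```ruby
# Stand-in for Scientist::Result; only the fields this sketch needs are modelled.
CohortResult = Struct.new(:experiment_name, :cohort, :mismatched) do
  def mismatched?
    mismatched
  end
end

# Hypothetical in-memory metrics sink that counts increments per key.
class CounterSink
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def increment(key)
    @counts[key] += 1
  end
end

# Bucket the published counter by cohort, falling back to "none" when
# the experiment did not define a cohort block.
def publish(result, sink)
  cohort  = result.cohort || "none"
  outcome = result.mismatched? ? "mismatched" : "matched"
  sink.increment("science.#{result.experiment_name}.#{cohort}.#{outcome}")
end

sink = CounterSink.new
publish(CohortResult.new("widget-count", "large", true), sink)
publish(CohortResult.new("widget-count", "small", false), sink)
publish(CohortResult.new("widget-count", nil, false), sink)
```

Keeping the cohort in the metric key (or in a tag, if the metrics backend supports tags) lets dashboards show mismatch rates for each bucket separately.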

Many experiments operate on data with a very long tail, and the most
frequent part of the distribution can wash out notable results in
sub-groups.  For example, experiment results derived from the data of
very large customers often look quite different from the much more
common results for small customers.  Even the use of percentile metrics
can't overcome these effects since often the relevant percentiles are
very high (above the 99th percentile).

This adds an optional block to Scientist::Experiment which should return
a "cohort" when called.  The block is passed the result of the
experiment so it can determine the cohort from the context data, whether
the result is a mismatch, or any of the observation data.

The determined cohort value is available as `Scientist::Result#cohort`
and is intended to be used by the user-defined publication mechanism.
@zerowidth
Member

`cohort` might be too specific. Since it's adding metadata to an observation, I wonder if `metadata` with a block that returns a value might allow for more flexible and generic use, e.g.:

science "widget-count" do |experiment|
  experiment.use { user.count_widgets }
  experiment.try { user.fast_count_widgets }
  experiment.metadata { |res| { cohort: res.control.value > 100 ? "large" : "small" } }
end

I'm curious what other improvements you had in mind.
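For comparison, a publisher consuming the `metadata` variant suggested above could treat every key in the returned hash as a tag. This is a sketch under assumptions: neither `metadata` nor `MetadataResult` exists in scientist, and `metric_tags` is a hypothetical helper.

```ruby
# Stand-in for a result that carries a metadata hash (hypothetical API).
MetadataResult = Struct.new(:experiment_name, :metadata)

# Turn the experiment name plus any metadata entries into a flat tag hash.
# A missing metadata block is treated as an empty hash.
def metric_tags(result)
  tags = { experiment: result.experiment_name }
  (result.metadata || {}).each { |key, value| tags[key.to_sym] = value.to_s }
  tags
end

tagged   = metric_tags(MetadataResult.new("widget-count", { cohort: "large" }))
untagged = metric_tags(MetadataResult.new("widget-count", nil))
```

The trade-off is that a hash admits arbitrary keys, so the publisher has to cope with whatever shape each experiment returns, whereas a single `cohort` value has one obvious use.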

@Watemlifts

Codes changed without conflict
