Proposal: Remove the concept of "reduction_features" #4488

Open
jackgerrits opened this issue Feb 6, 2023 · 4 comments

@jackgerrits
Member

jackgerrits commented Feb 6, 2023

Reduction features were originally added (#2282) as a means to split out the contents of a label that are also required for prediction.

I think reduction_features as they are today do not meet this goal. They have also started to be used for other purposes, because their general nature makes them attractive as a generic channel.

I propose we remove the concept of reduction_features. Below I suggest a short-term and a long-term solution for both the current usages of reduction_features and the general requirement of separating label state from predict-only state.

Because the fields in reduction_features have no explicit association with a problem type, you can only know implicitly which fields must be used to retrieve problem-relevant information. There is no explicit way to determine what is relevant, as there is for labels with VW::label_type_t. Since fields have been added that have no associated label type, adding this association retroactively is tricky.

It also makes user code that interacts with input examples more confusing, because there is both a label and potentially a reduction_features to edit. reduction_features are not well exposed in bindings despite being critical for describing some existing inputs.

Analysis of current usage

There are 6 fields in the reduction_features as of VW 9.7.

ccb_reduction_features _ccb_reduction_features;
continuous_actions::reduction_features _contact_reduction_features;
simple_label_reduction_features _simple_label_reduction_features;
cb_explore_adf::greedy::reduction_features _epsilon_reduction_features;
large_action_space::las_reduction_features _large_action_space_reduction_features;
cb_graph_feedback::reduction_features _cb_graph_feedback_reduction_features;

ccb_reduction_features

This exists in reduction_features, but using it was never implemented. Seeing it there can be confusing, because the label is what is actually still used.

continuous_actions::reduction_features

The chosen action and the entire PDF can be passed to CA (continuous actions), where they are used when predicting. I think this information is unused in learn.

simple_label_reduction_features

This type holds the base and initial values, which are used for prediction. They were removed from the label type so that nothing is duplicated.

cb_explore_adf::greedy::reduction_features

This contains the epsilon value for this example. It was added to support the epsilon decay reduction. The value is not related to the example; reduction_features are simply being used as a channel to communicate between reductions.

large_action_space::las_reduction_features

This contains several things:

  • Generated interactions
  • A reference to the shared example
  • SquareCB gamma

These are used as a channel to communicate between reductions. Generated interactions are calculated near the base of the stack, and because of the work LAS does, it needs to know what the interactions are for the given input.

The shared example is used to remove the shared features from an action.

The SquareCB gamma is used to generate the prediction.

cb_graph_feedback::reduction_features

This contains the graph edges for a new enhancement to exploration. This is passed in by the user as input. I am unsure if it is predict-only or used in learning too.

Proposal

In all of these situations, communication between reductions relies on tight coupling of producers and consumers.

There are two kinds of use present: (1) problem input information used during prediction, and (2) inter-reduction communication.

For kind 1, as a stopgap measure we can go back to putting this information in the label structure for the given label type.

For kind 2, one approach is to create generic publish/subscribe interfaces for this kind of information. This decouples the reductions that use the information from the content that is published. It only really works for global information, though. I don't have a great suggestion for the shared_example used by large action spaces; potentially LAS should sit above shared_feature_merge if this info is necessary. We would need to clearly define the concurrency requirements of the pub/sub architecture, but allowing changes only on learn seems reasonable and allows safe concurrency without locking.
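To make the pub/sub idea concrete, here is a minimal sketch for a single global value such as epsilon. Everything in it (topic, epsilon_producer, greedy_consumer) is a hypothetical name made up for illustration and none of it exists in VW. It assumes single-threaded use and that producers publish only from learn.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: none of these names exist in VW.
// A typed topic holds one global value (for example the current epsilon).
template <typename T>
class topic
{
public:
  using callback = std::function<void(const T&)>;

  explicit topic(T initial) : _value(std::move(initial)) {}

  // Consumers register interest once, at setup time.
  void subscribe(callback cb) { _subscribers.push_back(std::move(cb)); }

  // By convention producers call this only from learn(), so predict()
  // never sees the value change underneath it.
  void publish(T new_value)
  {
    _value = std::move(new_value);
    for (const auto& cb : _subscribers) { cb(_value); }
  }

  const T& latest() const { return _value; }

private:
  T _value;
  std::vector<callback> _subscribers;
};

// Stand-in for a producer such as the epsilon decay reduction.
struct epsilon_producer
{
  topic<float>& epsilon_topic;

  // Halving is an arbitrary placeholder for a real decay schedule.
  void learn() { epsilon_topic.publish(epsilon_topic.latest() * 0.5f); }
};

// Stand-in for a consumer such as greedy exploration.
// The callback captures `this`, so the consumer must not be copied or moved
// after construction and must outlive the topic's last publish.
struct greedy_consumer
{
  float epsilon = 0.f;

  explicit greedy_consumer(topic<float>& t) : epsilon(t.latest())
  {
    t.subscribe([this](const float& e) { epsilon = e; });
  }
};
```

The producer and consumer only share the topic type, not each other's headers, which is the decoupling described above. This does nothing for per-example data such as the shared example.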

Splitting of label state and predict-only state

For kind 1, a longer-term solution is to allow the Example and MultiEx types to be expanded into problem-specific types that a reduction can specify. This is a requirement we keep running up against. It would allow things like shared, slot, etc. to be expressed as first-class concepts in the "input type". One drawback is that parsers would need to produce richer "input types", whereas they have managed to stay quite generic up until now.

In this world, predict would only need the "input type", and learn would require the label in addition.

For example, a contextual bandit input type might look something along the lines of:

struct CBExample
{
    std::optional<FeatureGroups> shared_features;
    std::vector<FeatureGroups> actions;
    bool have_shared_features_been_merged_into_actions;
};
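
A rough sketch of how predict and learn could consume such an input type, with the label kept separate. FeatureGroups is aliased to a plain vector only so the sketch compiles, and CBLabel and toy_cb_learner are hypothetical names that do not exist in VW. The update rule is deliberately trivial; the point is the two signatures.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in so the sketch compiles; not a VW type.
using FeatureGroups = std::vector<float>;

struct CBExample
{
  std::optional<FeatureGroups> shared_features;
  std::vector<FeatureGroups> actions;
  bool have_shared_features_been_merged_into_actions = false;
};

// Label state lives apart from the input and is needed only by learn.
struct CBLabel
{
  std::size_t chosen_action;
  float cost;
  float probability;
};

struct toy_cb_learner
{
  std::vector<float> cost_estimate;  // one estimate per action index

  // predict sees only the input type and returns the cheapest action index.
  std::size_t predict(const CBExample& input) const
  {
    std::size_t best = 0;
    for (std::size_t a = 1; a < input.actions.size(); ++a)
    {
      if (estimate(a) < estimate(best)) { best = a; }
    }
    return best;
  }

  // learn needs the same input plus the label.
  void learn(const CBExample& input, const CBLabel& label)
  {
    if (cost_estimate.size() < input.actions.size()) { cost_estimate.resize(input.actions.size(), 0.f); }
    if (label.chosen_action >= cost_estimate.size()) { return; }  // ignore malformed labels
    // Inverse propensity weighted cost, overwriting the previous estimate.
    cost_estimate[label.chosen_action] = label.cost / label.probability;
  }

private:
  // Actions that have never been learned from are treated as zero cost.
  float estimate(std::size_t a) const { return a < cost_estimate.size() ? cost_estimate[a] : 0.f; }
};
```

With this shape a caller cannot accidentally hand label contents to predict, because the signature has no place for them.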
@peterychang
Collaborator

Reduction features were always a bit of a hack to get around the few bits of information that didn't abide by the "reductions are wholly self-contained" idea.

We should probably either go in fully on that concept and prohibit reductions from passing reduction-specific information between each other, or give up on the idea of self-contained reductions and formalize a side-channel. We could always split reductions into 2 classes (self-contained vs intertwined), but that just seems icky.

@jackgerrits
Member Author

jackgerrits commented Feb 6, 2023

The side channel could be achieved by being stricter about what can be passed and formalizing the allowed things as supported "services" (via Workspace). That really is just glorified global state, but with more accountability for who uses it and an interface that allows runtime checks.
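
As a rough illustration of what such "services" could look like, here is a sketch of a registry keyed by type. service_registry and squarecb_gamma_service are hypothetical names; VW::workspace has no such member today. The runtime checks are that a service can be provided only once and that requiring a missing service fails loudly.

```cpp
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

// Hypothetical sketch; VW::workspace has no such member today.
class service_registry
{
public:
  // A reduction provides a service once, during setup.
  template <typename Service, typename... Args>
  Service& provide(Args&&... args)
  {
    const std::type_index key(typeid(Service));
    if (_services.count(key) != 0) { throw std::logic_error("service already provided"); }
    auto instance = std::make_shared<Service>(std::forward<Args>(args)...);
    _services.emplace(key, instance);
    return *instance;
  }

  // Consumers ask by type and get read-only access.
  template <typename Service>
  const Service& require() const
  {
    const auto it = _services.find(std::type_index(typeid(Service)));
    if (it == _services.end()) { throw std::runtime_error("required service was not provided"); }
    // Safe because the entry stored under this key was created as Service.
    return *std::static_pointer_cast<const Service>(it->second);
  }

private:
  std::unordered_map<std::type_index, std::shared_ptr<void>> _services;
};

// Example payload: the gamma value that SquareCB would expose.
struct squarecb_gamma_service
{
  float gamma = 1.f;
};
```

Only the provider gets a mutable reference back from provide, so it is clear who is accountable for each piece of state.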

@peterychang
Collaborator

Maybe a sort of "broadcast" pattern. Reductions could subscribe to specific reductions or pieces of data, and those are the only ones they would have access to. The owner of the data would be the only one that could modify it.
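
One way the owner-only-writes rule could be enforced by the type system, as a sketch: the owner holds a non-copyable broadcaster and hands out read-only subscription views. Both class names are hypothetical and not part of VW, and the sketch assumes no concurrent writes while readers are active.

```cpp
#include <memory>
#include <utility>

// Hypothetical sketch; these names are not part of VW.
// A read-only view: subscribers can observe the value but not change it.
template <typename T>
class subscription
{
public:
  explicit subscription(std::shared_ptr<const T> data) : _data(std::move(data)) {}
  const T& get() const { return *_data; }

private:
  std::shared_ptr<const T> _data;
};

// Held by the owning reduction only. It cannot be copied, so write access
// cannot leak to other reductions by accident.
template <typename T>
class broadcaster
{
public:
  explicit broadcaster(T initial) : _data(std::make_shared<T>(std::move(initial))) {}
  broadcaster(const broadcaster&) = delete;
  broadcaster& operator=(const broadcaster&) = delete;

  // Only the owner can modify the value.
  void set(T value) { *_data = std::move(value); }

  // Anyone the owner chooses can be given a view of the same storage.
  subscription<T> subscribe() const { return subscription<T>(_data); }

private:
  std::shared_ptr<T> _data;
};
```

Subscribers always see the owner's latest value because they share the storage, and the view stays valid even if the broadcaster is destroyed first.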

@jackgerrits
Member Author

Yeah that sounds good to me
