Eduardo Salinas edited this page Oct 14, 2022 · 5 revisions

When running with --invert_hash <file>, vw will output an extra, human-readable index of the features.

Note that --invert_hash is output-only and must be run with an input file so that vw can map the string features to their internal hashes. Without an input file, it will output only the internal hash, without the corresponding string name.

Structure

The file is divided into header, metadata, and feature-index sections. The header consists of the VW version used to create the model and the optional model id, which can be embedded into the model using --id <id>. Metadata entries take the form <identifier>:<value> and cover a number of configuration, statistical, and learning settings. A few are described below:

Name             Description
(Min/Max) label  The minimum/maximum values observed by the learner at the scorer level.
bits             The bitness of the feature indices. A larger value here increases the model size, but may be useful in high
options          The model-defining arguments.
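
As a rough illustration of the <identifier>:<value> form, the sketch below splits a metadata line at its first colon. The helper name parse_metadata_line and the sample lines are hypothetical and made up for this example; they are not real vw output, and an actual file may contain lines this sketch does not handle.

```python
def parse_metadata_line(line):
    """Split one '<identifier>:<value>' line at the first colon."""
    identifier, sep, value = line.partition(":")
    if not sep:
        # No colon at all, so this is not a metadata entry.
        raise ValueError(f"not a metadata line: {line!r}")
    return identifier.strip(), value.strip()


# Made-up sample lines for illustration only (not real vw output).
sample = ["Min label:0", "Max label:1", "bits:18"]
metadata = dict(parse_metadata_line(line) for line in sample)
print(metadata)
```

Values are kept as strings here, since the appropriate type depends on the identifier.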

The feature-index section consists of entries of the form <feature_name>:<feature_index>:<weight>[offset]. For example, a feature of the form:

ANamespace^cat_feature=category*BNamespace^num_feature:28:0.19[0]

This can be parsed as:

  • The quadratic (-q) interaction between two features (there is a single *).
  • The first feature is in the ANamespace, with name cat_feature=category.
  • The second feature is in the BNamespace, with name num_feature.
  • The index of the generated quadratic feature is 28.
  • The weight corresponding to this feature is 0.19.
  • The bracketed value ([0] here) appears when the reduction in use activates interleaved models; it identifies the model offset this weight belongs to.
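
The breakdown above can be sketched as a small parser. This is a minimal sketch, not vw's own parsing logic: parse_feature_entry, Term, and FeatureEntry are hypothetical names, and the code assumes that feature names contain no literal * or ^ beyond the separators shown in the example, and that the [offset] suffix may be absent.

```python
import re
from typing import List, NamedTuple, Optional


class Term(NamedTuple):
    namespace: Optional[str]  # None when the term has no '^' separator
    name: str


class FeatureEntry(NamedTuple):
    terms: List[Term]      # one term per interacting feature
    index: int
    weight: float
    offset: Optional[int]  # None when no [offset] suffix is present


# Matches '<weight>' optionally followed by '[<offset>]'.
_WEIGHT_RE = re.compile(r"^(?P<weight>[^\[\]]+)(?:\[(?P<offset>\d+)\])?$")


def parse_feature_entry(line):
    """Parse '<feature_name>:<feature_index>:<weight>[offset]'."""
    # Split from the right so colons inside the feature name are preserved.
    name, index, tail = line.strip().rsplit(":", 2)
    match = _WEIGHT_RE.match(tail)
    if match is None:
        raise ValueError(f"unexpected weight field: {tail!r}")
    terms = []
    for part in name.split("*"):  # assumption: '*' separates interacting features
        namespace, sep, feature = part.partition("^")
        terms.append(Term(namespace, feature) if sep else Term(None, part))
    offset = match.group("offset")
    return FeatureEntry(
        terms,
        int(index),
        float(match.group("weight")),
        int(offset) if offset is not None else None,
    )


entry = parse_feature_entry(
    "ANamespace^cat_feature=category*BNamespace^num_feature:28:0.19[0]"
)
print(entry)
```

Two terms in entry.terms correspond to the single * in the name, that is, a quadratic interaction.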