ken farmer edited this page Jul 13, 2016 · 4 revisions

Example registries can be found in hadoopinspector_plugins.

A registry is a config file for the runner that maps reusable checks to specific sets of data (usually tables and columns).

Concepts exposed in the registry include:

  • table/directory/file name: *
  • rule_type:
    • Can be either 'check' or 'setup'.
    • Rule names must also start with this rule type.
    • Setups are run before checks.
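The ordering rule above can be sketched in Python. This is only an illustration, not the runner's actual code: the function name `order_rules` is hypothetical, and the sketch assumes the rule type can be read from the rule-name prefix.

```python
def order_rules(rule_names):
    """Return rule names with setups first (hypothetical helper).

    Assumption: a rule whose name starts with 'setup' is a setup;
    everything else is treated as a check. Relative order within
    each group is preserved.
    """
    setups = [name for name in rule_names if name.startswith("setup")]
    others = [name for name in rule_names if not name.startswith("setup")]
    return setups + others
```

For example, `order_rules(["rule_fk_sensorid", "setup_table_nonpart"])` puts `setup_table_nonpart` ahead of `rule_fk_sensorid`.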

Here's an example of a part of a registry that determines how two checks will be run against a table:

{"facts_netflow_sen_dir_proto_hour_v": {
    "setup_table_nonpart": {
        "check_type": "setup",
        "check_name": "setup_table_nonpart.py",
        "check_mode": "incremental" },
    "rule_fk_sensorid": {
        "check_name": "rule_table_part_fk.py",
        "check_mode": "incremental",
        "hapinsp_checkcustom_child_col":    "sensor_id",
        "hapinsp_checkcustom_parent_table": "dim_comm_sensor",
        "hapinsp_checkcustom_parent_col":   "sensor_id" } } }

The example above shows a table (facts_netflow_sen_dir_proto_hour_v), a setup check that runs first, and a rule check that runs after it. The setup check is given the check_mode from this registry, identifies the next partition to run against, and returns a variety of fields to the runner, including data start and stop timestamps, mode, and status. If the setup returns a mode of 'inactive', the runner skips the rest of the checks; this happens when there is no new data to check. The rest of these variables are passed to the rules, which can use them to process incrementally if they are able.

The runner also provides the rule check with other variables, including all of the hapinsp_checkcustom fields.
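The flow described above might look roughly like the following Python sketch. It is not the runner's real implementation: `run_table` and the `run_check` callable are hypothetical names, the `"mode"` key of the setup result is an assumed field name, and identifying setups by their name prefix is also an assumption.

```python
import json

# The registry fragment from the example above, as a JSON string.
REGISTRY_JSON = """
{"facts_netflow_sen_dir_proto_hour_v": {
    "setup_table_nonpart": {
        "check_type": "setup",
        "check_name": "setup_table_nonpart.py",
        "check_mode": "incremental" },
    "rule_fk_sensorid": {
        "check_name": "rule_table_part_fk.py",
        "check_mode": "incremental",
        "hapinsp_checkcustom_child_col":    "sensor_id",
        "hapinsp_checkcustom_parent_table": "dim_comm_sensor",
        "hapinsp_checkcustom_parent_col":   "sensor_id" } } }
"""


def run_table(table_rules, run_check):
    """Run the setups for one table, then its remaining checks.

    run_check is a hypothetical callable (check_name, variables) -> dict
    standing in for however the runner really executes a check script.
    Returns the names of the rules that were executed.
    """
    setups = {k: v for k, v in table_rules.items() if k.startswith("setup")}
    others = {k: v for k, v in table_rules.items() if not k.startswith("setup")}

    shared = {}    # fields returned by setups, passed on to the rules
    executed = []

    for name, cfg in setups.items():
        result = run_check(cfg["check_name"], {"check_mode": cfg.get("check_mode")})
        executed.append(name)
        if result.get("mode") == "inactive":
            # No new data to check: skip the rest of the checks.
            return executed
        shared.update(result)

    for name, cfg in others.items():
        # Pass along every hapinsp_checkcustom field from the registry.
        custom = {k: v for k, v in cfg.items()
                  if k.startswith("hapinsp_checkcustom_")}
        run_check(cfg["check_name"], {**shared, **custom})
        executed.append(name)

    return executed
```

A caller would load the registry with `json.loads(REGISTRY_JSON)` and invoke `run_table` once per table, supplying its own way of executing each check script.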