Annotating & Importing Refinery Tools

Scott Ouellette edited this page May 8, 2018 · 49 revisions

ToolDefinitions are the Refinery Platform's generic model for the overlapping requirements of the different tools that we envision being part of the system.

In general, refinery-based tools need to have some knowledge of their:

  • Required input files
  • Structure of those input files
  • Configurable tool parameters

One of the main goals in designing the generic ToolDefinition model was to support more complex Galaxy-based workflows. Galaxy already models complex workflows through Galaxy Dataset Collections. In short, a DatasetCollection can encapsulate a LIST of files, a PAIR of files, or any nested combination of these two types, and Galaxy can run a single tool in parallel for all files referenced in the LIST/PAIR structure. We now expose this functionality through the Refinery Platform tools API, specifically through the file_relationship field of a ToolDefinition.

Annotating Tools:

Galaxy Workflow Annotations:

The Galaxy Workflow Annotation is a JSON data structure that must adhere to the relevant schema in the Schemas section below.

For Refinery to recognize a Galaxy workflow as a Refinery ToolDefinition of the WORKFLOW type, one needs to provide a set of simple annotations in the workflow annotation field in the Galaxy workflow editor. The annotation field is listed under “Edit Attributes” on the right side of the workflow editor.

Note:

The annotation fields in the Galaxy workflow editor behave slightly differently for workflow-level and tool-level annotations. In order to confirm changes to a workflow-level annotation, move the cursor to the end of the input field and hit the Return key. This is not required in tool-level annotation fields. Be sure to save the workflow after editing an annotation field.

Galaxy Workflow Step Annotations:

Note:

If you want to expose tool parameters so that Refinery Platform users can configure them, each parameter name in the annotation must exactly match the parameter name in the tool's XML file. The Galaxy Workflow Step Annotation is a JSON data structure that must adhere to the WorkflowStep schema in the Schemas section below.

For Refinery Platform users to configure a Galaxy workflow tool's parameters at tool runtime, that tool has to be annotated properly.

To access the annotation field for a tool, click on the tool representation in the workflow editor. The annotation field is named “Annotation / Notes”.
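
Because a mismatched parameter name is easy to miss, a quick local check can help before pasting an annotation into the workflow editor. The following is a minimal sketch, not part of Refinery: the tool XML is a hypothetical, trimmed-down example, the helper name is made up, and it assumes the tool declares its parameters as `<param name="...">` elements.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down Galaxy tool XML; a real tool file has more elements.
TOOL_XML = """
<tool id="example_tool" name="Example Tool">
  <inputs>
    <param name="stdout" type="boolean" />
    <param name="exit_code" type="integer" value="0" />
  </inputs>
</tool>
"""

# Step annotation with one deliberately misspelled parameter name.
STEP_ANNOTATION = """
{
  "parameters": [
    {"name": "stdout", "description": "Whether or not to write to stdout.",
     "value_type": "BOOLEAN", "default_value": false},
    {"name": "exit_kode", "description": "Misspelled on purpose.",
     "value_type": "INTEGER", "default_value": 0}
  ]
}
"""


def unknown_parameter_names(tool_xml, step_annotation):
    """Return annotated names that have no matching <param> in the tool XML."""
    root = ET.fromstring(tool_xml.strip())
    declared = {param.get("name") for param in root.iter("param")}
    annotated = [p["name"] for p in json.loads(step_annotation)["parameters"]]
    return [name for name in annotated if name not in declared]


print(unknown_parameter_names(TOOL_XML, STEP_ANNOTATION))
```

Any name the helper returns would need to be corrected in the annotation, since it does not correspond to a parameter the tool declares.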

Exposing Galaxy Workflow outputs to Refinery:

  • To retrieve the outputs of a Galaxy workflow execution, mark each output that should be returned to Refinery with an asterisk in the Galaxy workflow editor.
  • Asterisking an output is fairly easy and will look something like this: (screenshot: asterisking)

Visualization Tool Annotations:

  • Visualization ToolDefinitions are meant to represent an arbitrary visualization tool that resides inside a docker container. See here for examples of properly annotated Visualization tools.

Schemas:

We utilize JSON schema to help validate any incoming Tool annotation data. All Annotations must adhere to their specified schema.

ToolDefinition

  • name (string, required)

  • description (string, required)

  • tool_type (string, enum, required)

    • This element must be one of the following enum values:
      • "WORKFLOW"
      • "VISUALIZATION"
  • file_relationship (object, required)

    • The object is self-referential in nature with each level of nesting satisfying the file_relationship subschema.
  • parameters (array)

WorkflowStep

  • parameters (array)

    • The object is an array with all elements satisfying the parameter subschema.

Sub-Schemas:

SubSchemas are smaller reusable schemas that can be referenced within a Schema or SubSchema.

filetype (object)

  • name (string, required)
    • This name field must be a valid Refinery Filetype. A subset of these Filetypes can be found here. Alternatively, if you run the load_tools management command with an improperly set filetype in your annotations, a message listing all currently available Filetypes will be displayed.

file_relationship (object)

  • name (string, required)
  • value_type (string, enum, required)
    • This element must be one of the following enum values:
      • "PAIR"
      • "LIST"
  • input_files (array)
    • The object is an array with all elements satisfying the input_file subschema.
      • Additional restrictions:
        • Minimum items: 1
  • file_relationship (object, required)
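
The self-referential structure above can be checked with a short recursive walk. This is a hand-rolled sketch for illustration only; Refinery itself validates annotations with JSON schema, and the function name here is hypothetical. It assumes, as in the examples on this page, that an empty object `{}` ends the nesting.

```python
ALLOWED_VALUE_TYPES = {"LIST", "PAIR"}


def file_relationship_errors(node, path="file_relationship"):
    """Collect human-readable problems found in a file_relationship object."""
    errors = []
    if not isinstance(node.get("name"), str):
        errors.append(f"{path}: 'name' must be a string")
    if node.get("value_type") not in ALLOWED_VALUE_TYPES:
        errors.append(f"{path}: 'value_type' must be 'LIST' or 'PAIR'")
    if "file_relationship" not in node:
        errors.append(f"{path}: 'file_relationship' is required")
    nested = node.get("file_relationship")
    if isinstance(nested, dict) and nested:
        # Non-empty nested object: check the next level the same way.
        errors.extend(file_relationship_errors(nested, path + ".file_relationship"))
    if "input_files" in node and len(node["input_files"]) < 1:
        errors.append(f"{path}: 'input_files' needs at least one item")
    return errors


flat_list = {
    "name": "Flat list of N Samples",
    "value_type": "LIST",
    "file_relationship": {},
    "input_files": [
        {
            "name": "Input File",
            "description": "Input File Description",
            "allowed_filetypes": [{"name": "FASTQ"}],
        }
    ],
}
print(file_relationship_errors(flat_list))
```

Returning a list of messages rather than raising on the first problem makes it easier to fix an annotation in one pass.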

parameter (object)

number_parameter (object)

  • name (string, required)
  • description (string, required)
  • value_type (string, enum, required)
    • This element must be one of the following enum values:
      • "INTEGER"
      • "FLOAT"
  • default_value (number, required)

boolean_parameter (object)

  • name (string, required)
  • description (string, required)
  • value_type (string, enum, required)
    • This element must be one of the following enum values:
      • "BOOLEAN"
  • default_value (boolean, required)

string_parameter (object)

  • name (string, required)
  • description (string, required)
  • value_type (string, enum, required)
    • This element must be one of the following enum values:
      • "STRING"
  • default_value (string, required)

other_parameter (object)

  • name (string, required)
  • description (string, required)
  • value_type (string, enum, required)
    • This element must be one of the following enum values:
      • "GENOME_BUILD"
      • "ATTRIBUTE"
      • "FILE"
  • default_value (string, required)

galaxy_parameter (parameter)

  • galaxy_parameter is an extension of the parameter subschema adding additional fields relevant to Galaxy-based Workflows.
    • galaxy_workflow_step (number, required)
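
Each parameter subschema above ties a value_type to the JSON type of its default_value. The sketch below mirrors that pairing in plain Python so an annotation can be sanity-checked by hand. It is an illustration under that reading of the subschemas, not Refinery's validation code, and the function name is hypothetical.

```python
STRING_VALUE_TYPES = {"STRING", "GENOME_BUILD", "ATTRIBUTE", "FILE"}
NUMBER_VALUE_TYPES = {"INTEGER", "FLOAT"}


def default_matches_value_type(parameter):
    """Return True if default_value has the JSON type its value_type calls for."""
    value_type = parameter["value_type"]
    default = parameter["default_value"]
    if value_type == "BOOLEAN":
        return isinstance(default, bool)
    if value_type in NUMBER_VALUE_TYPES:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(default, (int, float)) and not isinstance(default, bool)
    if value_type in STRING_VALUE_TYPES:
        return isinstance(default, str)
    return False  # unknown value_type


print(default_matches_value_type({"value_type": "INTEGER", "default_value": 0}))
print(default_matches_value_type({"value_type": "BOOLEAN", "default_value": "false"}))
```

A common slip this catches is quoting a default, for example writing "false" or "0" as a string for a BOOLEAN or INTEGER parameter.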

input_file (object)

  • name (string, required)
  • description (string, required)
  • allowed_filetypes (array, required)
    • This array contains objects that satisfy the filetype subschema.

Examples:

Valid Galaxy Workflow Annotations:

Tool Annotation: Flat list of files

{
  "description": "This workflow does really cool things",
  "file_relationship": {
    "file_relationship": {},
    "value_type": "LIST",
    "name": "Flat list of N Samples",
    "input_files": [
      {
        "allowed_filetypes": [{"name": "FASTQ"}],
        "name": "Input File",
        "description": "Input File Description"
      }
    ]
  }
}

Tool Annotation: LIST:LIST:PAIR

{
  "description": "This is a more complex annotation",
  "file_relationship": {
    "value_type": "LIST",
    "name": "List of Lists",
    "file_relationship": {
      "value_type": "LIST",
      "name": "List of Pairs",
      "file_relationship": {
        "file_relationship": {},
        "value_type": "PAIR",
        "name": "Pairs",
        "input_files": [
          {
            "allowed_filetypes": [
              {"name": "FASTQ"},
              {"name": "BAM"}
            ],
            "name": "Input File A",
            "description": "Input File A Description"
          },
          {
            "allowed_filetypes": [
              {"name": "FASTQ"},
              {"name": "BAM"}
            ],
            "name": "Input File B",
            "description": "Input File B Description"
          }
        ]
      }
    }
  }
}
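
To see how the nesting in an annotation like the one above maps to a label such as LIST:LIST:PAIR, the file_relationship levels can be walked from the outside in. This is a small sketch with a hypothetical helper name; the dictionary below is an abbreviated copy of the annotation, with input_files omitted.

```python
def nesting_shape(file_relationship):
    """Join the value_type of each nesting level, outermost first."""
    levels = []
    node = file_relationship
    while node:  # an empty object {} (or a missing key) ends the walk
        levels.append(node["value_type"])
        node = node.get("file_relationship")
    return ":".join(levels)


annotation = {
    "description": "This is a more complex annotation",
    "file_relationship": {
        "value_type": "LIST",
        "name": "List of Lists",
        "file_relationship": {
            "value_type": "LIST",
            "name": "List of Pairs",
            "file_relationship": {
                "file_relationship": {},
                "value_type": "PAIR",
                "name": "Pairs",
            },
        },
    },
}

print(nesting_shape(annotation["file_relationship"]))  # LIST:LIST:PAIR
```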

Valid Galaxy Workflow Step annotation:

{
	"parameters": [
		{
			"name": "stdout",
			"description": "Whether or not to write to stdout.",
			"value_type": "BOOLEAN",
			"default_value": false
		},
		{
			"name": "exit_code",
			"description": "The exit_code for this tool step",
			"value_type": "INTEGER",
			"default_value": 0
		}
	]
}

Importing Tools into Refinery:

Once you have annotated your Tools, the next step is to transform these annotations into ToolDefinition objects.

This is done by running the load_tools Django management command:

Load Workflows

  • $ ./manage.py load_tools --workflows

Load Visualizations

  • Create a tool annotation locally or select a tool name from the visualization tool registry
  • $ ./manage.py load_tools --visualizations <local tool annotation file path | visualization tool registry name>

Note:

Because end users are free to write their own Tool annotations, we use JSON schemas to validate all incoming annotation data. The load_tools command uses Django database transactions: if any user-specified annotation doesn't adhere to the schemas above, no ToolDefinitions are generated and nothing is committed to the database.
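
The all-or-nothing behavior can be illustrated outside of Django. The sketch below is not the load_tools implementation: it uses Python's built-in sqlite3 module in place of Django transactions, and the table, function name, and simplified validation rule are all made up for the example.

```python
import sqlite3

VALID_TOOL_TYPES = {"WORKFLOW", "VISUALIZATION"}


def load_all_or_nothing(conn, annotations):
    """Insert every annotation in one transaction; roll back if any is invalid."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for annotation in annotations:
                if annotation.get("tool_type") not in VALID_TOOL_TYPES:
                    raise ValueError(f"invalid tool_type in {annotation.get('name')!r}")
                conn.execute(
                    "INSERT INTO tool_definition (name, tool_type) VALUES (?, ?)",
                    (annotation["name"], annotation["tool_type"]),
                )
    except ValueError:
        return False
    return True


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM tool_definition").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tool_definition (name TEXT, tool_type TEXT)")
conn.commit()

# The first annotation is valid, the second is not, so neither row is kept.
mixed = [
    {"name": "Good tool", "tool_type": "WORKFLOW"},
    {"name": "Bad tool", "tool_type": "BOGUS"},
]
print(load_all_or_nothing(conn, mixed), count_rows(conn))
```

The point of wrapping the whole import in one transaction is that a partially imported set of tools never becomes visible: either every annotation passes and is stored, or the database is left as it was.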

If anything is still unclear, the fairly extensive test suite may shed some light on any issues.