Annotating & Importing Refinery Tools
ToolDefinitions are the refinery-platform's solution to a generic model that satisfies the overlapping requirements of the different tools we envision being part of our system.
In general, refinery-based tools need to have some knowledge of their:
- Required input files
- The structure of said input files
- Configurable tool parameters
One of the main focuses while designing the generic ToolDefinition model was supporting more complex Galaxy-based workflows. Galaxy itself already has this functionality: it can model complex workflows through Galaxy Dataset Collections. In short, a Dataset Collection can encapsulate a LIST of files, a PAIR of files, or any nested combination of these two types, and allows a single tool to run in parallel over all files referenced in the LIST/PAIR structure. We now expose this functionality through the refinery-platform tools API, specifically with the file_relationship field of a ToolDefinition.
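As a rough illustration of the LIST/PAIR idea (plain Python, not Galaxy's or Refinery's code), the sketch below arranges a flat list of paired-end files into a LIST of PAIRs. The `list_of_pairs` helper and the file names are hypothetical, and it assumes each file is immediately followed by its mate.

```python
from typing import List, Tuple


def list_of_pairs(files: List[str]) -> List[Tuple[str, str]]:
    """Group an ordered, flat list of files into consecutive (first, second) PAIRs.

    Hypothetical helper: assumes each file is immediately followed by its mate,
    e.g. the forward and reverse reads of the same sample.
    """
    if len(files) % 2 != 0:
        raise ValueError("a LIST of PAIRs needs an even number of files")
    # Even-indexed entries are the first members, odd-indexed entries the second
    return list(zip(files[0::2], files[1::2]))


samples = ["s1_R1.fastq", "s1_R2.fastq", "s2_R1.fastq", "s2_R2.fastq"]
for first, second in list_of_pairs(samples):
    # Each PAIR is one element of the outer LIST
    print(first, second)
```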
The Galaxy Workflow Annotation is a JSON data structure which must adhere to the schema described below.
For Refinery to recognize a Galaxy workflow as a Refinery ToolDefinition of the WORKFLOW type, one needs to provide a set of simple annotations in the workflow annotation field in the Galaxy workflow editor. The annotation field is listed under “Edit Attributes” on the right side of the workflow editor.
The annotation fields in the Galaxy workflow editor behave slightly differently for workflow-level and tool-level annotations. In order to confirm changes to a workflow-level annotation, move the cursor to the end of the input field and hit the Return key. This is not required in tool-level annotation fields. Be sure to save the workflow after editing an annotation field.
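Writing the workflow-level annotation by hand is error-prone, so it can help to generate it. A minimal sketch using Python's standard json module, assuming the annotation field takes the JSON as a single line of text; the content mirrors the flat-list example further down this page.

```python
import json

annotation = {
    "description": "This workflow does really cool things",
    "file_relationship": {
        "file_relationship": {},
        "value_type": "LIST",
        "name": "Flat list of N Samples",
        "input_files": [
            {
                "allowed_filetypes": [{"name": "FASTQ"}],
                "name": "Input File",
                "description": "Input File Description",
            }
        ],
    },
}

# Compact, single-line JSON (no newlines, no padding after separators)
one_line = json.dumps(annotation, separators=(",", ":"))
print(one_line)

# Round-trip check: the pasted text parses back to the same structure
assert json.loads(one_line) == annotation
```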
If you want to expose tool parameters for Refinery-Platform users to configure, each parameter name you specify must exactly match the parameter name in the tool's XML file. The Galaxy Workflow Step Annotation is a JSON data structure which must adhere to the schema described below.
To allow Refinery-Platform users to configure a Galaxy workflow tool's parameters at runtime, that tool has to be annotated properly.
To access the annotation field for a tool, click on the tool representation in the workflow editor. The annotation field is named “Annotation / Notes”.
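Because annotated parameter names must match the tool's XML exactly, a quick local check can catch typos before running load_tools. A minimal sketch with Python's standard xml.etree.ElementTree; the tool XML snippet and the `unknown_parameters` helper are hypothetical, and the check simply collects every `<param name="...">` in the file without modelling conditionals or sections.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Hypothetical Galaxy tool XML used only for this illustration
TOOL_XML = """
<tool id="example_tool" name="Example tool" version="1.0">
  <inputs>
    <param name="input" type="data" format="fastq" label="Input"/>
    <param name="exit_code" type="integer" value="0" label="Exit code"/>
    <param name="stdout" type="boolean" checked="false" label="Write to stdout"/>
  </inputs>
</tool>
"""


def unknown_parameters(tool_xml: str, annotated_names: Iterable[str]) -> List[str]:
    """Return annotated names that do not appear as <param name=...> in the tool XML."""
    root = ET.fromstring(tool_xml.strip())
    # iter() walks the whole tree, so nested <param> elements are collected too
    declared = {param.get("name") for param in root.iter("param")}
    return [name for name in annotated_names if name not in declared]


print(unknown_parameters(TOOL_XML, ["stdout", "exit_code", "exitcode"]))
```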
- To retrieve the outputs of a Galaxy workflow execution, asterisk the outputs you want returned to Refinery in the Galaxy workflow editor.
- Asterisking an output is fairly easy.
Visualization ToolDefinitions are meant to represent an arbitrary visualization tool that resides inside a Docker container. See here for examples of properly annotated Visualization tools.
We use JSON Schema to validate all incoming tool annotation data. All annotations must adhere to their specified schema.
- This element must be one of the following enum values: "WORKFLOW", "VISUALIZATION".
- The object is self-referential in nature, with each level of nesting satisfying the file_relationship subschema.
- The elements of the array must match exactly one of the following subschemas:
- The object is an array with all elements satisfying the parameter subschema.
Subschemas are smaller, reusable schemas that can be referenced within a schema or another subschema.
filetype
- name (string, required): This name field must be a valid Refinery Filetype. A subset of these Filetypes can be found here. Alternatively, if you run the load_tools management command with an improperly set filetype in your annotations, a message listing all currently available Filetypes will be displayed.
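The load_tools behavior described above (listing the available Filetypes when a name is wrong) can be imitated locally. A minimal sketch, assuming a hypothetical `KNOWN_FILETYPES` set that stands in for the Filetypes registered in your Refinery instance; the `check_filetype` helper and its message wording are ours, not the actual load_tools output.

```python
from typing import Dict, Set

# Hypothetical stand-in: a real deployment defines its Filetypes in Refinery itself
KNOWN_FILETYPES: Set[str] = {"BAM", "BED", "FASTQ"}


def check_filetype(filetype: Dict[str, str], known: Set[str] = KNOWN_FILETYPES) -> None:
    """Raise ValueError if the filetype object's required name is missing or unknown."""
    name = filetype.get("name")
    if not isinstance(name, str):
        raise ValueError("filetype 'name' is required and must be a string")
    if name not in known:
        available = ", ".join(sorted(known))
        raise ValueError(f"Unknown filetype {name!r}; available filetypes: {available}")


check_filetype({"name": "FASTQ"})  # passes silently
try:
    check_filetype({"name": "FASTA"})
except ValueError as err:
    print(err)
```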
file_relationship
- name (string, required)
- value_type (string, enum, required): This element must be one of the following enum values: "PAIR", "LIST".
- input_files (array): The object is an array with all elements satisfying the input_file subschema. Additional restrictions: minimum items: 1.
- file_relationship (object, required): The object must be one of several allowed types. In the examples below, it is either a nested file_relationship object or, at the innermost level, an empty object ({}).
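Because file_relationship is self-referential, checking it is naturally recursive. A minimal stdlib sketch of the field rules listed above, assuming (as in the examples on this page) that an empty object ends the nesting; `validate_file_relationship` is hypothetical and does not inspect the contents of input_files.

```python
from typing import Any, Dict


def validate_file_relationship(node: Dict[str, Any], path: str = "file_relationship") -> None:
    """Recursively check one file_relationship level and every level nested under it."""
    if not isinstance(node.get("name"), str):
        raise ValueError(f"{path}.name is required and must be a string")
    if node.get("value_type") not in ("PAIR", "LIST"):
        raise ValueError(f"{path}.value_type must be 'PAIR' or 'LIST'")
    if "input_files" in node:
        files = node["input_files"]
        if not isinstance(files, list) or len(files) < 1:
            raise ValueError(f"{path}.input_files must be an array with at least 1 item")
    if "file_relationship" not in node:
        raise ValueError(f"{path}.file_relationship is required")
    child = node["file_relationship"]
    if not isinstance(child, dict):
        raise ValueError(f"{path}.file_relationship must be an object")
    if child:  # a non-empty object is another nesting level
        validate_file_relationship(child, path + ".file_relationship")


flat_list = {
    "name": "Flat list of N Samples",
    "value_type": "LIST",
    "file_relationship": {},
    "input_files": [{"name": "Input File"}],
}
validate_file_relationship(flat_list)  # passes
```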
parameter
This subschema must satisfy exactly one of the following subschemas:
- name (string, required); description (string, required); value_type (string, enum, required), one of "INTEGER", "FLOAT"; default_value (number, required)
- name (string, required); description (string, required); value_type (string, enum, required), must be "BOOLEAN"; default_value (boolean, required)
- name (string, required); description (string, required); value_type (string, enum, required), must be "STRING"; default_value (string, required)
- name (string, required); description (string, required); value_type (string, enum, required), one of "GENOME_BUILD", "ATTRIBUTE", "FILE"; default_value (string, required)
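The four parameter variants differ only in their allowed value_type values and the JSON type of default_value. A minimal plain-Python sketch of that "exactly one of" check; `validate_parameter` and `DEFAULT_TYPES` are hypothetical. Since Python's bool is a subclass of int, booleans are rejected explicitly for the numeric variants.

```python
from typing import Any, Dict, Tuple

# value_type -> Python type(s) that the decoded JSON default_value must have
DEFAULT_TYPES: Dict[str, Tuple[type, ...]] = {
    "INTEGER": (int, float),  # schema says (number, required)
    "FLOAT": (int, float),    # schema says (number, required)
    "BOOLEAN": (bool,),
    "STRING": (str,),
    "GENOME_BUILD": (str,),
    "ATTRIBUTE": (str,),
    "FILE": (str,),
}


def validate_parameter(param: Dict[str, Any]) -> None:
    """Raise ValueError unless param matches one of the parameter subschemas."""
    for field in ("name", "description"):
        if not isinstance(param.get(field), str):
            raise ValueError(f"'{field}' is required and must be a string")
    value_type = param.get("value_type")
    if value_type not in DEFAULT_TYPES:
        raise ValueError(f"unsupported value_type {value_type!r}")
    if "default_value" not in param:
        raise ValueError("'default_value' is required")
    default = param["default_value"]
    # bool subclasses int, so True/False would otherwise pass as numbers
    if isinstance(default, bool) and value_type != "BOOLEAN":
        raise ValueError(f"default_value for {value_type} must not be a boolean")
    if not isinstance(default, DEFAULT_TYPES[value_type]):
        raise ValueError(f"default_value for {value_type} has the wrong type")


validate_parameter({
    "name": "exit_code",
    "description": "The exit_code for this tool step",
    "value_type": "INTEGER",
    "default_value": 0,
})
```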
galaxy_parameter (parameter)
galaxy_parameter is an extension of the parameter subschema, adding fields relevant to Galaxy-based workflows:
- galaxy_workflow_step (number, required)
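Since galaxy_parameter only adds galaxy_workflow_step on top of a parameter, a check can confirm the shared fields and then look at the extra one. A self-contained minimal sketch; `validate_galaxy_parameter` is hypothetical, its base check is reduced to the presence of the fields common to every parameter variant, and the step number in the usage example is made up.

```python
from typing import Any, Dict

# Fields shared by every parameter variant
COMMON_FIELDS = ("name", "description", "value_type", "default_value")


def validate_galaxy_parameter(param: Dict[str, Any]) -> None:
    """Check the common parameter fields, then the Galaxy-specific extension."""
    missing = [field for field in COMMON_FIELDS if field not in param]
    if missing:
        raise ValueError(f"missing parameter fields: {missing}")
    step = param.get("galaxy_workflow_step")
    # (number, required); bool is excluded because it subclasses int in Python
    if isinstance(step, bool) or not isinstance(step, (int, float)):
        raise ValueError("galaxy_workflow_step is required and must be a number")


validate_galaxy_parameter({
    "name": "exit_code",
    "description": "The exit_code for this tool step",
    "value_type": "INTEGER",
    "default_value": 0,
    "galaxy_workflow_step": 2,  # hypothetical step number
})
```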
input_file
- name (string, required)
- description (string, required)
- allowed_filetypes (array, required): This array contains objects that satisfy the filetype subschema.
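An input_file combines two required strings with an array of filetype objects. A minimal sketch; `validate_input_file` is hypothetical and only checks the shape of allowed_filetypes, not whether each name is a registered Refinery Filetype.

```python
from typing import Any, Dict


def validate_input_file(input_file: Dict[str, Any]) -> None:
    """Raise ValueError unless input_file has name, description and allowed_filetypes."""
    for field in ("name", "description"):
        if not isinstance(input_file.get(field), str):
            raise ValueError(f"input_file '{field}' is required and must be a string")
    allowed = input_file.get("allowed_filetypes")
    if not isinstance(allowed, list):
        raise ValueError("input_file 'allowed_filetypes' is required and must be an array")
    for filetype in allowed:
        # Each element must satisfy the filetype subschema: {"name": <string>}
        if not isinstance(filetype, dict) or not isinstance(filetype.get("name"), str):
            raise ValueError(f"invalid filetype entry: {filetype!r}")


validate_input_file({
    "allowed_filetypes": [{"name": "FASTQ"}, {"name": "BAM"}],
    "name": "Input File A",
    "description": "Input File A Description",
})
```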
{
  "description": "This workflow does really cool things",
  "file_relationship": {
    "file_relationship": {},
    "value_type": "LIST",
    "name": "Flat list of N Samples",
    "input_files": [
      {
        "allowed_filetypes": [{"name": "FASTQ"}],
        "name": "Input File",
        "description": "Input File Description"
      }
    ]
  }
}
{
  "description": "This is a more complex annotation",
  "file_relationship": {
    "value_type": "LIST",
    "name": "List of Lists",
    "file_relationship": {
      "value_type": "LIST",
      "name": "List of Pairs",
      "file_relationship": {
        "file_relationship": {},
        "value_type": "PAIR",
        "name": "Pairs",
        "input_files": [
          {
            "allowed_filetypes": [
              {"name": "FASTQ"},
              {"name": "BAM"}
            ],
            "name": "Input File A",
            "description": "Input File A Description"
          },
          {
            "allowed_filetypes": [
              {"name": "FASTQ"},
              {"name": "BAM"}
            ],
            "name": "Input File B",
            "description": "Input File B Description"
          }
        ]
      }
    }
  }
}
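To see what structure the "List of Lists" annotation above describes, one can walk its nested file_relationship objects from the outside in. A minimal sketch; `describe_nesting` is hypothetical and treats an empty file_relationship object as the end of the nesting, as in the examples.

```python
import json
from typing import Any, Dict, List, Tuple

ANNOTATION = json.loads("""
{
  "description": "This is a more complex annotation",
  "file_relationship": {
    "value_type": "LIST", "name": "List of Lists",
    "file_relationship": {
      "value_type": "LIST", "name": "List of Pairs",
      "file_relationship": {
        "file_relationship": {}, "value_type": "PAIR", "name": "Pairs",
        "input_files": [
          {"allowed_filetypes": [{"name": "FASTQ"}, {"name": "BAM"}],
           "name": "Input File A", "description": "Input File A Description"},
          {"allowed_filetypes": [{"name": "FASTQ"}, {"name": "BAM"}],
           "name": "Input File B", "description": "Input File B Description"}
        ]
      }
    }
  }
}
""")


def describe_nesting(annotation: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return value_types from outermost to innermost, plus the innermost input file names."""
    levels: List[str] = []
    node = annotation["file_relationship"]
    innermost = node
    while node:  # an empty dict ends the nesting
        levels.append(node["value_type"])
        innermost = node
        node = node.get("file_relationship", {})
    names = [f["name"] for f in innermost.get("input_files", [])]
    return levels, names


levels, names = describe_nesting(ANNOTATION)
print(" > ".join(levels), names)
```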
{
  "parameters": [
    {
      "name": "stdout",
      "description": "Whether or not to write to stdout.",
      "value_type": "BOOLEAN",
      "default_value": false
    },
    {
      "name": "exit_code",
      "description": "The exit_code for this tool step",
      "value_type": "INTEGER",
      "default_value": 0
    }
  ]
}
Once you have properly annotated your tools, it is time to transform these annotations into ToolDefinition objects. This is done by running the load_tools Django management command:
$ ./manage.py load_tools --workflows
- For visualization tools, create a tool annotation locally or select a tool name from the visualization tool registry, then run:
$ ./manage.py load_tools --visualizations <local tool annotation file path | visualization tool registry name>
Considering the freedom we give end users by allowing them to annotate their tools, we use JSON schemas to validate all incoming annotation data. The load_tools command uses Django database transactions: if any user-specified annotation doesn't adhere to the schemas above, no ToolDefinitions are generated and nothing is committed to the database.
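The all-or-nothing behavior can be illustrated with any transactional store. The sketch below uses Python's built-in sqlite3 as a stand-in for Django's transactions (in Django the analogous tool is `transaction.atomic()`, though this is not a claim about how load_tools is written). The table, the `load_all` helper and its single validation rule are hypothetical: every annotation is inserted inside one transaction, and one invalid annotation rolls all of them back.

```python
import sqlite3
from typing import Dict, List


def load_all(conn: sqlite3.Connection, annotations: List[Dict[str, str]]) -> None:
    """Insert one row per annotation, or no rows at all if any annotation is invalid."""
    # The connection's context manager commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        for annotation in annotations:
            # Hypothetical validation rule standing in for the JSON schema check
            if annotation.get("tool_type") not in ("WORKFLOW", "VISUALIZATION"):
                raise ValueError(f"invalid annotation: {annotation!r}")
            conn.execute(
                "INSERT INTO tool_definition (name, tool_type) VALUES (?, ?)",
                (annotation["name"], annotation["tool_type"]),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tool_definition (name TEXT, tool_type TEXT)")
conn.commit()

try:
    load_all(conn, [
        {"name": "good workflow", "tool_type": "WORKFLOW"},
        {"name": "bad tool", "tool_type": "SOMETHING_ELSE"},
    ])
except ValueError as err:
    print("rolled back:", err)

count = conn.execute("SELECT COUNT(*) FROM tool_definition").fetchone()[0]
print(count)  # the valid row was rolled back along with the invalid one
```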
If anything is still unclear, the fairly extensive test suite may shed some light on it.