Cromwell workflow engine support #825

Open
tom-dyar opened this issue Jun 21, 2018 · 7 comments

@tom-dyar

The current WDL files were developed with DNAnexus in mind. I am trying to modify them for use with Cromwell, and it seems there are differences in the way paths for sub-workflows are handled vs. DNAnexus. I got it to "work" by putting all the tasks and workflows into a single directory.

@dpark01
Member

dpark01 commented Jun 21, 2018

Hi @tom-dyar, yes, that's exactly what we do too. Paths for sub-workflows seem to have shifting interpretations, and I think the DNAnexus parser has changed its handling of this at one point as well. For now, the pipes/WDL directory is intended as a set of source files that may need manipulation prior to use, but we may end up flattening that directory structure in the end.
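As a rough sketch of that flattening workaround (directory and file names here are illustrative rather than the repo's exact layout), Cromwell's command-line runner can also take the sub-workflow sources as a zipped imports bundle:

```bash
# Copy every task and workflow WDL into one flat directory so that relative
# "import" statements resolve the same way under Cromwell.
mkdir -p flat_wdl
cp pipes/WDL/workflows/*.wdl pipes/WDL/tasks/*.wdl flat_wdl/

# Or bundle the dependencies and hand them to Cromwell via -p/--imports.
(cd flat_wdl && zip ../imports.zip *.wdl)
java -jar cromwell.jar run flat_wdl/demux_plus.wdl -i inputs.json -p imports.zip
```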

@tom-dyar
Author

Great, thanks! I am hoping that's all there is regarding compatibility, so it's great you have it on your radar.

@tom-dyar
Author

@dpark01 -- I am now trying to get this running on Google Compute Engine from Cromwell. Do you have configuration files (machine requirements and reference file locations) for Google Cloud, similar to the dx-**.json files in pipes/WDL? I bumped up the local disk to 2 TB for a large run I tried, but Kraken still never finished after 24 hours, probably due to a RAM issue with my NextSeq 500 run...

Thanks,
Tom

@dpark01
Member

dpark01 commented Jul 16, 2018

@tom-dyar here is a json config file that we use (though see #843 for some caveats about it). The machine requirement specs should be fully derivable from the WDL task runtimes (in fact they were designed primarily with GCP instances in mind, with dx_instance_type specifying the AWS/DNAnexus ones separately). But the config json defines a default disk setup that is important to get it to work (two LOCAL disks and a larger bootDiskSizeGb).
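As a general sketch of how Cromwell picks up a backend/runtime config like the one attached (file names below are placeholders), the config is passed as a Java system property when launching Cromwell:

```bash
# google.conf is the backend config discussed above (placeholder name);
# inputs.json carries the workflow inputs.
java -Dconfig.file=google.conf -jar cromwell.jar run demux_plus.wdl -i inputs.json
```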

As for default databases, I don't have them all linked in properly, and they might not be the latest versions, but see gs://sabeti-public/meta_dbs and /depletion_dbs.
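Those can be browsed or copied down with gsutil (bucket paths as above; the database name below is just a placeholder):

```bash
# See what is currently published in the public database buckets.
gsutil ls gs://sabeti-public/meta_dbs/
gsutil ls gs://sabeti-public/depletion_dbs/

# Copy one database locally (object name is a placeholder).
gsutil -m cp -r gs://sabeti-public/meta_dbs/SOME_KRAKEN_DB ./dbs/
```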

@tom-dyar
Author

OK, not sure if I should submit a new ticket or not...

SamToFastq is "hanging" when running demux_plus.wdl, so kraken.py never completes. I have a couple of 5-7 GB BAM files, and it is failing on one of them. I am using the new Google Pipelines API v2alpha1 and Cromwell version 34. I have bumped up the disk sizes, so I have two local disks of 500 GB each and a 100 GB boot disk. Below is my configuration file. I wonder how I should debug this, since there is no output in the log files; perhaps there is a Picard VERBOSITY option I could set, but it seems I would have to update the container to put that in.

Thanks for any help!

```hocon
include required(classpath("application"))

# Add customizations
#webservice.port = 8090


#MYSQL_DATABASE=cromwell_db -e MYSQL_USER=cromwell -e MYSQL_PASSWORD=cromwell

database {
  db.url = "jdbc:mysql://mysql-db/cromwell_db?useSSL=false&rewriteBatchedStatements=true"
  db.user = "cromwell"
  db.password = "cromwell"
  db.driver = "com.mysql.jdbc.Driver"
  profile = "slick.jdbc.MySQLProfile$"
}

google {

  application-name = "cromwell"

  auths = [
    {
      name = "application-default"
      scheme = "application_default"
    }
  ]
}

engine {
  filesystems {
    gcs {
      auth = "application-default"
      project = "014A9F-BB2CDD-822772"
    }
  }
}

backend {
  default = "Jes"
  providers {
    Jes {
      actor-factory = "cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory"
      config {
        // Google project
        project = "atvirology"

        // Base bucket for workflow executions
        root = "gs://atvir-cromwell/cromwell-execution"
        genomics-api-queries-per-100-seconds = 1000

        // Polling for completion backs-off gradually for slower-running jobs.
        // This is the maximum polling interval (in seconds):
        maximum-polling-interval = 300

        // Optional Dockerhub Credentials. Can be used to access private docker images.
        dockerhub {
          // account = ""
          // token = ""
        }

        genomics {
          // A reference to an auth defined in the `google` stanza at the top.  This auth is used to create
          // Pipelines and manipulate auth JSONs.
          auth = "application-default"
          // Endpoint for APIs, no reason to change this unless directed by Google.
          endpoint-url = "https://genomics.googleapis.com/"
          // Restrict access to VM metadata. Useful in cases when untrusted containers are running under a service
          // account not owned by the submitting user
          restrict-metadata-access = false
          // This allows you to use an alternative service account to launch jobs, by default uses default service account
          compute-service-account = "default"

          // Pipelines v2 only: specify the number of times localization and delocalization operations should be attempted
          // There is no logic to determine if the error was transient or not, everything is retried upon failure
          // Defaults to 3
          localization-attempts = 3
        }

        filesystems {
          gcs {
            // A reference to a potentially different auth for manipulating files via engine functions.
            auth = "application-default"
            project = "014A9F-BB2CDD-822772"

            caching {
              // When a cache hit is found, the following duplication strategy will be followed to use the cached outputs
              // Possible values: "copy", "reference". Defaults to "copy"
              // "copy": Copy the output files
              // "reference": DO NOT copy the output files but point to the original output files instead.
              //              Will still make sure than all the original output files exist and are accessible before
              //              going forward with the cache hit.
              duplication-strategy = "copy"
            }
          }
        }

        default-runtime-attributes {
          cpu: 2
          memory: "4 GB"
          failOnStderr: false
          continueOnReturnCode: 0
          bootDiskSizeGb: 100
          # Allowed to be a String, or a list of Strings. NB: was "LOCAL" instead of "HDD"
          disks: "local-disk 500 HDD, /mnt/tmp 500 HDD"
          noAddress: false
          preemptible: 1
          zones: [ "us-central1-a", "us-central1-b", "us-central1-c", "us-east1-b", "us-east1-c", "us-east1-d" ]
        }

        #default-runtime-attributes {
        #  cpu: 2
        #  memory: "15G"
        #  failOnStderr: false
        #  continueOnReturnCode: 0
        #  bootDiskSizeGb: 50
        #  // Allowed to be a String, or a list of Strings
        #  disks: "local-disk 2000 LOCAL, /mnt/tmp 2000 LOCAL"
        #  noAddress: false
        #  preemptible: 1
        #  zones: [ "us-central1-a", "us-central1-b", "us-central1-c", "us-east1-b", "us-east1-c", "us-east1-d" ]
        #}
      }
    }
  }
}

call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}
```

@dpark01
Member

dpark01 commented Aug 17, 2018

Hi Tom, interesting... you should at least be able to deduce, from the stdout/stderr log files that Cromwell normally produces for the kraken task, which BAM file it was processing at the time. And given that you have the input BAM files, perhaps you could try reproducing the effect manually: spin up a GCE VM, pull the docker image, and run it interactively (docker run -it --rm quay.io/....), which would give you an interactive shell as root within the container. You can then run metagenomics.py kraken on your input BAM by hand and watch the output, and since you have root, you could edit the source for more verbosity. But my real guess is that this has less to do with Picard and more to do with whatever is consuming its output pipes.

If it's reproducible and if your data isn't sensitive, we'd be happy to look at an example bam file.
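A rough sketch of that interactive debugging session (the image name/tag and paths below are placeholders; check the --help output inside the container for the exact arguments your viral-ngs version expects):

```bash
# Copy the problematic input BAM from the bucket onto the VM (path is illustrative).
gsutil cp gs://atvir-cromwell/path/to/suspect_sample.bam .

# Pull the viral-ngs image and get an interactive root shell inside it
# (use the image/tag referenced in the WDL task's runtime section).
docker run -v "$PWD":/data -it --rm quay.io/broadinstitute/viral-ngs bash

# Inside the container: confirm the exact kraken arguments for this version,
# then run the step by hand on the suspect BAM and watch where it stalls.
metagenomics.py --version
metagenomics.py kraken --help
```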

dpark01 reopened this Aug 17, 2018
@tom-dyar
Author

Thanks @dpark01 -- good tips and I will try to reproduce. Nothing particularly sensitive; here is the path to my logs (I tried to make my buckets publicly readable): gs://atvir-cromwell/cromwell-execution/demux_plus/2021156e-a3c3-45b1-9eb3-9171f70595f4/call-kraken
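(The per-shard stdout/stderr can be pulled from that call directory with gsutil, e.g.:)

```bash
# List the kraken call's execution directory, then copy it locally to inspect
# the stdout/stderr files for each shard.
gsutil ls -r gs://atvir-cromwell/cromwell-execution/demux_plus/2021156e-a3c3-45b1-9eb3-9171f70595f4/call-kraken/
mkdir -p kraken-logs
gsutil -m cp -r gs://atvir-cromwell/cromwell-execution/demux_plus/2021156e-a3c3-45b1-9eb3-9171f70595f4/call-kraken kraken-logs/
```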
