Demo data Error: [E::fai_build3_core] Failed to open the file ref.fasta.gz #28

Open
kiranpatil222 opened this issue Mar 8, 2024 · 7 comments
Labels
question Further information is requested

Comments

@kiranpatil222

Ask away!

Hi,

I tried running the demo data as described in your README but got the error below:
[E::fai_build3_core] Failed to open the file ref.fasta.gz

@kiranpatil222 kiranpatil222 added the question Further information is requested label Mar 8, 2024
@cjalder

cjalder commented Mar 8, 2024

Hi @kiranpatil222 - can you provide some more information on exactly how you ran this? In the latest version (v1.2.0) we changed the demo data to run using de novo assembly. The README should reflect this change.

@kiranpatil222
Author

kiranpatil222 commented Mar 8, 2024

This is epi2me-labs/wf-bacterial-genomes v1.2.0-g6af5457

nextflow run epi2me-labs/wf-bacterial-genomes \
    --fastq wf-bacterial-genomes-demo/isolates_fastq \
    --isolates \
    --reference_based_assembly \
    --reference wf-bacterial-genomes-demo/ref/ref.fasta.gz \
    --sample_sheet wf-bacterial-genomes-demo/isolates_sample_sheet.csv \
    -profile standard

There is no FASTA file at "wf-bacterial-genomes-demo/ref/ref.fasta.gz" in the downloaded demo data.

Also, do you have a separate tutorial for MLST typing and AMR from reference-based assembly? It is not clear how to run the different applications; example commands like the ones you provide for the demo data would help.

@cjalder

cjalder commented Mar 8, 2024

Hi @kiranpatil222. The README has the following command to run the new demo data.

nextflow run epi2me-labs/wf-bacterial-genomes \
    --fastq wf-bacterial-genomes-demo/isolates_fastq \
    --isolates \
    --sample_sheet wf-bacterial-genomes-demo/isolates_sample_sheet.csv \
    -profile standard

AMR and MLST can still be run in --isolates mode even with reference-based assembly (the command would be much like the one you used). The only thing you need to ensure is that all samples in the run align to that reference.

I will discuss adding some more example commands in the next release with the team.
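
For anyone adapting the reference-based command to their own data, a small pre-flight check can catch a missing reference before the workflow starts and fails inside an indexing step. The sketch below is illustrative only and is not part of the workflow: `missing_inputs` is a hypothetical helper name, and the paths are simply the ones quoted earlier in this thread.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the subset of `paths` that do not exist on disk, in input order."""
    return [str(p) for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Paths copied from the command earlier in this thread; adjust to your data.
    inputs = [
        "wf-bacterial-genomes-demo/isolates_fastq",
        "wf-bacterial-genomes-demo/isolates_sample_sheet.csv",
        "wf-bacterial-genomes-demo/ref/ref.fasta.gz",
    ]
    for path in missing_inputs(inputs):
        print(f"missing input: {path}")
```

With the v1.2.0 demo bundle as described above, the reference path would be reported as missing, which matches the original "Failed to open the file ref.fasta.gz" error.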

@kiranpatil222
Author

kiranpatil222 commented Mar 8, 2024

Now I get this error:

ERROR ~ Error executing process > 'calling_pipeline:deNovo (2)'

Caused by:
  Process `calling_pipeline:deNovo (2)` terminated with an error exit status (1)

Command executed:

  COV_FAIL=0
  FLYE_EXIT_CODE=0
  flye    --nano-hq reads.fastq.gz --out-dir output --threads "3" ||     FLYE_EXIT_CODE=$?

  if [[ $FLYE_EXIT_CODE -eq 0 ]]; then
      mv output/assembly.fasta "./test1.draft_assembly.fasta"
      mv output/assembly_info.txt "./test1_flye_stats.tsv"
      bgzip "test1.draft_assembly.fasta"
  else
      # flye failed --> check the log to check why
      edge_cov=$(
          grep -oP 'Mean edge coverage: \K\d+' output/flye.log             || echo 5
      )
      ovlp_cov=$(
          grep -oP 'Overlap-based coverage: \K\d+' output/flye.log             || echo 5
      )
      if [[
          $edge_cov -lt 5 ||
          $ovlp_cov -lt 5
      ]]; then
          echo -n "Caught Flye failure due to low coverage (either mean edge cov. or "
          echo "overlap-based cov. were below 5)".
          COV_FAIL=1
      elif grep -q "No disjointigs were assembled" output/flye.log; then
          echo -n "Caught Flye failure due to disjointig assembly."
          COV_FAIL=2
      else
          # exit a subshell with error so that the process fails
          ( exit $FLYE_EXIT_CODE )
      fi
  fi

Command exit status:
  1

Command output:
  (empty)

Command error:
  [2024-03-08 11:32:29] INFO: Extending reads
  [2024-03-08 11:33:26] INFO: Overlap-based coverage: 32
  [2024-03-08 11:33:26] INFO: Median overlap divergence: 0.0971861
  0% 80% 90% 100%
  [2024-03-08 11:34:32] INFO: Assembled 2 disjointigs
  [2024-03-08 11:34:32] INFO: Generating sequence
  0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
  [2024-03-08 11:34:35] INFO: Filtering contained disjointigs
  0% 50% 100%
  [2024-03-08 11:34:37] INFO: Contained seqs: 0
  [2024-03-08 11:34:37] INFO: >>>STAGE: consensus
  [2024-03-08 11:34:37] INFO: Running Minimap2
  [2024-03-08 11:35:11] INFO: Computing consensus
  Process SyncManager-1:
  Traceback (most recent call last):
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
      self.run()
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
      self._target(*self._args, **self._kwargs)
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/managers.py", line 608, in _run_server
      server = cls._Server(registry, address, authkey, serializer)
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/managers.py", line 154, in __init__
      self.listener = Listener(address=address, backlog=16)
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/connection.py", line 448, in __init__
      self._listener = SocketListener(address, family, backlog)
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/connection.py", line 591, in __init__
      self._socket.bind(address)
  OSError: AF_UNIX path too long
  Traceback (most recent call last):
    File "/home/epi2melabs/conda/bin/flye", line 33, in <module>
      sys.exit(load_entry_point('flye==2.9.3', 'console_scripts', 'flye')())
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/flye/main.py", line 756, in main
      _run(args)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/flye/main.py", line 493, in _run
      jobs[i].run()
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/flye/main.py", line 284, in run
      consensus_fasta = cons.get_consensus(out_alignment, self.in_contigs,
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/flye/polishing/consensus.py", line 71, in get_consensus
      mp_manager = multiprocessing.Manager()
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/context.py", line 57, in Manager
      m.start()
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/managers.py", line 583, in start
      self._address = reader.recv()
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/connection.py", line 250, in recv
      buf = self._recv_bytes()
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
      buf = self._recv(4)
    File "/home/epi2melabs/conda/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
      raise EOFError
  EOFError

@cjalder

cjalder commented Mar 8, 2024

Hi @kiranpatil222 - oh, that's unfortunate. Can you submit this as a full bug report with the information requested on the form, so we can investigate properly?

@kiranpatil222
Author

I wonder, don't you test this before committing or deploying it here on GitHub, rather than waiting for the ONT community to report it? And this is at the demo data level.

@cjalder

cjalder commented Mar 8, 2024

Hi @kiranpatil222 - we have numerous CI/CD tests in place before workflows are released, and the demo works fine on a number of our team's machines.
If you can submit the full bug report we can diagnose your particular problem properly, though the error OSError: AF_UNIX path too long suggests it may be due to the path to the data on your machine being too long.
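
To illustrate the kind of limit involved: Unix domain socket paths have a small fixed maximum length (on Linux, a little over 100 bytes), and Python refuses to bind a socket to anything longer. In CPython, `multiprocessing.Manager()` places its listener socket under the temporary directory (normally taken from `TMPDIR`), in a path of the form `<tmp>/pymp-XXXXXXXX/listener-XXXXXXXX`, so a long temporary or working directory can push it over the limit. Whether that is what happened in this run is an assumption until the full report is available. The sketch below is not workflow code, and `try_bind_unix_socket` is a hypothetical helper.

```python
import os
import socket
import tempfile


def try_bind_unix_socket(path):
    """Try to bind an AF_UNIX socket at `path`; return (ok, error_message)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError as exc:
        # Covers over-long paths as well as missing directories, permissions, etc.
        return False, str(exc)
    finally:
        sock.close()
    # bind() created a socket file; remove it so the probe leaves no trace.
    os.unlink(path)
    return True, ""


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    # Shape of the path multiprocessing would use (random suffixes shown as X).
    shape = os.path.join(tmp, "pymp-XXXXXXXX", "listener-XXXXXXXX")
    print(f"temp dir: {tmp}")
    print(f"approximate socket path length: {len(shape)}")

    # A deliberately over-long path is rejected before anything touches the disk.
    ok, message = try_bind_unix_socket("/" + "x" * 200)
    print(ok, message)
```

If the reported length is close to or above roughly 100 characters, pointing `TMPDIR` at a shorter location (or running from a shorter working directory) is a reasonable thing to try before filing the report.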
