Orphan recovery option in rare cases causes Salmon to quit abruptly without error #929

Open
gringer opened this issue May 3, 2024 · 0 comments

gringer commented May 3, 2024

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?

Salmon (bulk mode)

Describe the bug

For one of our 41 samples, salmon quits without any substantial output when run with the orphan recovery option (--recoverOrphans, where salmon tries harder to pair up read alignments when one of the reads in a read pair fails to map properly). Because the failure is tied to that one option and affects only one sample out of 41, I don't expect it to change our results in any substantial way. I'm reporting it in case it exposes other, more concerning software issues.

To Reproduce
Steps and data to reproduce the behavior:

  • Which version of salmon was used?
    • v1.10.0 (the latest release that had a compiled executable)
  • How was salmon installed (compiled, downloaded executable, through bioconda)?
    • downloaded executable
  • Which reference (e.g. transcriptome) was used?
    • Gencode M34 (GRCm39)
  • Which read files were used?
    • Illumina HiSeq, trimmed using Trimmomatic
  • Which program options were used?

Working:

./salmon/bin/salmon quant -p 64 --index reference/salmon_index -l ISR -1 merged/1791-${id}_1P.fastq.gz -2 merged/1791-${id}_2P.fastq.gz --validateMappings --seqBias --gcBias --posBias --softclip --allowDovetail --numBootstraps 10 -o mapped/salmon_${id}

The working command produced the following file structure:

salmon_03
├── aux_info
│   ├── ambig_info.tsv
│   ├── bootstrap
│   │   ├── bootstraps.gz
│   │   └── names.tsv.gz
│   ├── exp3_pos.gz
│   ├── exp3_seq.gz
│   ├── exp5_pos.gz
│   ├── exp5_seq.gz
│   ├── expected_bias.gz
│   ├── exp_gc.gz
│   ├── fld.gz
│   ├── meta_info.json
│   ├── obs3_pos.gz
│   ├── obs3_seq.gz
│   ├── obs5_pos.gz
│   ├── obs5_seq.gz
│   ├── observed_bias_3p.gz
│   ├── observed_bias.gz
│   └── obs_gc.gz
├── cmd_info.json
├── lib_format_counts.json
├── libParams
│   └── flenDist.txt
├── logs
│   └── salmon_quant.log
└── quant.sf

5 directories, 23 files

Not working:

./salmon/bin/salmon quant -p 64 --index reference/salmon_index -l ISR -1 merged/1791-${id}_1P.fastq.gz -2 merged/1791-${id}_2P.fastq.gz --validateMappings --seqBias --gcBias --posBias --softclip --allowDovetail  --recoverOrphans --numBootstraps 10 -o mapped/salmon_${id}

The non-working command produced the following file structure:

salmon_03_withRecover
├── aux_info
├── libParams
└── logs
    └── salmon_quant.log

4 directories, 1 file

The file mapped/salmon_03_withRecover/logs/salmon_quant.log is empty.
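
The difference between the two trees suggests a quick way to spot a failed run in a batch: a finished run leaves a quant.sf behind, and the failed one does not. A minimal sketch, assuming a bash-like shell; `salmon_run_complete` is a hypothetical helper (not part of salmon), and the check that a successful run leaves a non-empty log file is an assumption rather than something confirmed above.

```shell
# Hypothetical helper: succeed only if a salmon output directory looks complete.
# Assumption: a finished run has a non-empty quant.sf and a non-empty log file.
salmon_run_complete() {
    local outdir=$1
    # -s is true only for files that exist and are non-empty.
    [ -s "${outdir}/quant.sf" ] && [ -s "${outdir}/logs/salmon_quant.log" ]
}

# Example use: list the output directories that look incomplete.
# for d in mapped/salmon_*; do
#     salmon_run_complete "$d" || echo "incomplete: $d"
# done
```

The function only inspects the files, so it can be run over existing output directories without repeating the quantification.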

Expected behavior

Properly mapped reads, as demonstrated by the following metadata from a successful run:

{
    "salmon_version": "1.10.0",
    "samp_type": "bootstrap",
    "opt_type": "vb",
    "quant_errors": [],
    "num_libraries": 1,
    "library_types": [
        "ISR"
    ],
    "frag_dist_length": 1001,
    "frag_length_mean": 158.48833607498765,
    "frag_length_sd": 54.34014977759742,
    "seq_bias_correct": true,
    "gc_bias_correct": true,
    "num_bias_bins": 4096,
    "mapping_type": "mapping",
    "keep_duplicates": false,
    "num_valid_targets": 147493,
    "num_decoy_targets": 61,
    "num_eq_classes": 179681,
    "serialized_eq_classes": false,
    "eq_class_properties": [
        "range_factorized",
        "gzipped"
    ],
    "length_classes": [
        496,
        768,
        1403,
        2707,
        100404
    ],
    "index_seq_hash": "c0bf1b46db288bdf947208ef6410a0ced47fa770ab5284a1b231d958b283728b",
    "index_name_hash": "db38822bce0fbc9a64cfb0b230f58119448d1c82706f1c515f210cccaf4fdf7d",
    "index_seq_hash512": "d683c5132cae8695500566a25eb95c0349427afe1664ac571160337850aa269b634ad444936bd6d35205597c4962636c8fadbcf6406ca409a159b65e5f53c59e",
    "index_name_hash512": "e552bd7a70d98c20ff4cf07a83a5f25d2dafe4a78e3dff92348f3d566c9037ccde0de6d4040625ca065a7484dcb8d668c583822bf5138e1540f61685bc991290",
    "index_decoy_seq_hash": "39d3837ea001def952e79d70003dbba0199cc859b32f26350abfa271a6741167",
    "index_decoy_name_hash": "bd5cd185b9e3272a64108e64e2bc47bc0552046dba3ff53683edeafab750c9ab",
    "num_bootstraps": 10,
    "num_processed": 28233938,
    "num_mapped": 13878036,
    "num_decoy_fragments": 1377519,
    "num_dovetail_fragments": 563891,
    "num_fragments_filtered_vm": 1456279,
    "num_alignments_below_threshold_for_mapped_fragments_vm": 2129372,
    "percent_mapped": 49.153738313089728,
    "call": "quant",
    "start_time": "Fri May 03 11:31:29 2024",
    "end_time": "Fri May 03 11:33:32 2024"
}
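
As a sanity check on the metadata above, the reported percent_mapped agrees with 100 × num_mapped / num_processed. A minimal sketch of that arithmetic, assuming awk is available; `percent_mapped` is a hypothetical helper name, and the formula is inferred from the numbers rather than taken from the salmon documentation.

```shell
# Hypothetical helper: mapping rate in percent, rounded to four decimal places.
# Usage: percent_mapped NUM_MAPPED NUM_PROCESSED
percent_mapped() {
    awk -v mapped="$1" -v processed="$2" \
        'BEGIN { printf "%.4f\n", 100 * mapped / processed }'
}

# With the values from the metadata above:
# percent_mapped 13878036 28233938
```

The same helper can be used to compare mapping rates between runs with and without --recoverOrphans for the samples where both complete.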

Screenshots

Program output from a failed process (with the --recoverOrphans option):

Version Info: This is the most recent version of salmon.
### salmon (selective-alignment-based) v1.10.0
### [ program ] => salmon
### [ command ] => quant
### [ threads ] => { 64 }
### [ index ] => { reference/salmon_index }
### [ libType ] => { ISR }
### [ mates1 ] => { merged/XXXX-03_1P.fastq.gz }
### [ mates2 ] => { merged/XXXX-03_2P.fastq.gz }
### [ validateMappings ] => { }
### [ seqBias ] => { }
### [ gcBias ] => { }
### [ posBias ] => { }
### [ softclip ] => { }
### [ allowDovetail ] => { }
### [ recoverOrphans ] => { }
### [ numBootstraps ] => { 10 }
### [ output ] => { mapped/salmon_03 }
Logs will be written to mapped/salmon_03/logs
[2024-05-03 15:09:51.221] [jointLog] [info] setting maxHashResizeThreads to 64
[2024-05-03 15:09:51.221] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2024-05-03 15:09:51.221] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2024-05-03 15:09:51.221] [jointLog] [info] Setting consensusSlack to selective-alignment default of 0.35.
[2024-05-03 15:09:51.221] [jointLog] [info] parsing read library format
[2024-05-03 15:09:51.221] [jointLog] [info] There is 1 library.
[2024-05-03 15:09:51.221] [jointLog] [info] Loading pufferfish index
[2024-05-03 15:09:51.221] [jointLog] [info] Loading dense pufferfish index.
-----------------------------------------
| Loading contig table | Time = 6.1119 s
-----------------------------------------
size = 25107960
-----------------------------------------
| Loading contig offsets | Time = 29.509 ms
-----------------------------------------
-----------------------------------------
| Loading reference lengths | Time = 163.13 us
-----------------------------------------
-----------------------------------------
| Loading mphf table | Time = 358.06 ms
-----------------------------------------
size = 3025374818
Number of ones: 25107959
Number of ones per inventory item: 512
Inventory entries filled: 49039
-----------------------------------------
| Loading contig boundaries | Time = 3.1166 s
-----------------------------------------
size = 3025374818
-----------------------------------------
| Loading sequence | Time = 237.3 ms
-----------------------------------------
size = 2272136048
-----------------------------------------
| Loading positions | Time = 2.8327 s
-----------------------------------------
size = 2977516968
-----------------------------------------
| Loading reference sequence | Time = 228.26 ms
-----------------------------------------
-----------------------------------------
| Loading reference accumulative lengths | Time = 320.51 us
-----------------------------------------
[2024-05-03 15:10:04.136] [jointLog] [info] done
[2024-05-03 15:10:04.170] [jointLog] [info] Index contained 147554 targets




[2024-05-03 15:10:05.131] [jointLog] [info] Number of decoys : 61
[jointLog] [info] First decoy index : 147456
processed 21000000 fragments
hits: 25885546, hits per frag:  1.2683
(base) [**no further output**]
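
Because the output stops without any error message and the shell prompt simply returns, the exit status of the process may be the only hint about how it ended. A minimal sketch, assuming a bash-like shell on Linux; `describe_exit` is a hypothetical helper (not part of salmon), and the signal numbers it names (6, 9, 11) are the Linux x86_64 values.

```shell
# Hypothetical helper: turn a shell exit status into a short description.
describe_exit() {
    local status=$1
    local sig name
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -gt 128 ]; then
        # By shell convention, a status above 128 means the process was
        # terminated by signal number (status - 128).
        sig=$((status - 128))
        case "$sig" in
            6)  name="SIGABRT" ;;
            9)  name="SIGKILL" ;;
            11) name="SIGSEGV" ;;
            *)  name="signal ${sig}" ;;
        esac
        echo "killed by ${name}"
    else
        echo "exited with error status ${status}"
    fi
}

# Example use, run immediately after the failing command:
# ./salmon/bin/salmon quant ... --recoverOrphans ... ; describe_exit $?
```

A result such as "killed by SIGKILL" or "killed by SIGSEGV" would narrow down whether the process was terminated from outside or crashed on its own, which is useful to include in a report like this one.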

Desktop (please complete the following information):

  • OS: Ubuntu Linux
$ uname -a
Linux big-bird 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy