old Java version on run_test.sh micro ValueError: Chain results file TOGA/micro_test_out/temp/chain_results_df.tsv is empty! Abort #131

Open
SomePersonSomeWhereInTheWorld opened this issue Dec 13, 2023 · 13 comments
Labels: bug (Something isn't working), nextflow (Issues related to nextflow)

@SomePersonSomeWhereInTheWorld

Not really a bug but using an older version of Java results in the below. Perhaps a more graceful Java version detection and/or warning/error message? Newer Java of course works.

#### STEP 4: Classify chains using gradient boosting model

Classifying chains
classify_chains: loaded dataframe of size 0
classify_chains: total number of transcripts: 0
classify_chains: 0 rows with spanning chains
classify_chains: filtered dataset contains 0 records
classify_chains: omputing additional features...
classify_chains: WARNING! The final df for classification is empty
classify_chains: df for single-exon model contains 0 records
classify_chains: df for multi-exon model contains 0 records
classify_chains: loading models at /path/to/me/TOGA/./models/se_model.dat (SE) and /path/to/me/TOGA/./models/me_model.dat (ME)
classify_chains: applying models to SE and ME datasets...
classify_chains: applying -1.0 score to the spanning chains
classify_chains: applying -2.0 score to the processed pseudogene alignments
classify_chains: number of processed pseudogene alignments: 0
classify_chains: arranging the final output
classify_chains: classification result stats:
* orthologs: 0
* paralogs: 0
* spanning chains: 0
* processed pseudogenes: 0
classify_chains: using 0.5 as a threshold to separate orthologs from paralogs
classify_chains: combining results for 0 individual transcripts
classify_chains: saving the classification to /path/to/me/TOGA/micro_test_out/temp/trans_to_chain_classes.tsv
classify_chains: found no classifiable chains for 0 transcripts
classify_chains: saving these transcripts to: /path/to/me/TOGA/micro_test_out/temp/rejected/classify_chains_rejected.txt
Chain results file /path/to/me/TOGA/micro_test_out/temp/chain_results_df.tsv is empty! Abort.
Traceback (most recent call last):
  File "/path/to/me/TOGA/./toga.py", line 1742, in <module>
    main()
  File "/path/to/me/TOGA/./toga.py", line 1738, in main
    toga_manager.run()
  File "/path/to/me/TOGA/./toga.py", line 621, in run
    self.__classify_chains()
  File "/path/to/me/TOGA/./toga.py", line 847, in __classify_chains
    check_chains_classified(self.chain_results_df)
  File "/path/to/me/TOGA/modules/sanity_check_functions.py", line 169, in check_chains_classified
    raise ValueError(msg)
ValueError: Chain results file /path/to/me/TOGA/micro_test_out/temp/chain_results_df.tsv is empty! Abort
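
The Java version check suggested at the top of this comment could look roughly like the sketch below. It is not part of TOGA: `parse_java_major`, `check_java`, and `MIN_JAVA_MAJOR` are hypothetical names, and the minimum of 11 is an assumption that should be checked against the requirements of the installed Nextflow release.

```python
import re
import shutil
import subprocess

# Assumption: adjust to what your Nextflow release actually requires.
MIN_JAVA_MAJOR = 11


def parse_java_major(version_output):
    """Return the major Java version from `java -version` style output, or None."""
    lines = version_output.strip().splitlines()
    if not lines:
        return None
    match = re.search(r"(\d+)(?:\.(\d+))?", lines[0])
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_292" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_java(min_major=MIN_JAVA_MAJOR):
    """Raise a descriptive error if Java is missing or older than min_major."""
    if shutil.which("java") is None:
        raise RuntimeError("java not found in PATH; Nextflow needs a Java runtime")
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` usually writes to stderr, so fall back to stdout if empty.
    major = parse_java_major(proc.stderr or proc.stdout)
    if major is None:
        raise RuntimeError("could not determine the Java version")
    if major < min_major:
        raise RuntimeError(
            f"Java {major} found, but at least Java {min_major} is expected"
        )
    return major
```

Calling `check_java()` before the Nextflow step would turn a silent failure into an explicit message; whether it belongs in toga.py or in run_test.sh is a design choice for the maintainers.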

Edit: I do get this error message from the 2nd test on the page:

./toga.py test_input/hg38.mm10.chr11.chain test_input/hg38.genCode27.chr11.bed test_input/hg38.2bit test_input/mm10.2bit --kt --pn test -i supply/hg38.wgEncodeGencodeCompV34.isoforms.txt --nc /path/to/me/TOGA/nextflow_config_files --cb 3,5 --cjn 500 --u12 supply/hg38.U12sites.tsv --ms

#### STEP 4: Classify chains using gradient boosting model

Classifying chains
classify_chains: loaded dataframe of size 0
classify_chains: total number of transcripts: 0
classify_chains: 0 rows with spanning chains
classify_chains: filtered dataset contains 0 records
classify_chains: omputing additional features...
classify_chains: WARNING! The final df for classification is empty
classify_chains: df for single-exon model contains 0 records
classify_chains: df for multi-exon model contains 0 records
classify_chains: loading models at /path/to/me/TOGA/./models/se_model.dat (SE) and /path/to/me/TOGA/./models/me_model.dat (ME)
classify_chains: applying models to SE and ME datasets...
classify_chains: applying -1.0 score to the spanning chains
classify_chains: applying -2.0 score to the processed pseudogene alignments
classify_chains: number of processed pseudogene alignments: 0
classify_chains: arranging the final output
classify_chains: classification result stats:
* orthologs: 0
* paralogs: 0
* spanning chains: 0
* processed pseudogenes: 0
classify_chains: using 0.5 as a threshold to separate orthologs from paralogs
classify_chains: combining results for 0 individual transcripts
classify_chains: saving the classification to /path/to/me/TOGA/test/temp/trans_to_chain_classes.tsv
classify_chains: found no classifiable chains for 0 transcripts
classify_chains: saving these transcripts to: /path/to/me/TOGA/test/temp/rejected/classify_chains_rejected.txt
Chain results file /path/to/me/TOGA/test/temp/chain_results_df.tsv is empty! Abort.
Traceback (most recent call last):
  File "/path/to/me/TOGA/./toga.py", line 1742, in <module>
    main()
  File "/path/to/me/TOGA/./toga.py", line 1738, in main
    toga_manager.run()
  File "/path/to/me/TOGA/./toga.py", line 621, in run
    self.__classify_chains()
  File "/path/to/me/TOGA/./toga.py", line 847, in __classify_chains
    check_chains_classified(self.chain_results_df)
  File "/path/to/me/TOGA/modules/sanity_check_functions.py", line 169, in check_chains_classified
    raise ValueError(msg)
ValueError: Chain results file /path/to/me/TOGA/test/temp/chain_results_df.tsv is empty! Abort.

TOGA/test/temp/chain_results_df.tsv 
gene	gene_overs	chain	synt	gl_score	gl_exo	chain_len	exon_qlen	loc_exo	exon_cover	intr_cover	gene_len	ex_num	ex_fract	intr_fract	flank_cov
@lynnyummy

lynnyummy commented Dec 21, 2023

Hi, I get a similar error about chain_results_df.tsv:

#### Initiating TOGA class ####
Version 1.1.6
Commit: affa067aaf3beeb8c06471fcddf1bb24deff5bc0
Branch: master

Calling cmd:
./modules/chain_score_filter test_input/align_micro_sample.chain 15000 > /nfs/users//TOGA/micro_test_out/temp/genome_alignment.chain

Command finished with exit code 0.
Continue without isoforms file: not provided
Found 1 sequences in /nfs/users//TOGA/test_input/hg38.micro_sample.2bit
Found 1 sequences in /nfs/users//TOGA/test_input/hg38.micro_sample.2bit
Found 2 sequences in /nfs/users//TOGA/test_input/q2bit_micro_sample.2bit
Saving output to /nfs/users//TOGA/micro_test_out
Arguments stored in /nfs/users//TOGA/micro_test_out/project_args.json


#### STEP 0: making chain and bed file indexes

Started chain indexing...
chain_bst_index: indexing 3 chains
chain_bst_index: Saved chain /nfs/users//TOGA/micro_test_out/temp/genome_alignment.chain index to /nfs/users//TOGA/micro_test_out/temp/genome_alignment.bst
Started bed file indexing...
bed_hdf5_index: indexed 3 transcripts


#### STEP 1: Generate extract chain features jobs

Calling cmd:
./split_chain_jobs.py /nfs/users//TOGA/micro_test_out/temp/genome_alignment.chain /nfs/users//TOGA/micro_test_out/temp/toga_filt_ref_annot.bed /nfs/users//TOGA/micro_test_out/temp/toga_filt_ref_annot.hdf5 --log_file /nfs/users//TOGA/micro_test_out/toga_2023_12_21_at_11_46.log --parallel_logs_dir /nfs/users//TOGA/micro_test_out/temp_logs --jobs_num 1 --jobs /nfs/users//TOGA/micro_test_out/temp/chain_classification_jobs --jobs_file /nfs/users//TOGA/micro_test_out/temp/chain_class_jobs_combined --results_dir /nfs/users//TOGA/micro_test_out/temp/chain_classification_results --rejected /nfs/users//TOGA/micro_test_out/temp/rejected/SPLIT_CHAIN_REJ.txt

split_chain_jobs: Use bed file /nfs/users//TOGA/micro_test_out/temp/toga_filt_ref_annot.bed and chain file /nfs/users//TOGA/micro_test_out/temp/genome_alignment.chain
split_chain jobs: the run data overview is:

* vv: False
* jobs: /nfs/users//TOGA/micro_test_out/temp/chain_classification_jobs
* results_dir: /nfs/users//TOGA/micro_test_out/temp/chain_classification_results
* errors_dir: None
* chain_file: /nfs/users//TOGA/micro_test_out/temp/genome_alignment.chain
* bed_file: /nfs/users//TOGA/micro_test_out/temp/toga_filt_ref_annot.bed
* index_file: /nfs/users//TOGA/micro_test_out/temp/genome_alignment.chain_ID_position
* job_size: None
* jobs_num: 1
* bed_index: /nfs/users//TOGA/micro_test_out/temp/toga_filt_ref_annot.hdf5
* jobs_file: /nfs/users//TOGA/micro_test_out/temp/chain_class_jobs_combined
* ref: hg38
* on_cluster: True
split_chain_jobs: searching for intersections between reference transcripts and chains
split_chain_jobs: chains-to-transcripts dict contains 2 records
split_chain_jobs: skipped 0 transcripts that do not intersect any chain
split_chain_jobs: preparing 2 commands
split_chain_jobs: command size of 3 for each cluster job
split_chain_jobs: results in 1 cluster jobs
split_chain_jobs: estimated time: 0:00:00.013618
Command finished with exit code 0.


#### STEP 2: Extract chain features: parallel step

Extracting chain features, project name: chain_feats__micro_test_out_at_1703159205
Project path: ./nextflow_logs/chain_feats__micro_test_out_at_1703159205
Selected parallelization strategy: nextflow
Parallel manager: pushing job nextflow /nfs/users//TOGA/execute_joblist.nf --joblist /nfs/users//TOGA/micro_test_out/temp/chain_class_jobs_combined
Logs from individual chain runner jobs are show below


#### STEP 3: Merge step 2 output

Reading /nfs/users//TOGA/micro_test_out/temp/toga_filt_ref_annot.bed
merge_chains_output: got data for 3 transcripts
merge_chains_output: Loading the results...
merge_chains_output: There are 0 result files to combine
merge_chains_output: got 0 keys in chain_genes_data
merge_chains_output: got 0 keys in chain_raw_data
merge_chains_output: There were 0 transcript lines and 0 chain lines
merge_chains_output: chain_genes_data dict reverted, there are 0 keys now
merge_chains_output: Combining the data...
merge_chains_output: got combined dict with 0 keys
merge_chains_output: Writing output to /nfs/users//TOGA/micro_test_out/temp/chain_results_df.tsv
merge_chains_output: total runtime: 0:00:03.469617


#### STEP 4: Classify chains using gradient boosting model

Classifying chains
classify_chains: loaded dataframe of size 0
classify_chains: total number of transcripts: 0
classify_chains: 0 rows with spanning chains
classify_chains: filtered dataset contains 0 records
classify_chains: omputing additional features...
classify_chains: WARNING! The final df for classification is empty
classify_chains: df for single-exon model contains 0 records
classify_chains: df for multi-exon model contains 0 records
classify_chains: loading models at ./models/se_model.dat (SE) and ./models/me_model.dat (ME)
classify_chains: applying models to SE and ME datasets...
classify_chains: applying -1.0 score to the spanning chains
classify_chains: applying -2.0 score to the processed pseudogene alignments
classify_chains: number of processed pseudogene alignments: 0
classify_chains: arranging the final output
classify_chains: classification result stats:
* orthologs: 0
* paralogs: 0
* spanning chains: 0
* processed pseudogenes: 0
classify_chains: using 0.5 as a threshold to separate orthologs from paralogs
classify_chains: combining results for 0 individual transcripts
classify_chains: saving the classification to /nfs/users//TOGA/micro_test_out/temp/trans_to_chain_classes.tsv
classify_chains: found no classifiable chains for 0 transcripts
classify_chains: saving these transcripts to: /nfs/users//TOGA/micro_test_out/temp/rejected/classify_chains_rejected.txt
Chain results file /nfs/users//TOGA/micro_test_out/temp/chain_results_df.tsv is empty! Abort.

Have you figured out this issue?
Thanks!

@guo-cheng

I also have this problem. Do you know how to solve this?

@lynnyummy

> I also have this problem. Do you know how to solve this?

Not yet. Have you sorted it out?

@MichaelHiller
Collaborator

@kirilenkobm Can you pls look into this?
What I don't understand is what Java has to do with any of this, as TOGA doesn't use Java??

kirilenkobm added the bug and nextflow labels on Jan 12, 2024

@kirilenkobm
Member

kirilenkobm commented Jan 12, 2024

@MichaelHiller @lynnyummy @SomePersonSomeWhereInTheWorld @guo-cheng
The Chain results file /nfs/users//TOGA/micro_test_out/temp/chain_results_df.tsv is empty! Abort. message is misleading -> in fact, nextflow does not even start the jobs. However, it does not return any non-0 error code, so TOGA crashes due to a bit general sanity check.
Nextflow indeed requires Java, and, of course, some pairs of versions are incompatible :(.
Please try downgrading nextflow. I'd recommend trying 23.10.0
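
A more specific check right after the parallel step could report this situation directly instead of failing later on the empty table. The following is a minimal sketch, assuming each cluster job writes one non-empty result file; `check_parallel_step_output` and its parameters are hypothetical and not part of TOGA.

```python
from pathlib import Path


def check_parallel_step_output(results_dir, expected_jobs, log_dir=None):
    """Fail early if the parallel step produced fewer result files than jobs."""
    results_path = Path(results_dir)
    produced = 0
    if results_path.is_dir():
        # Count only non-empty regular files as job results.
        produced = sum(
            1 for p in results_path.iterdir() if p.is_file() and p.stat().st_size > 0
        )
    if produced >= expected_jobs:
        return produced
    hint = f" See the Nextflow logs in {log_dir}." if log_dir else ""
    raise RuntimeError(
        f"Parallel step produced {produced} of {expected_jobs} expected result "
        f"files in {results_path}; the jobs may never have started.{hint}"
    )
```

With a check like this, a run in which Nextflow exits with code 0 but starts no jobs would stop at step 2 with a message that points at the scheduler rather than at the classifier.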

@SomePersonSomeWhereInTheWorld
Author

> @MichaelHiller @lynnyummy @SomePersonSomeWhereInTheWorld @guo-cheng The Chain results file /nfs/users//TOGA/micro_test_out/temp/chain_results_df.tsv is empty! Abort. message is misleading -> in fact, nextflow does not even start the jobs. However, it does not return any non-0 error code, so TOGA crashes due to a bit general sanity check. Nextflow indeed requires Java, and, of course, some pairs of versions are incompatible :(. Please try downgrading nextflow. I'd recommend trying 23.10.0

Already using that:

nextflow -v
nextflow version 23.10.0.5889

Same results:

classify_chains: using 0.5 as a threshold to separate orthologs from paralogs
classify_chains: combining results for 0 individual transcripts
classify_chains: saving the classification to /path/to/me/TOGA/test/temp/trans_to_chain_classes.tsv
classify_chains: found no classifiable chains for 0 transcripts
classify_chains: saving these transcripts to: /path/to/me/TOGA/test/temp/rejected/classify_chains_rejected.txt
Chain results file /path/to/me/TOGA/test/temp/chain_results_df.tsv is empty! Abort.
Traceback (most recent call last):
  File "/path/to/me/TOGA/toga.py", line 1742, in <module>
    main()
  File "/path/to/me/TOGA/toga.py", line 1738, in main
    toga_manager.run()
  File "/path/to/me/TOGA/toga.py", line 621, in run
    self.__classify_chains()
  File "/path/to/me/TOGA/toga.py", line 847, in __classify_chains
    check_chains_classified(self.chain_results_df)
  File "/path/to/me/TOGA/modules/sanity_check_functions.py", line 169, in check_chains_classified
    raise ValueError(msg)
ValueError: Chain results file /path/to/me/TOGA/test/temp/chain_results_df.tsv is empty! Abort.
rk3199@g029:~/TOGA$ ls  /path/to/me/TOGA/test/temp/chain_results_df.tsv
/path/to/me/TOGA/test/temp/chain_results_df.tsv

The .tsv file exists but contains only the header line:

chain_results_df.tsv
gene	gene_overs	chain	synt	gl_score	gl_exo	chain_len	exon_qlen	loc_exo	exon_cover	intr_cover	gene_len	ex_num	ex_fract	intr_fract	flank_cov
java --version
java 20.0.1 2023-04-18
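
Because the file exists and has a header, a size or existence check alone cannot detect this case. A small sketch of counting data rows instead; `count_data_rows` is a hypothetical helper, not a TOGA function.

```python
import csv


def count_data_rows(tsv_path):
    """Return the number of non-empty rows after the header of a TSV file."""
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, None)
        if header is None:
            return 0  # completely empty file
        return sum(1 for row in reader if row)
```

A result of 0 for chain_results_df.tsv means step 2 delivered nothing to merge, which matches the "There are 0 result files to combine" line in the logs posted earlier in the thread.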

kirilenkobm pinned this issue on Jan 12, 2024

@lynnyummy

lynnyummy commented Jan 17, 2024

> @MichaelHiller @lynnyummy @SomePersonSomeWhereInTheWorld @guo-cheng The Chain results file /nfs/users//TOGA/micro_test_out/temp/chain_results_df.tsv is empty! Abort. message is misleading -> in fact, nextflow does not even start the jobs. However, it does not return any non-0 error code, so TOGA crashes due to a bit general sanity check. Nextflow indeed requires Java, and, of course, some pairs of versions are incompatible :(. Please try downgrading nextflow. I'd recommend trying 23.10.0

Hi, I've checked my Nextflow is 23.10.0 and the Java is openjdk 21-internal 2023-09-19.

I checked the error files again, and I think the problem is probably insufficient memory for Java:
N E X T F L O W ~ version 23.10.0 There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (malloc) failed to allocate 1424 bytes for AllocateHeap

Can we modify run_test.sh and specify Java in the script? Any suggestions?

thanks!

@kirilenkobm
Member

@lynnyummy

The error message is quite misleading - it appears that the problem might be related to Java's heap memory settings rather than a lack of physical memory (it failed to allocate just 1424 bytes, it's nothing).
Pls try to increase Java heap size by doing
export NXF_OPTS='-Xms1g -Xmx4g'
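
The variable only has an effect if it is present in the environment of the process that launches Nextflow, so it has to be set before the run starts rather than passed as an argument. A minimal Python sketch of the same idea for runs started from a wrapper script; `env_with_nxf_opts` is a hypothetical helper, and the child process here is a stand-in that only prints what it inherited.

```python
import os
import subprocess
import sys


def env_with_nxf_opts(base_env, xms="1g", xmx="4g"):
    """Return a copy of base_env with NXF_OPTS set to the given JVM heap bounds."""
    env = dict(base_env)
    env["NXF_OPTS"] = f"-Xms{xms} -Xmx{xmx}"
    return env


env = env_with_nxf_opts(os.environ)

# Stand-in child process; a real wrapper would launch the TOGA command here
# with the same env= argument (hypothetical usage).
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['NXF_OPTS'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())
```

Any process started with `env=env` inherits the setting, including a Nextflow process started further down the chain.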

@lynnyummy

> export NXF_OPTS='-Xms1g -Xmx4g'

Hi, I tried putting this at the end of the command, but it still failed with the same error:
TOGA/micro_test_out/temp/chain_results_df.tsv is empty! Abort.

@kirilenkobm
Member

Seems like I should give this a push #104
so that nextflow logs are carefully arranged and we can analyse what exactly is going on

@lynnyummy

lynnyummy commented Mar 4, 2024

> @lynnyummy
>
> The error message is quite misleading - it appears that the problem might be related to Java's heap memory settings rather than a lack of physical memory (it failed to allocate just 1424 bytes, it's nothing). Pls try to increase Java heap size by doing export NXF_OPTS='-Xms1g -Xmx4g'

I sorted out this issue by specifying the memory in my own script.

@lynnyummy

lynnyummy commented Mar 5, 2024

Hi, I got the same error for the second test with the human and mouse datasets. I checked the log file, which indicated:
`Mar-05 15:41:35.927 [Task submitter] ERROR nextflow.processor.TaskProcessor - Error executing process > 'execute_jobs (2)'

Caused by:
java.io.IOException: Cannot run program "sbatch" (in directory "~/TOGA/nextflow_logs/work/15/f2b3a7ba56330223cecf3946904a73"): error=2, No such file or directory

Command executed:

sbatch .command.run

Command exit status:

Command output:
(empty)

Work dir:
~/TOGA/nextflow_logs/work/15/f2b3a7ba56330223cecf3946904a73

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
Mar-05 15:41:35.928 [Task submitter] DEBUG nextflow.Session - Session aborted -- Cause: Error submitting process 'execute_jobs (2)' for execution
Mar-05 15:41:35.944 [main] DEBUG nextflow.Session - Session await > all processes finished
Mar-05 15:41:35.944 [main] DEBUG nextflow.Session - Session await > all barriers passed
Mar-05 15:41:35.945 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: slurm) - terminating tasks monitor poll loop
Mar-05 15:41:35.949 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=217; ignoredCount=0; cachedCount=0; pendingCount=317; submittedCount=0; runningCount=-217; retriesCount=217; abortedCount=0; succeedDuration=0ms; failedDuration=0ms; cachedDuration=0ms;loadCpus=-217; loadMemory=0; peakRunning=0; peakCpus=0; peakMemory=0; ]
Mar-05 15:41:36.158 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Mar-05 15:41:36.174 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye`

> Seems like I should give this a push #104 so that nextflow logs are carefully arranged and we can analyse what exactly is going on

It can still generate the file genome_alignment.chain, but not chain_results_df.tsv. Can you help check this issue?
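
The log above shows Nextflow configured for the slurm executor on a machine where `sbatch` is not in PATH, so no job can be submitted. One way to catch this before starting a run is sketched below. `pick_executor` and `SUBMIT_COMMANDS` are hypothetical names, and falling back to local execution is an assumption; on a cluster, the better fix may be to start the run from a node where the submit command is available.

```python
import shutil

# Submit command that each of these Nextflow executors relies on (subset).
SUBMIT_COMMANDS = {"slurm": "sbatch", "lsf": "bsub", "sge": "qsub", "pbs": "qsub"}


def pick_executor(preferred, which=shutil.which):
    """Return preferred if its submit command is in PATH, otherwise "local"."""
    command = SUBMIT_COMMANDS.get(preferred)
    if command is None:
        # Nothing to verify for executors that are not listed above.
        return preferred
    return preferred if which(command) else "local"
```

The `which` parameter exists only so the lookup can be replaced in tests; by default the function consults the real PATH.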

@MichaelHiller
Collaborator

Sorry, but where does the Java come from? TOGA is not based on any Java code.
If the resume is problematic, could you try a fresh run from the start, providing enough memory?
@kirilenkobm Could you pls have a look? Apparently it is problematic to get TOGA running on the Sanger compute system.

Thx !!!
