ValueError: invalid literal for int() with base 10: '.' #142

Open
YanCheer opened this issue Jan 26, 2024 · 2 comments
Labels
bug Something isn't working


@YanCheer

When I run toga.py, I get the following error:
[Image: WeChat screenshot 20240126162103]

Is there anything wrong with my BED file or my Python version?

Thanks a lot!

@kirilenkobm
Member

Hello,

It seems the BED file is corrupted.

Please try something like this:

# Report every line whose 6th tab-separated field is not an integer.
with open("YOUR_BED_FILE", "r") as f:
    for line_number, line in enumerate(f, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # too few columns to check
        try:
            int(fields[5])
        except ValueError:
            print(f"Line {line_number}: {line.strip()}")

to identify exactly which line is corrupted.
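
A broader variant of the same check, as a minimal sketch: it assumes the annotation follows the standard BED12 layout, where columns 2, 3, 7, 8 and 10 (chromStart, chromEnd, thickStart, thickEnd, blockCount) are integers and columns 11 and 12 (blockSizes, blockStarts) are comma-separated integer lists. Which of these columns TOGA actually parses is not stated in this thread, and the function name `find_bad_bed12_fields` is made up for illustration.

```python
# Assumption: standard BED12 layout; column numbers are 1-based.
INT_COLUMNS = (2, 3, 7, 8, 10)   # chromStart, chromEnd, thickStart, thickEnd, blockCount
INT_LIST_COLUMNS = (11, 12)      # blockSizes, blockStarts (comma-separated)


def _is_int(value):
    """Return True if int() accepts the string."""
    try:
        int(value)
    except ValueError:
        return False
    return True


def find_bad_bed12_fields(lines):
    """Yield (line_number, column_number, value) for each suspicious field.

    column_number is None when the line has fewer than 12 columns.
    """
    for line_number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        if not stripped:
            continue  # ignore blank lines
        fields = stripped.split("\t")
        if len(fields) < 12:
            yield line_number, None, f"expected 12 columns, got {len(fields)}"
            continue
        for col in INT_COLUMNS:
            if not _is_int(fields[col - 1]):
                yield line_number, col, fields[col - 1]
        for col in INT_LIST_COLUMNS:
            # BED12 lists may end with a trailing comma, so drop empty items.
            items = [item for item in fields[col - 1].split(",") if item]
            if not items or not all(_is_int(item) for item in items):
                yield line_number, col, fields[col - 1]


# Usage (replace the placeholder with the real path):
# with open("YOUR_BED_FILE") as f:
#     for line_number, col, value in find_bad_bed12_fields(f):
#         print(f"Line {line_number}, column {col}: {value!r}")
```

A '.' in any of the checked columns would be reported together with its line and column number, which narrows down where the `int()` call failed.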

@YanCheer
Author

YanCheer commented Jan 27, 2024

Thank you very much for yesterday's reply.
After regenerating the BED and chain files, I now get a new error that I have not been able to resolve.

At STEP 3 (Merge step 2 output), the log shows that the pipeline read the reference annotation successfully but found no result files to load
and combine, so merge_chains_output produced empty output, as follows:

STEP 3: Merge step 2 output

Reading /projects/yanchen/HiFi_tmp/01.HiFi_Assemble/Assemblies_hifiasm/02.Primary_Assembly_Annotation/TOGA/03.toga_outputs_Run2/TOGA_GRCg7b_Achr1/temp/toga_filt_ref_annot.bed
merge_chains_output: got data for 44858 transcripts
merge_chains_output: Loading the results...
merge_chains_output: There are 0 result files to combine
merge_chains_output: got 0 keys in chain_genes_data
merge_chains_output: got 0 keys in chain_raw_data
merge_chains_output: There were 0 transcript lines and 0 chain lines
merge_chains_output: chain_genes_data dict reverted, there are 0 keys now
merge_chains_output: Combining the data...
merge_chains_output: got combined dict with 0 keys
merge_chains_output: Writing output to /projects/yanchen/02.Primary_Assembly_Annotation/TOGA/03.toga_outputs_Run2/temp/chain_results_df.tsv
merge_chains_output: total runtime: 0:00:40.594500

As a result, the following step reported an error:

STEP 4: Classify chains using gradient boosting model

Classifying chains
classify_chains: loaded dataframe of size 0
classify_chains: total number of transcripts: 0
classify_chains: 0 rows with spanning chains
classify_chains: filtered dataset contains 0 records
classify_chains: omputing additional features...
classify_chains: WARNING! The final df for classification is empty
classify_chains: df for single-exon model contains 0 records
classify_chains: df for multi-exon model contains 0 records
classify_chains: loading models at /projects/yanchen/softwares/TOGA-1.1.7/models/se_model.dat (SE) and /projects/yanchen/softwares/TOGA-1.1.7/models/me_model.dat (ME)
classify_chains: applying models to SE and ME datasets...
classify_chains: applying -1.0 score to the spanning chains
classify_chains: applying -2.0 score to the processed pseudogene alignments
classify_chains: number of processed pseudogene alignments: 0
classify_chains: arranging the final output
classify_chains: classification result stats:

  • orthologs: 0
  • paralogs: 0
  • spanning chains: 0
  • processed pseudogenes: 0

classify_chains: using 0.5 as a threshold to separate orthologs from paralogs
classify_chains: combining results for 0 individual transcripts
classify_chains: saving the classification to /projects/yanchen/02.Primary_Assembly_Annotation/TOGA/03.toga_outputs_Run2/temp/trans_to_chain_classes.tsv
classify_chains: found no classifiable chains for 0 transcripts
classify_chains: saving these transcripts to: /projects/yanchen/02.Primary_Assembly_Annotation/TOGA/03.toga_outputs_Run2/temp/rejected/classify_chains_rejected.txt
Chain results file /projects/yanchen/02.Primary_Assembly_Annotation/TOGA/03.toga_outputs_Run2/temp/chain_results_df.tsv is empty! Abort.

The last error message indicates that the chain results file chain_results_df.tsv produced by the pipeline is empty.
Do you have any idea what the underlying cause might be?
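
The log line "There are 0 result files to combine" suggests that step 2 left nothing for the merge step to read. A minimal sketch for checking this by hand is given below. The directory to inspect is an assumption: the log above does not name where step 2 writes its per-job results, so `results_dir` in the usage comment is a placeholder, and both helper names are made up for illustration.

```python
from pathlib import Path


def summarize_outputs(directory):
    """Return (total_files, non_empty_files) for regular files under directory."""
    files = [p for p in Path(directory).rglob("*") if p.is_file()]
    non_empty = [p for p in files if p.stat().st_size > 0]
    return len(files), len(non_empty)


def count_nonblank_lines(path):
    """Count the lines in a text file that contain something other than whitespace."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


# Usage (both paths are placeholders, not names taken from the log):
# total, non_empty = summarize_outputs(results_dir)
# print(f"{non_empty} of {total} result files are non-empty")
# print(count_nonblank_lines("chain_results_df.tsv"))
```

If the step 2 output directory turns out to be empty or missing, the problem lies before the merge step, not in chain_results_df.tsv itself.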

Thank you very much for your help!

kirilenkobm added the bug (Something isn't working) label Feb 3, 2024