Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index error with plot_mutations.py #148

Open
smorzechowski opened this issue Feb 20, 2024 · 4 comments
Open

Index error with plot_mutations.py #148

smorzechowski opened this issue Feb 20, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@smorzechowski
Copy link

smorzechowski commented Feb 20, 2024

Hello! I am a big fan of this tool, thank you! I want to plot inactivating mutations for various selected transcripts and although I can plot some successfully, others throw an index error.

First, I should note I've had to modify the plot_mutations.py in several places in order to plot anything at all. I had to modify line 711 to append the chain value to the trans_line string: trans_line = line_data[0][2:]+"."+line_data[1], otherwise no lines would be selected from the inact_mut_data.txt file.

    else:  # ordinary text file
        f = open(mut_file, "r")
        trans_related_lines = []
        for line in f:
            if not line.startswith("#"):
                continue
            line_data = line.rstrip().split("\t")
            trans_line = line_data[0][2:]+"."+line_data[1]
            # print(trans_line)
            if trans_line == trans:

Similarly, I had to do the same at line 1025 for trans string trans = defect_line_dat[0]+"."+defect_line_dat[1]:

def generate_filebuffer(mut_lines, human_exons_dct, x, y, drawing_mode=DEFAULT_MODE, verbose=False):
    """Generate the body of our SVG file."""
    print("Generating SVG line") if verbose else None
    curr_projection = None  # name of the current handled species
    curr_sp = None
    speciesdata = {}  # mutation data: {mutationnumber:(exon, start, stop)}
    comp_substrings = []
    filebuffer_substrings = []
    vert_offset = 0
    uniq_projections = set()

    for defect_line, sp_num_tup in mut_lines:
        defect_line_dat = defect_line.rstrip().split("\t")
        trans = defect_line_dat[0]+"."+defect_line_dat[1]
        chain = defect_line_dat[1]

These modifications work with the following files and for some transcripts, such as ENSGALT00000067177.3.23 and NOG_ENSGALT00000004916.2.48. However, for other transcripts, like rna-XM_025148119.1.26806
and rna-XM_025144897.1.149, an index error is thrown:

Traceback (most recent call last):
  File "/n/home09/smorzechowski/bin/TOGA/supply/plot_mutations.py", line 1383, in <module>
    svg_line = make_plot(
  File "/n/home09/smorzechowski/bin/TOGA/supply/plot_mutations.py", line 1349, in make_plot
    filebuffer, up_num, vbox_h = generate_filebuffer(
  File "/n/home09/smorzechowski/bin/TOGA/supply/plot_mutations.py", line 1087, in generate_filebuffer
    exon = exons[2 * exonnumber - 1]
IndexError: list index out of range

I'm not familiar enough with this script, or with python in general, to troubleshoot this further, unfortunately!

Also, if you have any insight as to why the original plot_mutations.py doesn't work with my data, I'd be grateful. The original script doesn't work regardless of whether I use an isoform file and gene id instead of transcript id. Appending the chain number is necessary to select anything from the inact_mut_data.txt file.

Thanks very much for any insight you might have!

@smorzechowski
Copy link
Author

@kirilenkobm
Copy link
Member

Hi @smorzechowski

Thank you for highlighting the issue and providing the code fixes and files to reproduce the issue. When we initially developed the tool, transcript identifiers with numerous dots were not common. Nonetheless, it's clear they shouldn't pose a problem in a properly written script. Once I write an update it will appear in the comments to this issue

@MichaelHiller
Copy link
Collaborator

Hi,

this is a very good point. We never encountered this, because we always strip any .1 or something like that from scaffold names.
I guess the best approach is to check at the beginning whether gene / transcript names contain '.' or replace them with "_" or another character.
Thx for reporting it.

@smorzechowski
Copy link
Author

I see, sounds good, thanks to you both!

@kirilenkobm kirilenkobm added the bug Something isn't working label May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants