minimap commands in the auto-generated draft_comp.sh file resulting in usage errors #27

Open
JohnUrban opened this issue Oct 4, 2022 · 3 comments

@JohnUrban

Hi,

Thanks for the interesting tool. The very first Minimap2 step in the Gala pipeline is throwing usage errors.

This is what my drafts file essentially looks like (paths shortened for simpler appearance here):

draft_01=/selected_asms/canu.fasta
draft_02=/selected_asms/shasta.fasta
draft_03=/selected_asms/wtdbg2.fasta
draft_04=/selected_asms/flye.fasta

As part of the pipeline, it auto-generates the draft_comp.sh file, which looks like this:

mkdir -p preliminary_comparison
cd preliminary_comparison
minimap2 -x asm5 $draft_01 $draft_02 > draft_01vsdraft_02.paf
minimap2 -x asm5 $draft_01 $draft_03 > draft_01vsdraft_03.paf
minimap2 -x asm5 $draft_01 $draft_04 > draft_01vsdraft_04.paf
minimap2 -x asm5 $draft_02 $draft_01 > draft_02vsdraft_01.paf
minimap2 -x asm5 $draft_02 $draft_03 > draft_02vsdraft_03.paf
minimap2 -x asm5 $draft_02 $draft_04 > draft_02vsdraft_04.paf
minimap2 -x asm5 $draft_03 $draft_01 > draft_03vsdraft_01.paf
minimap2 -x asm5 $draft_03 $draft_02 > draft_03vsdraft_02.paf
minimap2 -x asm5 $draft_03 $draft_04 > draft_03vsdraft_04.paf
minimap2 -x asm5 $draft_04 $draft_01 > draft_04vsdraft_01.paf
minimap2 -x asm5 $draft_04 $draft_02 > draft_04vsdraft_02.paf
minimap2 -x asm5 $draft_04 $draft_03 > draft_04vsdraft_03.paf
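
For reference, the twelve commands above are every ordered pair of distinct draft names. A minimal sketch that reproduces the same list with `itertools.permutations` (the function name `pairwise_commands` is hypothetical and not part of Gala):

```python
from itertools import permutations


def pairwise_commands(names):
    """Return one minimap2 command line per ordered pair of distinct draft names."""
    return [
        "minimap2 -x asm5 ${} ${} > {}vs{}.paf".format(first, second, first, second)
        for first, second in permutations(names, 2)
    ]


commands = pairwise_commands(["draft_01", "draft_02", "draft_03", "draft_04"])
print(len(commands))
print(commands[0])
```

With four drafts this gives 4 x 3 = 12 lines, in the same order as the generated file.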

Since Minimap2 is throwing the usage errors, my gut feeling is that the pipeline intends to export the variables draft_01 ... draft_04 (with the values being the file paths associated with them) into the user's environment outside of Python, but that it's not working. Otherwise, I'd assume Gala would want to write draft_comp.sh with the paths to those files rather than variable names like "$draft_01". Either way, something is not working correctly.

Any advice is appreciated.

Best,

John

@JohnUrban

JohnUrban commented Oct 4, 2022

I modified the following lines in comp_generator.py:

    for base in a:
        for ba in a:
            if base!=ba:
                b.writelines('minimap2 -x asm5 $'+base+' $'+ba+' > '+base+'vs'+ba+'.paf\n')
            #else:
            #    c.append('minimap2 -x asm5 $'+base+' $'+ba+' > '+base+'vs'+ba+'\n')

.... to include echo statements to see what Minimap2 command is being run.

    for base in a:
        for ba in a:
            if base!=ba:
                b.writelines('echo "minimap2 -x asm5 $'+base+' $'+ba+' > '+base+'vs'+ba+'.paf"\nminimap2 -x asm5 $'+base+' $'+ba+' > '+base+'vs'+ba+'.paf\n')
            #else:
            #    c.append('echo "minimap2 -x asm5 $'+base+' $'+ba+' > '+base+'vs'+ba+'"\nminimap2 -x asm5 $'+base+' $'+ba+' > '+base+'vs'+ba+'\n')

The resulting draft_comp.sh file looks like this:

mkdir -p preliminary_comparison
cd preliminary_comparison
echo "minimap2 -x asm5 $draft_01 $draft_02 > draft_01vsdraft_02.paf"
minimap2 -x asm5 $draft_01 $draft_02 > draft_01vsdraft_02.paf
echo "minimap2 -x asm5 $draft_01 $draft_03 > draft_01vsdraft_03.paf"
minimap2 -x asm5 $draft_01 $draft_03 > draft_01vsdraft_03.paf
echo "minimap2 -x asm5 $draft_01 $draft_04 > draft_01vsdraft_04.paf"
minimap2 -x asm5 $draft_01 $draft_04 > draft_01vsdraft_04.paf
echo "minimap2 -x asm5 $draft_02 $draft_01 > draft_02vsdraft_01.paf"
minimap2 -x asm5 $draft_02 $draft_01 > draft_02vsdraft_01.paf
echo "minimap2 -x asm5 $draft_02 $draft_03 > draft_02vsdraft_03.paf"
minimap2 -x asm5 $draft_02 $draft_03 > draft_02vsdraft_03.paf
echo "minimap2 -x asm5 $draft_02 $draft_04 > draft_02vsdraft_04.paf"
minimap2 -x asm5 $draft_02 $draft_04 > draft_02vsdraft_04.paf
echo "minimap2 -x asm5 $draft_03 $draft_01 > draft_03vsdraft_01.paf"
minimap2 -x asm5 $draft_03 $draft_01 > draft_03vsdraft_01.paf
echo "minimap2 -x asm5 $draft_03 $draft_02 > draft_03vsdraft_02.paf"
minimap2 -x asm5 $draft_03 $draft_02 > draft_03vsdraft_02.paf
echo "minimap2 -x asm5 $draft_03 $draft_04 > draft_03vsdraft_04.paf"
minimap2 -x asm5 $draft_03 $draft_04 > draft_03vsdraft_04.paf
echo "minimap2 -x asm5 $draft_04 $draft_01 > draft_04vsdraft_01.paf"
minimap2 -x asm5 $draft_04 $draft_01 > draft_04vsdraft_01.paf
echo "minimap2 -x asm5 $draft_04 $draft_02 > draft_04vsdraft_02.paf"
minimap2 -x asm5 $draft_04 $draft_02 > draft_04vsdraft_02.paf
echo "minimap2 -x asm5 $draft_04 $draft_03 > draft_04vsdraft_03.paf"
minimap2 -x asm5 $draft_04 $draft_03 > draft_04vsdraft_03.paf

I can confirm that those variables are not in the environment, so once the shell expands them to empty strings the Minimap2 commands effectively look like this:

minimap2 -x asm5   > draft_01vsdraft_02.paf
minimap2 -x asm5   > draft_01vsdraft_03.paf
minimap2 -x asm5   > draft_01vsdraft_04.paf
minimap2 -x asm5   > draft_02vsdraft_01.paf
minimap2 -x asm5   > draft_02vsdraft_03.paf
minimap2 -x asm5   > draft_02vsdraft_04.paf
minimap2 -x asm5   > draft_03vsdraft_01.paf
minimap2 -x asm5   > draft_03vsdraft_02.paf
minimap2 -x asm5   > draft_03vsdraft_04.paf
minimap2 -x asm5   > draft_04vsdraft_01.paf
minimap2 -x asm5   > draft_04vsdraft_02.paf
minimap2 -x asm5   > draft_04vsdraft_03.paf

I tried sourcing the draft_names_paths.txt file into the global environment before running Gala (e.g. source /path/to/draft_names_paths.txt). That did not help.
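
This is consistent with how shells pass variables to child processes: a plain `name=value` assignment, even when sourced, stays local to that shell unless it is exported, so a script launched from another process never sees it. A minimal sketch of the effect, assuming a POSIX `sh` is on the PATH (the helper `run_child` is hypothetical and only for illustration):

```python
import os
import subprocess

# The child shell expands $draft_01 and $draft_02 from its own environment.
CHILD_COMMAND = 'echo "minimap2 -x asm5 $draft_01 $draft_02"'


def run_child(extra_env):
    """Run CHILD_COMMAND in a child sh whose environment has only the given draft_* variables."""
    env = {k: v for k, v in os.environ.items() if not k.startswith("draft_")}
    env.update(extra_env)
    done = subprocess.run(
        ["sh", "-c", CHILD_COMMAND],
        env=env, capture_output=True, text=True, check=True,
    )
    return done.stdout.rstrip("\n")


without_export = run_child({})
with_export = run_child({
    "draft_01": "/selected_asms/canu.fasta",
    "draft_02": "/selected_asms/shasta.fasta",
})
print(repr(without_export))
print(repr(with_export))
```

In the first call both variables expand to nothing, which leaves minimap2 with no input files.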

To get past this problem, I had to create an entirely new file that has the export command in front of each variable. I called it draft_names_paths-export.txt. It looks essentially like this:

export draft_01=/selected_asms/canu.fasta
export draft_02=/selected_asms/shasta.fasta
export draft_03=/selected_asms/wtdbg2.fasta
export draft_04=/selected_asms/flye.fasta

That is now working, and the first minimap2 command looks like this (from the echo statements I added):

minimap2 -x asm5 /selected_asms/canu.fasta /selected_asms/shasta.fasta > draft_01vsdraft_02.paf
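
The export-prefixed file does not have to be written by hand. A minimal sketch that derives it from the lines of the original drafts file, assuming each entry is a simple `name=value` line (the helper name `add_export_prefix` is hypothetical):

```python
def add_export_prefix(lines):
    """Prefix each name=value line with 'export '; pass other lines through unchanged."""
    out = []
    for line in lines:
        stripped = line.strip()
        if "=" in stripped and not stripped.startswith(("#", "export ")):
            out.append("export " + stripped)
        else:
            out.append(stripped)
    return out


original = [
    "draft_01=/selected_asms/canu.fasta",
    "draft_02=/selected_asms/shasta.fasta",
]
for line in add_export_prefix(original):
    print(line)
```

Lines that already start with `export` or `#` are left alone, so running it twice does not double the prefix.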

So that leaves me scratching my head. Was there any guidance in the manual about this? If so, I apologize. If not, has anyone ever run this pipeline successfully? Perhaps people just generally default to the step-by-step method?

Another question might be - should this actually be the code for comp_generator.py:

Current code looks like this:

import os
def comp_generator(genomes,output=os.getcwd()):
    z=list(open(genomes))
    z=filter(lambda i:'=' in i, z)
    a=[]
    for base in z:
        a.append(base.split('=')[0])
    #c=[]
    if output[-1]!='/':
        output=output+'/'
    b=open(output+'draft_comp.sh','w')
    b.writelines('mkdir -p comparison\ncd comparison\n')
    b.writelines(''.join(z))
    for base in a:
        for ba in a:
            if base!=ba:
                b.writelines('minimap2 -x asm5 $'+base+' $'+ba+' > '+base+'vs'+ba+'.paf\n')

Should it look like this:

import os
def comp_generator(genomes,output=os.getcwd()):
    z=list(open(genomes))
    z=[i for i in z if '=' in i]                             ## LIST, NOT filter(): A PYTHON 3 filter OBJECT CAN ONLY BE READ ONCE
    a={}                                                     ## DICTIONARY INSTEAD OF LIST
    for base in z:
        k,v = base.strip().split('=',1)                     ## GET KEY VALUE PAIRS (SPLIT ON THE FIRST '=' ONLY)
        a[k] = v                                          ## ADD TO DICT
    #c=[]
    if output[-1]!='/':
        output=output+'/'
    b=open(output+'draft_comp.sh','w')
    b.writelines('mkdir -p comparison\ncd comparison\n')
    b.writelines(''.join(z))
    DRAFTS = list(a.keys())                   ## MAKE LIST OF KEYS IN DICT (THESE ARE THE DRAFT NICKNAMES)
    for base in DRAFTS:                         ## ITERATE OVER DRAFTS LIST
        for ba in DRAFTS:                         ## ITERATE OVER DRAFTS LIST
            if base!=ba:
                ## OLD LINE LEFT FOR COMPARISON
                ###b.writelines('minimap2 -x asm5 $'+base+' $'+ba+' > '+base+'vs'+ba+'.paf\n')
                ### USE a[base] and a[ba] TO GET ASSEMBLY PATHS FOR MM2, INSTEAD OF "$" SIGNS AND NICKNAMES; USE NICKNAMES FOR PAF FILE.
                b.writelines('minimap2 -x asm5 '+ a[base] +' '+ a[ba] + ' > ' + base + 'vs' + ba + '.paf\n')
    b.close()                                                ## CLOSE THE SCRIPT FILE SO IT IS FLUSHED TO DISK

Seems to me that might give the output you intended.

Otherwise, I will have to assume those variables should have been exported in the global env already (e.g. via sourcing a file similar to draft_names_paths.txt but with export commands). If so, then it ought to be explicitly stated in the guidance on running Gala.

Otherwise still, perhaps there is a python line (that I haven't yet identified though I've looked) that is supposed to automatically export those variable names to the global environment.
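
One candidate for such a line is `b.writelines(''.join(z))` in the current code quoted above, which appears intended to copy the `draft_NN=...` assignments into draft_comp.sh itself. Assuming Gala is run under Python 3, `filter` returns a single-use iterator there (in Python 2 it returned a list), so the preceding `for base in z` loop would consume it and the join would write nothing. Whether that is what happens in a given install is not confirmed here; the sketch below only demonstrates the Python behaviour on made-up input lines:

```python
lines = [
    "draft_01=/selected_asms/canu.fasta\n",
    "# a comment line without an equals sign\n",
    "draft_02=/selected_asms/shasta.fasta\n",
]

# Python 3: filter() gives a lazy, single-use iterator.
z = filter(lambda i: "=" in i, lines)
names = [base.split("=")[0] for base in z]  # the first pass consumes the iterator
joined_after_loop = "".join(z)               # the second pass finds nothing left

# Materialising the result as a list lets it be read twice.
z_list = list(filter(lambda i: "=" in i, lines))
names_from_list = [base.split("=")[0] for base in z_list]
joined_from_list = "".join(z_list)

print(names)
print(repr(joined_after_loop))
print(repr(joined_from_list))
```

If this is the cause, wrapping the `filter` call in `list(...)` would restore the assignments at the top of the generated script.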

Best,

John

@taprs

taprs commented Nov 17, 2023

Hi John!

Same here. Minimizing interactions with the source code, I used the following workaround before running the minimap part:

# export draft assembly paths
set -a; . ./draft_names_paths.txt; set +a
# Then
./draft_comp.sh

I am running it step by step (because there is apparently quite some debugging to do and not much support from the dev) and am not at the end yet, but I guess this way the issue will be resolved in the single-command version too.
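
The same idea can be sketched from the Python side: read the `name=value` lines and hand them to the script through its environment, which is roughly what `set -a` plus sourcing achieves in the shell. This is a hedged illustration, not Gala code; it assumes simple unquoted assignments and a POSIX `sh` on the PATH, the helper names are hypothetical, and the two files are throwaway stand-ins created in a temporary directory:

```python
import os
import subprocess
import tempfile


def read_assignments(path):
    """Parse simple name=value lines (no quoting or expansion) into a dict."""
    pairs = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                name, value = line.split("=", 1)
                pairs[name] = value
    return pairs


def run_with_assignments(script_path, assignments_path):
    """Run a shell script with the parsed assignments added to its environment."""
    env = dict(os.environ)
    env.update(read_assignments(assignments_path))
    done = subprocess.run(
        ["sh", script_path],
        env=env, capture_output=True, text=True, check=True,
    )
    return done.stdout


# Stand-in files: a two-line drafts file and a script that only echoes a command.
with tempfile.TemporaryDirectory() as tmp:
    paths_file = os.path.join(tmp, "draft_names_paths.txt")
    script_file = os.path.join(tmp, "draft_comp.sh")
    with open(paths_file, "w") as handle:
        handle.write("draft_01=/selected_asms/canu.fasta\n")
        handle.write("draft_02=/selected_asms/shasta.fasta\n")
    with open(script_file, "w") as handle:
        handle.write('echo "minimap2 -x asm5 $draft_01 $draft_02"\n')
    output = run_with_assignments(script_file, paths_file)

print(output, end="")
```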

I saw nice things done with this pipeline by the author, so let's brace ourselves and see what comes out! :)

Cheers,
Nikita

@JohnUrban

Check out my fork if you get a chance. I fixed up the Python and added a lot of options.
https://github.com/JohnUrban/GALA

Best,

John
