multiple ont reads for one sample-how to prepare --sample file[question] #420

Open
kusandeep opened this issue Jun 8, 2023 · 7 comments
Labels
question Further information is requested


kusandeep commented Jun 8, 2023

Hi @rpetit3
I have multiple ONT read files for each sample, and multiple samples to process in one go. Could you please look at my input file below and suggest what the issue is?

[image of the input file; not preserved in this text]

@kusandeep kusandeep added the question Further information is requested label Jun 8, 2023

rpetit3 commented Jun 22, 2023

Hi @kusandeep,

I think it might be easier to manually merge the FASTQs, then build the sheet.

cat /bifo/itmp/....../barcode01/*.fastq.gz > CE239.fastq.gz
cat /bifo/itmp/....../barcode02/*.fastq.gz > CE281.fastq.gz
...

At the moment, the bactopia prepare command does not have an option to merge ONT barcodes. I wonder if it should now. What do you think?

Cheers,
Robert
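
The manual merge suggested above can be scripted when there are many barcodes. The following is a minimal sketch, not part of Bactopia: the function name merge_barcodes and the two-column, tab-separated mapping file (barcode directory name, then sample ID) are assumptions made for illustration. It relies on the fact that concatenated gzip files form a valid gzip stream, which is also why the plain cat commands work.

```shell
#!/usr/bin/env bash
# merge_barcodes RUN_DIR MAP_TSV OUT_DIR   (hypothetical helper)
# RUN_DIR: directory containing barcode01/, barcode02/, ...
# MAP_TSV: tab-separated lines of "<barcode dir>\t<sample ID>" (assumed format)
# OUT_DIR: where the merged <sample ID>.fastq.gz files are written
merge_barcodes() {
    local run_dir map_tsv out_dir tab barcode sample
    run_dir=$1
    map_tsv=$2
    out_dir=$3
    tab=$(printf '\t')
    mkdir -p "$out_dir"

    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline
    while IFS=$tab read -r barcode sample || [ -n "$barcode" ]; do
        # Skip blank or incomplete lines
        [ -n "$barcode" ] && [ -n "$sample" ] || continue

        # Expand the glob into the positional parameters
        set -- "$run_dir/$barcode"/*.fastq.gz
        if [ ! -e "$1" ]; then
            # The glob matched nothing, so do not create an empty output file
            echo "No FASTQ files found for $barcode" >&2
            continue
        fi

        # Concatenating gzip files yields a valid multi-member gzip file
        cat "$@" > "$out_dir/$sample.fastq.gz"
    done < "$map_tsv"
}
```

With a mapping file containing lines such as barcode01<TAB>CE239, a call like `merge_barcodes /path/to/run barcodes.tsv merged` (placeholder paths) would write merged/CE239.fastq.gz and so on, and the merged directory can then be used to build the sample sheet.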

@kusandeep
Author

Indeed. I will do that.
Yes, a multi-file input option would be useful. In my case, I multiplexed 10 samples and ran them in-house on MinION flow cells. The output was 10 folders (listed above), each containing multiple fastq.gz files that I wanted to take through Bactopia.


kusandeep commented Jun 26, 2023

This did work and assembly looks good with
--dragonflye_assembler --run_checkm

@incoherentian

> At the moment the bactopia prepare command does not have an option to merge ONT barcodes. I wonder if it should now, what do you think?
>
> Cheers, Robert

Integrated concatenation would be super!


rpetit3 commented Apr 4, 2024

Hi @kusandeep and @incoherentian

Is this still of interest? (I really wish this was a built-in feature in MinKNOW.)

@kusandeep
Author

I would say yes if it's not too much work. This would make life a bit easier.

@incoherentian

My colleagues log SR sample data for hybrid assemblies in Excel, and I log my runs' barcodes and sample IDs in Google Sheets, which I can download as Excel, so I used a script to merge the FASTQ files from multiple runs. The FASTQ files have to be the right number of subdirectories down and use the default barcode## naming:

$ cat 4bashpandas_barcodecatrenameloop.sh
#!/bin/bash

# Directory where concatenated files will be stored
output_dir="./barcodecatrename"
mkdir -p "$output_dir"
SAMPLE_RANGE_LOWER=01
SAMPLE_RANGE_UPPER=96

# Use Python with Pandas to read the mapping from barcodes.xlsx
# and store it in a temporary file
python3 - <<'END_PYTHON' > temp_mapping.txt
import pandas as pd

# Read the Excel file
df = pd.read_excel('barcodes.xlsx', header=None)

# Create a dictionary mapping from barcode to SAMPLEID
mapping = dict(zip(df[0], df[1]))

# Write the mapping to a file
for k, v in mapping.items():
    print(f"{k},{v}")
END_PYTHON

# Read the temporary mapping file into a Bash associative array
declare -A barcode_to_sampleid
while IFS=, read -r key value
do
    barcode_to_sampleid[$key]=$value
done < temp_mapping.txt

# Merge files according to the mapping
for i in $(seq -w "$SAMPLE_RANGE_LOWER" 1 "$SAMPLE_RANGE_UPPER")
do
    barcode="barcode$i"
    sampleid=${barcode_to_sampleid[$barcode]}
    if [ -n "$sampleid" ]; then
        # Combine this barcode's files from every run directory; report failures on stderr
        cat */*/"$barcode"/*.fastq.gz > "$output_dir/${sampleid}.fastq.gz" || echo "concatenate error in i=$i" >&2
    else
        echo "No mapping found for $barcode"
    fi
done

echo "Concatenation complete."

# Remove the temporary file
rm temp_mapping.txt

I think that was the one that worked. It accommodates mirrored barcodes across multiple flow cells and makes it easier to ensure LR filenames are appropriate for hybrid assembly with SR via bactopia prepare. I think it would be pretty easy for @kusandeep to edit it a bit to work with his old spreadsheet and single flow cell. I also think MinKNOW allowed larger batch sizes at some point and may still, but IIRC there are some reasons to keep it as-is... I just can't remember what they are!

I'd agree that the ability to merge ONT reads in bactopia prepare would probably make life way easier for LR-only assemblies. Hybrid assemblies would still require renaming or fiddling in the FOFN unless @rpetit3 made an (I think?) unnecessarily gargantuan effort to expand the scope of bactopia prepare, but hybrid assemblies are likely to become ever less popular with ever-increasing LR Q-scores anyway :)
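
When merging the same barcode across several flow cells, it may be worth checking that no reads were lost before starting an assembly. The sketch below is an illustration only: the function name count_reads is hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
#!/usr/bin/env bash
# count_reads FILE...   (hypothetical helper)
# Print the total number of FASTQ records across one or more gzip files,
# assuming every record spans exactly four lines.
count_reads() {
    gzip -dc "$@" | awk 'END { printf "%d\n", NR / 4 }'
}
```

For example, the output of `count_reads */*/barcode01/*.fastq.gz` should equal that of `count_reads barcodecatrename/SAMPLE.fastq.gz`, where SAMPLE stands in for whichever sample ID barcode01 maps to in the spreadsheet.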
