[Feature Request] Get barcodes from Fastx function #59

Open
hotsoupisgood opened this issue Feb 2, 2023 · 1 comment

Comments

hotsoupisgood commented Feb 2, 2023

Hi, I would like to get barcode counts using the Fastx function.
If this is already an option, please let me know.

Right now I am calling:
for currentRead in pyfastx.Fastx(fastqFile): ...
to iterate over the reads and collect some statistics, but I would also like barcode counts. With plain Python I do it like this, but it's quite slow:

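import gzip
from collections import Counter

barcodes = Counter()
with gzip.open(myFastq, "rt") as fastq:
    for i, line in enumerate(fastq):
        # Assumes 4-line FASTQ records: headers are lines 0, 4, 8, ...
        # (quality lines can also start with '@', so a prefix check alone miscounts)
        if i % 4 != 0:
            continue
        # The barcode is the last ':'-separated field of the header
        bc = line.rsplit(":", 1)[-1].strip()
        barcodes[bc] += 1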

Fastx has sped up some of my other data collection functions so I was hopeful it could do this too!
Thank you

lmdu (Owner) commented Feb 9, 2023

You can use the comment=True option to get the part of the header line after the first whitespace, which is where you may find the barcode.

import pyfastx

fq = pyfastx.Fastx(fastqFile, comment=True)
for name, seq, qual, comment in fq:
    # Split the comment to get the barcode, e.g. the last ':'-separated field
    bc = comment.split(':')[-1]
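
To turn the loop above into actual counts, here is a minimal sketch. It assumes Illumina-style headers where the barcode is the last ':'-separated field of the comment (for example `1:N:0:ACGTACGT`) and that every read has a comment. The helper names `barcode_from_comment`, `count_barcodes`, and `count_barcodes_in_fastq` are hypothetical and not part of pyfastx.

```python
from collections import Counter


def barcode_from_comment(comment):
    """Return the last ':'-separated field of a header comment.

    Assumption: Illumina-style comments such as '1:N:0:ACGTACGT'.
    """
    return comment.rsplit(":", 1)[-1].strip()


def count_barcodes(records):
    """Count barcodes from an iterable of (name, seq, qual, comment) tuples."""
    return Counter(
        barcode_from_comment(comment) for _name, _seq, _qual, comment in records
    )


def count_barcodes_in_fastq(path):
    """Count barcodes in a FASTQ file, reading it with pyfastx.Fastx."""
    import pyfastx  # third-party; imported here so the helpers above work without it

    return count_barcodes(pyfastx.Fastx(path, comment=True))
```

Calling `count_barcodes_in_fastq(fastqFile).most_common(10)` would then list the ten most frequent barcodes. If the same loop already collects other statistics, updating a `Counter` inside it avoids a second pass over the file.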

Hope this can help you.
