Include file grouping information in fqstats #73

Open
unode opened this issue Jun 19, 2018 · 0 comments
unode commented Jun 19, 2018

Using only the information contained in a fqstats file, it is currently impossible to tell whether pair.1, pair.2 and singles were processed together using pairing information, paired(..., singles=...), or whether each file was loaded independently with fastq(...).

Adding file grouping information could alleviate this issue. Example:

                              SAMPLE
        0:file           pair.1.fq.gz
    0:encoding     Sanger (33 offset)
     0:numSeqs                 737216
0:numBasepairs               73654175
   0:minSeqLen                     50
   0:maxSeqLen                    101
   0:gcContent       0.41184101240696
   0:filegroup                      0   <---
        1:file           pair.2.fq.gz
           ...                    ...
   1:filegroup                      0   <---
        2:file          singles.fq.gz
           ...                    ...
   2:filegroup                      0   <--- all above = paired(..., singles=...)
        3:file processed.pair.1.fq.gz
           ...                    ...
   3:filegroup                      1   <--- new group
           ...                    ...
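
To illustrate how a consumer of the stats could use the proposed field, here is a minimal Python sketch. It assumes the stats for one sample have already been read into a mapping of "index:field" keys to string values, mirroring the column shown above. The function name group_files and the input mapping are hypothetical, and filegroup is the field proposed in this issue, not something fqstats currently emits.

```python
from collections import defaultdict


def group_files(stats):
    """Map each filegroup id to its file names, in input-index order.

    `stats` is assumed to be a dict such as {"0:file": "pair.1.fq.gz",
    "0:filegroup": "0", ...}. Files without a filegroup entry are
    collected under the key None.
    """
    # First pass: collect the fields of each input by its numeric index.
    records = defaultdict(dict)
    for key, value in stats.items():
        index, _, field = key.partition(":")
        if not index.isdigit() or not field:
            continue  # ignore keys that do not look like "index:field"
        records[int(index)][field] = value

    # Second pass: bucket file names by their (proposed) filegroup value.
    groups = defaultdict(list)
    for index in sorted(records):
        record = records[index]
        raw_group = record.get("filegroup")
        group = int(raw_group) if raw_group is not None else None
        groups[group].append(record.get("file"))
    return dict(groups)


example = {
    "0:file": "pair.1.fq.gz", "0:filegroup": "0",
    "1:file": "pair.2.fq.gz", "1:filegroup": "0",
    "2:file": "singles.fq.gz", "2:filegroup": "0",
    "3:file": "processed.pair.1.fq.gz", "3:filegroup": "1",
}
print(group_files(example))
```

With grouping available, inputs that share a filegroup can be interpreted as one paired(..., singles=...) call, while a file that is alone in its group corresponds to an independent fastq(...) call.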

A similar situation is seen when using load_mocat_sample(...) on a folder that includes multiple pairs/lanes. Here, a variable number of inputs makes parsing the stats file non-trivial.

In this case, and related to #55 (comment), we could treat all the inputs of a sample as belonging to the same filegroup.

                              SAMPLE
        0:file  SAMPLE/pairA.1.fq.gz
           ...                   ...
   0:filegroup                     0   <---
        1:file  SAMPLE/pairA.2.fq.gz
           ...                   ...
   1:filegroup                     0   <---
        2:file SAMPLE/singlesA.fq.gz
           ...                   ...
   2:filegroup                     0   <---
        3:file  SAMPLE/pairB.1.fq.gz
           ...                   ...
   3:filegroup                     0   <---
        4:file  SAMPLE/pairB.2.fq.gz
           ...                   ...
   4:filegroup                     0   <---
        5:file SAMPLE/singlesB.fq.gz
           ...                   ...
   5:filegroup                     0   <---
           ...                   ...