Error with samplesheet.csv with two column headers that have prefix in common #249

aghr · 2024-03-12T22:11:13Z

Description of the bug

The pipeline works fine when samplesheet.csv contains two columns with the headers condition and Xcondition2, but it throws an error when the headers are condition and condition2. All other input files and parameters were identical.

Error shown below.

Command used and terminal output

ERROR ~ Error executing process > 'NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_TABULARTOGSEACLS (inv_vs_wt_e125)'

Caused by:
  Process `NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_TABULARTOGSEACLS (inv_vs_wt_e125)` terminated with an error exit status (2)

Command executed:

  cls_file=inv_vs_wt_e125.cls
  
  column_number=$(cat samplesheet.sample_metadata.tsv | head -n 1 | tr '\t' "\n" | grep -En "^condition" | awk -F':' '{print $1}')
  classes=$(tail -n +2 samplesheet.sample_metadata.tsv | awk -F'\t' '{print $'$column_number'}')
  unique_classes=$(echo -e "$classes" | awk '!x[$0]++')
  
  echo -e "$(echo -e "$classes" | wc -l) $(echo -e "$unique_classes" | wc -l) 1" > $cls_file
  echo -e "#$(echo -e "$unique_classes" | tr '\n' ' ')" | sed "s/ $//" >> $cls_file
  echo -e "$classes" | tr '\n' ' ' | sed "s/ $//" >> $cls_file
  echo -e "\n" >> $cls_file
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_TABULARTOGSEACLS":
      bash: $(echo $(bash --version | grep -Eo 'version [[:alnum:].]+' | sed 's/version //'))
  END_VERSIONS

Command exit status:
  2

Command output:
  (empty)

Command error:
  awk: line 2: missing } near end of file

Work dir:
  XXX

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

version 23.10.0 build 5889 (created 15-10-2023 15:07 UTC (17:07 CEST))
Linux desktop
CentOS Linux release 7.7.1908
local execution
with singularity container
version of nfcore diffabundance: 1.4.0


BEFH · 2024-03-16T22:03:54Z

I had this issue too. Probably can be fixed by changing this:

column_number=\$(cat $samples | head -n 1 | tr '$separator' "\\n" | grep -En "^$variable" | awk -F':' '{print \$1}')

to this:

column_number=\$(cat $samples | head -n 1 | tr '$separator' "\\n" | grep -En "^$variable\$" | awk -F':' '{print \$1}')

in https://github.com/nf-core/differentialabundance/blob/master/modules/nf-core/custom/tabulartogseacls/main.nf

But the file could probably use some fixing up in general. e.g. for that line:

column_number=\$(head -n 1 $samples | tr '$separator' "\\n" | grep -En "^$variable\$" | cut -d: -f1)

I'm also unsure how many backslashes it needs.
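
For anyone wanting to see the failure outside the pipeline, here is a minimal sketch in plain shell (so without the Nextflow backslash escaping). The header line is a made-up stand-in for the reporter's samplesheet, and the variable names are only illustrative. It shows that the prefix pattern returns two column numbers, that the anchored pattern returns one, and that splicing the two-number result into the awk program makes awk exit with an error.

```shell
# Hypothetical tab-separated header resembling the reporter's samplesheet.
header=$(printf 'sample\tcondition\tcondition2\n')

# Prefix-only pattern: matches both "condition" and "condition2".
prefix_match=$(printf '%s\n' "$header" | tr '\t' '\n' | grep -En '^condition' | cut -d: -f1)

# Anchored pattern: matches only the exact header "condition".
exact_match=$(printf '%s\n' "$header" | tr '\t' '\n' | grep -En '^condition$' | cut -d: -f1)

echo "prefix match columns: $(echo $prefix_match)"
echo "exact match column:   $exact_match"

# With two numbers in the variable, the unquoted expansion is split into two
# words, so awk receives an incomplete program ("{print $2") and fails to parse it.
awk_status=0
awk -F'\t' '{print $'$prefix_match'}' </dev/null 2>/dev/null || awk_status=$?
echo "awk exit status with prefix match: $awk_status"
```

The exact wording of the awk error depends on which awk implementation is installed, so the sketch only checks for a non-zero exit status.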

BEFH · 2024-03-16T22:26:03Z

Alternatively, you can replace lines 30 and 31 with this:

classes=\$(awk -F '$separator' 'NR==1 { for (i=1; i<=NF; i++) if (\$i == "$variable") {lnum = i; next}} 1 {print \$lnum}' $samples)

One-liner with no piping.
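
A standalone way to try the awk-only idea, written as plain shell rather than as it would appear inside the Nextflow module. This is a sketch under assumptions: the sample file contents are invented, and `col` and `idx` are names chosen for illustration. It passes the header name with `-v` so that awk compares it as a string, and it prints nothing if the header is not found.

```shell
# Build a small hypothetical sample sheet (tab-separated).
samples=$(mktemp)
printf 'sample\tcondition\tcondition2\n' >  "$samples"
printf 's1\twt\ta\n'                     >> "$samples"
printf 's2\tinv\tb\n'                    >> "$samples"

# On the header line, find the column whose name equals col exactly,
# then print that column for every following line.
classes=$(awk -F '\t' -v col="condition" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { print $idx }
' "$samples")

echo "$classes"
rm -f "$samples"
```

Because the comparison is an exact string match, a header such as condition2 cannot be picked up by mistake.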

asp8200 · 2024-03-18T09:05:54Z

Which branch was this error observed on? Is there a simple test nf-cmd to trigger the error?

jenmuell · 2024-03-18T11:40:04Z

To come back to @asp8200's question: on which branch did you observe the error? I could only reproduce it on the main branch, not on the dev branch. Pulling the pipeline from the dev branch should solve the issue.

BEFH · 2024-03-18T13:07:49Z

Looks like it's already fixed here: https://github.com/nf-core/differentialabundance/blob/dev/modules%2Fnf-core%2Fcustom%2Ftabulartogseacls%2Fmain.nf#L30

But my suggested awk-only replacement of that line and the next might still be more robust. Should I bother with a pull-request or nah?

jenmuell · 2024-03-18T13:26:38Z

Hmm, I'm not familiar with the run time of awk. Could we run into problems with the for-loop in your one-line awk option, especially with large datasets?

BEFH · 2024-03-18T13:32:12Z

It just loops over the fields on the first line of the file, so it's unlikely to be an issue. I suppose I could make a test file with thousands of columns to check, but it seems unnecessary. A tool called Miller is actually much better for this, but it's best to not add any dependencies.
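
If someone does want to check a wide file, a rough sketch could look like the following. The column names col1 to col5000 and the file layout are invented for the test, and the awk program is a standalone variant of the idea above, not the module's code. Prefixing the awk call with `time` would show how long the lookup takes on a given machine.

```shell
# Create a hypothetical sheet with 5000 columns: a header row and one data row.
wide=$(mktemp)
seq 1 5000 | sed 's/^/col/' | paste -s -d '\t' - >  "$wide"
seq 1 5000 |                  paste -s -d '\t' - >> "$wide"

# Look up the last column by exact header name and print its value.
value=$(awk -F '\t' -v col="col5000" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { print $idx }
' "$wide")

echo "$value"
rm -f "$wide"
```

The loop runs once, over the header fields only, so the number of data rows does not affect it.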

WackerO · 2024-03-18T13:43:19Z

@BEFH I will say I'm not an AWK expert, but if your code solution is more robust, do feel free to make a PR!
@jenmuell thanks for looking into this!

aghr added the bug label Mar 12, 2024

aghr changed the title ~~Error with two column headers that have prefix in common~~ Error with samplesheet.csv with two column headers that have prefix in common Mar 12, 2024

jenmuell self-assigned this Mar 18, 2024
