Latest template merge completely broke dev branch (on AWS + Fusion) #392

Open
FelixKrueger opened this issue Mar 27, 2024 · 4 comments · Fixed by #401
Labels
bug Something isn't working

@FelixKrueger
Contributor

Description of the bug

I am trying to launch a methylseq run using the latest dev branch, where some (but not all) samples require merging of technical replicates before launching. If I understand it correctly, the latest template changes were merged into dev earlier this month, but something seems to have gone awry:

Within seconds of launching the run, I observe the following errors:

  1. The samples are not getting merged, despite technical replicates having identical IDs (which no longer get truncated by 1 element, which is good!)
  2. Trim Galore fails straight away as the system tries to create the same symbolic link several times (details below)
  3. As one of the first processes, bismark2summary is run, and obviously fails...
Screenshot 2024-03-27 at 11 15 21

Obviously, the ln -s command is handed all 6 replicate files as sources for one and the same link name, which doesn't work. But something also broke the overall workflow logic: the run does not start with merging, and instead runs post-run QC right at the start.
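For reference, the merging step that is expected to run before trimming can be sketched as a plain concatenation of the replicate files. This is only an illustration under stated assumptions, not the pipeline's actual module code: the file names `rep1.fastq.gz`, `rep2.fastq.gz` and `merged.fastq.gz` and the read contents are made up, and it relies on the fact that concatenated gzip members form a valid gzip stream.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Two tiny made-up single-end FASTQ files standing in for technical replicates.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > rep1.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > rep2.fastq.gz

# Concatenated gzip members form a valid gzip stream, so no recompression is needed.
cat rep1.fastq.gz rep2.fastq.gz > merged.fastq.gz

# Each FASTQ record spans four lines, so line count / 4 gives the read count.
merged_reads=$(( $(gzip -dc merged.fastq.gz | wc -l) / 4 ))
echo "reads in merged file: $merged_reads"
```

With one read per replicate, the merged file should report two reads. The trimming step would then receive a single file per sample rather than a list of replicates.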

Here is an example samplesheet:

sample,fastq_1,fastq_2,genome
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7506206_P3_plus_12_32F_Smith_C_Klf4,s3://filebucket/SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,,
GSM7431885_NB18_32F_TNTtoKSR_553_rep1,s3://filebucket/SRR24757836_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,s3://filebucket/SRR24757836_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz,
GSM7431885_NB18_32F_TNTtoKSR_553_rep1,s3://filebucket/SRR24757837_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz,s3://filebucket/SRR24757837_GSM7431885_NB18_32F_TNTtoKSR_553_rep1_Homo_sapiens_Bisulfite-Seq_R2.fastq.gz,
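A quick way to see which sample IDs in such a samplesheet occur more than once (and would therefore need merging) is to count rows per ID. The sketch below is a hedged illustration: it uses a shortened stand-in samplesheet with made-up file names, plus a hypothetical third sample `GSM0000001` that has only one row, and it assumes the `sample` column is the first comma-separated field.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Shortened stand-in for the samplesheet above (file names are illustrative only).
cat > "$workdir/samplesheet.csv" <<'EOF'
sample,fastq_1,fastq_2,genome
GSM7506206,SRR24994983_R1.fastq.gz,,
GSM7506206,SRR24994984_R1.fastq.gz,,
GSM7431885,SRR24757836_R1.fastq.gz,SRR24757836_R2.fastq.gz,
GSM7431885,SRR24757837_R1.fastq.gz,SRR24757837_R2.fastq.gz,
GSM0000001,SRR00000001_R1.fastq.gz,,
EOF

# Skip the header, count rows per sample ID, and keep IDs seen more than once.
replicated=$(awk -F, 'NR > 1 { n[$1]++ } END { for (s in n) if (n[s] > 1) print s "\t" n[s] }' \
    "$workdir/samplesheet.csv" | sort)

# Prints one line per replicated sample: the ID, a tab, and the number of rows.
printf '%s\n' "$replicated"
```

Samples that appear on only one row are left out of the listing, since they need no merging.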

Command used and terminal output

This is the command it attempts to run:

Command

[ ! -f  GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz ] && ln -s SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz
trim_galore \
    --fastqc \
    --cores 8 \
    --gzip \
    GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz


Terminal output of Trim Galore process:
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
ln: GSM7506206_P3_plus_12_32F_Smith_C_Klf4.fastq.gz: File exists
11:11AM INF shutdown filesystem start
11:11AM INF shutdown filesystem done

Relevant files

No response

System information

Screenshot 2024-03-27 at 11 06 51

I am running this on Seqera platform on AWS, using Fusion. Nextflow v23.10.1 build 5891. nf-core/methylseq version: dev

@FelixKrueger added the bug label (Something isn't working) on Mar 27, 2024
@edmundmiller
Collaborator

Have you tried simplifying the names to just GSM7431885 and GSM7506206?

The sample name doesn't have to match the original input, you can name it something more descriptive than an ID as well.

@FelixKrueger
Contributor Author

FelixKrueger commented Mar 28, 2024

Simplifying the name has no effect (other than a different file name...):
Screenshot 2024-03-28 at 11 02 08

ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists

But this command can never work:

ln -s SRR24994983_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994984_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994985_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994986_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994987_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
SRR24994988_GSM7506206_P3_plus_12_32F_Smith_C_Klf4_Homo_sapiens_Bisulfite-Seq_R1.fastq.gz
GSM7506206.fastq.gz
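The failure mode can be reproduced outside the pipeline. The sketch below is an illustration only: it uses empty placeholder files with shortened, made-up names, and it checks only the exit status, because the exact error text depends on the `ln` implementation and the filesystem in use (so it may not match the "File exists" messages above).

```shell
#!/usr/bin/env bash
set -uo pipefail   # no -e: the second ln call is expected to fail

workdir=$(mktemp -d)
cd "$workdir"

# Empty placeholder files standing in for the replicate FASTQs (names shortened).
touch rep1.fastq.gz rep2.fastq.gz rep3.fastq.gz

# One source, one link name: the form the generated command assumes.
ln -s rep1.fastq.gz single.fastq.gz
single_status=$?

# Several sources plus a final argument that is not a directory:
# ln cannot create one link from many files, so this call fails.
ln -s rep1.fastq.gz rep2.fastq.gz rep3.fastq.gz merged.fastq.gz 2>ln_error.txt
multi_status=$?

echo "single-source exit status: $single_status"
echo "multi-source exit status:  $multi_status"
cat ln_error.txt
```

The single-source call succeeds, while the multi-source call returns a non-zero status, which is why the replicates have to be merged into one file before this step.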

@edmundmiller
Collaborator

using Fusion

I have a feeling this might be it, with the soft links for whatever weird reason.

Three new experiments:

  1. Can you run the methylseq test profile in the environment?
  2. Can you run the rnaseq test profile in the environment? (It has trimgalore)
  3. If the above two work, what about an rnaseq full test?

Also, have you confirmed this on any previous versions? I ask because the trimgalore module hasn't been updated in 11 months.

@FelixKrueger
Contributor Author

It also fails with 2.6.0:

ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists
ln: GSM7506206.fastq.gz: File exists

and the process doesn't start at all with 2.5.0, as it expected filenames to contain at least one underscore (_) back then:

Execution completed unsuccessfully!

The full error message was:

fromIndex = -1

FelixKrueger added a commit that referenced this issue May 17, 2024
test(#392): Add boilerplate for tests
@sateeshperi linked a pull request on May 17, 2024 that will close this issue
@sateeshperi added this to the 2.7.0 milestone on May 17, 2024