grep exercise unrealistic #230

ErinBecker · 2019-06-01T17:26:57Z

Arizona Bug BBQ - In general, we dislike the current set of exercises using grep. They are quite artificial and not relevant to the pipeline that we are working through with learners. We suggest dropping grep and piping entirely from this lesson unless someone comes up with an exercise that is relevant to the current data set and is something learners would use in their actual workflow.

Additionally, most bioinformatic tools don't take advantage of piping.

aschuerch · 2019-06-24T07:31:10Z

I agree, it is not directly relevant to a full workshop, and the workshop would benefit from trimming down the material. However, whenever I teach this lesson as "stand alone", I never skip this because the output of many bioinformatics tools I use needs to be redirected to a file. I would suggest we make this an optional episode under 'Extras'.
What do others think?

esebesty · 2019-08-10T11:51:55Z

I was trying to come up with a useful exercise with fastq files and grep, but yeah, the lesson is kind of artificial. If the lesson were done on a set of fasta files (transcripts, etc.) it would be easier to come up with relevant examples for grep, piping and other things, but that would mean too much work, I guess.

Still, grep and piping are very useful in downstream processing of results and I also think it would be good to have these exercises in the 'Extra' episode.

jsgro · 2021-08-31T03:23:58Z

Learning about grep and redirection is useful in many cases.
To "mimic" an AWS instance for local (laptop) teaching, I first used Ubuntu (20.04 LTS) within Docker to follow the lessons, as Ubuntu is what is shown on the AWS "splash" screen in introduction lesson 01. I thought there was an error in the grep exercises of Lesson 4 (Redirection) because I was getting a count of 537 lines for the "bad" reads with 10 Ns, rather than 802 as in the lesson.

grep -B1 -A2 NNNNNNNNNN SRR098026.fastq | wc
    537    1073   23217

However, if I use the same command on macOS, I get 802, as written in the lesson. I then tried Docker instances of Alpine and CentOS 7, and these also resulted in 537. The difference is that on the Linux distros it is GNU grep, while on the Mac it is BSD grep.
After some searching, I figured out that the difference lies in the "--" separator lines that grep writes between groups of context output. GNU grep on Linux writes only one, toward the end, while BSD grep on the Mac writes 266 of them:

# On macOS: 
grep -B1 -A2 NNNNNNNNNN SRR098026.fastq | wc -l
     802 
grep -B1 -A2 NNNNNNNNNN SRR098026.fastq | egrep '^--' | wc -l
     266

I am not sure if or how this is a bug, but there is definitely an inconsistency. I don't understand why GNU grep would write only one. I also checked the line endings to make sure that the file is in Unix format.
Was the original course developed on a Linux distro or a BSD-derived system?
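
One way to get a count that does not depend on which grep writes which "--" separators is sketched below. It assumes a tiny made-up FASTQ file (the read names and sequences are hypothetical, not taken from SRR098026.fastq) and assumes that no quality line consists only of "--". Counting matching lines with -c avoids context lines altogether, and filtering out "--" lines before wc makes the context output comparable across grep implementations.

```shell
# Build a tiny FASTQ file (hypothetical data) with two bad reads that are not adjacent.
tmpdir=$(mktemp -d)
fq="$tmpdir/example.fastq"
printf '%s\n' \
  '@read1' 'ACGTNNNNNNNNNNAC' '+' '!!!!!!!!!!!!!!!!' \
  '@read2' 'ACGTACGTACGTACGT' '+' 'IIIIIIIIIIIIIIII' \
  '@read3' 'NNNNNNNNNNNNACGT' '+' '!!!!!!!!!!!!!!!!' \
  > "$fq"

# Count the matching sequence lines directly: -c counts matching lines only,
# so neither context lines nor "--" separators are included.
bad_reads=$(grep -c NNNNNNNNNN "$fq")

# Count the lines of the full records after dropping lines that are exactly "--".
# tr strips the padding that some wc implementations add.
context_lines=$(grep -B1 -A2 NNNNNNNNNN "$fq" | grep -v '^--$' | wc -l | tr -d ' ')

echo "bad reads: $bad_reads"
echo "record lines: $context_lines"
```

With the filter in place, the number of record lines should be four times the number of bad reads, whichever grep is installed.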

careykm · 2022-12-15T15:46:21Z

> I was trying to come up with a useful exercise with fastq files and grep, but yeah, the lesson is kind of artificial. If the lesson were done on a set of fasta files (transcripts, etc.) it would be easier to come up with relevant examples for grep, piping and other things, but that would mean too much work, I guess.
>
> Still, grep and piping are very useful in downstream processing of results and I also think it would be good to have these exercises in the 'Extra' episode.

I agree, grep is a useful tool. I have a suggestion for a relevant exercise, based on something I am currently doing with fastq files in my dissertation. I originally had BAM files, which I stripped of the reference genome; I then had to separate the paired-end fastq reads to re-align them to a new reference genome. The 'for loop' code I used in my Mac terminal to separate the files is:

First, separate the paired-end reads into read 1 and read 2.

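# --no-group-separator is a GNU grep option; it suppresses the "--" lines between groups.
for f in *.fastq
do
    # Headers ending in /1 (plus the 3 lines after each) go to the R1 file, /2 to the R2 file.
    grep -A 3 --no-group-separator '^@.*/1$' "${f}" > "PreAligned_Fastq/${f}_R1.fastq"
    grep -A 3 --no-group-separator '^@.*/2$' "${f}" > "PreAligned_Fastq/${f}_R2.fastq"
done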
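
A possible variation, shown here only as a sketch: a quality line can itself start with "@" and end in "/1" or "/2", in which case a pattern-based split would treat it as a header. The version below instead assumes strict 4-line FASTQ records (no wrapped sequences) and looks at the suffix of every fourth line only. The file names and reads are made up for illustration.

```shell
# Tiny interleaved FASTQ (hypothetical data). The quality line of pair1/2
# is '@II/1', which looks like an R1 header to a plain pattern match.
tmpdir=$(mktemp -d)
fq="$tmpdir/sample.fastq"
printf '%s\n' \
  '@pair1/1' 'ACGT' '+' 'IIII' \
  '@pair1/2' 'TTGA' '+' '@II/1' \
  '@pair2/1' 'GGCC' '+' 'IIII' \
  '@pair2/2' 'CCAA' '+' 'IIII' \
  > "$fq"

# Treat every 4 lines as one record and pick the output file from the
# suffix of the header line (line 1 of each record) only.
awk -v r1="$tmpdir/sample_R1.fastq" -v r2="$tmpdir/sample_R2.fastq" '
  NR % 4 == 1 {
      out = ""
      if ($0 ~ /\/1$/) out = r1
      else if ($0 ~ /\/2$/) out = r2
  }
  out != "" { print > out }
' "$fq"

# Each output file should hold two complete 4-line records.
r1_lines=$(wc -l < "$tmpdir/sample_R1.fastq" | tr -d ' ')
r2_lines=$(wc -l < "$tmpdir/sample_R2.fastq" | tr -d ' ')
echo "R1 lines: $r1_lines, R2 lines: $r2_lines"
```

Because only header positions are inspected, the '@II/1' quality line stays with its own record in the R2 file. Records whose header ends in neither /1 nor /2 are skipped in this sketch.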

bkmgit · 2023-07-25T12:59:10Z

Good comment on the type of grep command used. Lesson should be updated.

sstevens2 mentioned this issue Jun 1, 2019

Possible fix to issue #224 #232

Closed

aschuerch mentioned this issue Jun 24, 2019

Introduce -v flag of grep #227

Closed

rortizmerino mentioned this issue Sep 17, 2019

Lesson contribution grep #258

Merged

akshayparopkari added the status:refer to cac (Curriculum Advisory Committee input needed) and type:enhancement (Propose enhancement to the lesson) labels May 14, 2020

vhmcck added the type:clarification (Suggest change for make lesson clearer) label Sep 27, 2023
