CRISPResso Batch Output - Prime Editing Summary #89

Open
mathinic opened this issue Apr 9, 2021 · 2 comments
Labels
enhancement New feature or request

Comments


mathinic commented Apr 9, 2021

Hi,

I'm using CRISPResso2 with the CRISPRessoBatch function for prime editing but I'm missing some files which would be very useful to have:

1.) A table covering all samples in the batch, with one row per sample and one column each for the percentage of reads that are Reference UNMODIFIED, MODIFIED, Prime-Edited MODIFIED and AMBIGUOUS. So basically, the PieChart/Barplot (see below) summarized in a table for all samples. Right now I would need to open every sample's output and check the images, or calculate the percentages with a script from other files (e.g. from "CRISPResso_quantification_of_editing_frequency.txt").

I think such a summary table would save time for many of us.
image

2.) An image such as "11a.Prime_editing_nucleotide_percentage_quilt.pdf", but with an additional row of (whole sample), where we can visualize the editing percent in the sample before it is split up into "Reference" and "Prime edited". I can get such an image if I do not use the Prime editing function in CRISPResso, but then I lose the other functionality (such as visualization of spacer and extension in the image which is very useful). Right now, this image 11a is only of limited use for us, since the Reference and the Prime edited row usually don't show any percentages/variability for each base and it is not very appealing for presentations (see the demo image below).
image

3.) In addition, it would be nice to have this "original" nucleotide percentage image (before splitting up into Reference and Prime edited) stacked on top of each other in the BatchOutput. Right now I can only see such stacked images for the Reference (which again all look the same) and for the "Scaffold" insertion (which is just a wild mess due to few reads when insertion is low because no nicking guide was used).

Maybe I missed some files or functionality that already exist and would fulfill my "wishes"; if so, it would be nice if you could let me know. :)

Overall thanks once more for the great tool and keep up your great work!

Best,
Nicolas

Member

kclem commented Apr 11, 2021

Hi @mathinic,

Thanks for your continued use and good ideas.

Have you checked out the file "CRISPRessoBatch_quantification_of_editing_frequency.txt" in the batch output folder? This should have editing rates for all amplicons. Every row is a unique amplicon in each sample, so a single sample would have separate rows for the unedited and prime-edited amplicons. Does this sound like it would be easier than opening up all the files? I left it in this format because the number of references (even in prime-editing mode) could be more than one, so I didn't want to put each reference in a separate column when I don't know in advance how many there are. Does that make sense, and do you think you'd be able to use that file?

Yeah, I can probably make the plot you mention in 2, although 3 would probably be easiest to get by just running the batch in non-base-editing mode. I'll look into this when I get a second. I'll leave this open and keep you updated.

@kclem kclem added the enhancement New feature or request label Apr 11, 2021
Author

mathinic commented Apr 11, 2021

Hi @kclem

Indeed, the "CRISPRessoBatch_quantification_of_editing_frequency.txt" file contains all the information needed to calculate the barplot in the first comment above. However, I think it would help to have one more column with the editing percentage (similar to the following):
image
Most graphs in publications show the percentage of edited samples in a barplot, and having the percentage already in the table would remove the necessity of calculating it manually afterwards.

Update:
I wrote a short Python script which adds the values that I need for my own regular plotting of PE efficiency, and also creates a second file with Prime-edited rows only. (But it's of course less "elegant" than if something similar were included in CRISPResso already 😉 )

import pandas as pd

# load the tab-separated batch summary (one row per amplicon per sample)
batchdf = pd.read_csv('CRISPRessoBatch_on_PEfilelist\\CRISPRessoBatch_quantification_of_editing_frequency.txt', sep='\t')

# express each amplicon's read counts as a percentage of all aligned reads in the sample
total = batchdf['Reads_aligned_all_amplicons']
batchdf['percent_of_total'] = batchdf['Reads_aligned'] / total * 100
batchdf['percent_unmodified_of_total'] = batchdf['Unmodified'] / total * 100
batchdf['percent_modified_of_total'] = batchdf['Modified'] / total * 100

# rearrange columns:
cols = ['Batch',
  'percent_of_total',
  'Amplicon',
  'Unmodified%',
  'Modified%',
  'Reads_in_input',
  'Reads_aligned_all_amplicons',
  'Reads_aligned',
  'Unmodified',
  'percent_unmodified_of_total',
  'Modified',
  'percent_modified_of_total',
  'Discarded',
  'Insertions',
  'Deletions',
  'Substitutions',
  'Only Insertions',
  'Only Deletions',
  'Only Substitutions',
  'Insertions and Deletions',
  'Insertions and Substitutions',
  'Deletions and Substitutions',
  'Insertions Deletions and Substitutions'
  ]
batchdf = batchdf[cols]

primeediteddf = batchdf[batchdf['Amplicon']=='Prime-edited']  # create filtered dataframe with prime-edited only

# write .csv files (index=False leaves out the pandas row index column)
batchdf.to_csv('CRISPRessoBatch_on_PEfilelist\\CRISPRessoBatch_quantification_of_editing_frequency_extended.csv', index=False)
primeediteddf.to_csv('CRISPRessoBatch_on_PEfilelist\\CRISPRessoBatch_quantification_of_editing_frequency_essentials.csv', index=False)
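
For the one-row-per-sample table asked for in point 1, a rough sketch of the reshaping step could look like the following. It is not part of CRISPResso; it only assumes the tab-separated layout and the column names used in the script above (`Batch`, `Amplicon`, `Reads_aligned_all_amplicons`, `Unmodified`, `Modified`). The function name `per_sample_summary`, the `_pct` column naming, and the toy counts are invented for illustration, and the amplicon names `Reference` and `Prime-edited` are assumed to match what appears in the real file.

```python
import csv
import io
from collections import defaultdict


def per_sample_summary(rows):
    """Collapse one-row-per-amplicon records into one dict per sample.

    Every percentage is relative to Reads_aligned_all_amplicons, so the
    values of the different amplicons within a sample are comparable.
    """
    summary = defaultdict(dict)
    for row in rows:
        total = float(row["Reads_aligned_all_amplicons"])
        if total == 0:
            continue  # no aligned reads: nothing to report for this row
        amplicon = row["Amplicon"]
        for status in ("Unmodified", "Modified"):
            key = f"{amplicon}_{status}_pct"
            summary[row["Batch"]][key] = 100.0 * float(row[status]) / total
    return dict(summary)


# Toy input with invented counts, in the same tab-separated layout.
toy = (
    "Batch\tAmplicon\tReads_aligned_all_amplicons\tReads_aligned\tUnmodified\tModified\n"
    "sample1\tReference\t1000\t800\t700\t100\n"
    "sample1\tPrime-edited\t1000\t200\t150\t50\n"
    "sample2\tReference\t500\t500\t450\t50\n"
)
table = per_sample_summary(csv.DictReader(io.StringIO(toy), delimiter="\t"))
for sample, values in table.items():
    print(sample, values)
```

Because the columns are built from whatever amplicon names occur in the file, this works without knowing in advance how many references a run has; samples that lack an amplicon simply lack the corresponding keys. For the real file, the `io.StringIO(toy)` argument would be replaced by an opened file handle.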
