Easy filtering of Aneufinder model files based on QC measurements

TWvR/AneufinderFileFilter

AneufinderFileFilter

AneufinderFileFilter contains R scripts that allow easy filtering of single-cell DNA sequencing output generated by the R package "Aneufinder". Filtering is based on QC data generated by Aneufinder, and filtered model files can be reorganized into new directories. The scripts only read the model files and make no changes to the original Aneufinder output or model files.

Scripts

To use the code you need the two included scripts:

  1. RUN_AneufinderFileFilter script
    Set input/output folders and choose correct settings

  2. FUNC_AneufinderFileFilter script
    Contains the code that is used to filter the data

Required R packages

This script makes use of the following R packages:

  1. Aneufinder (Bioconductor package name: AneuFinder)
  2. colorspace

Input / Output

Input:

  • Aneufinder Model files

Output:

For each sample:

  • .txt summary of used filter parameters
  • directory with selected model files
  • directory with excluded model files
  • directory with perfect diploid model files
  • .pdf genomewide karyotype plot based on selected/excluded files
  • .pdf single karyotype plots based on selected/excluded files
  • .pdf heterogeneity/aneuploidy plot based on selected/excluded files
  • .csv with QC measurements for each file
  • .csv with karyotype measurements for each chromosome
  • .csv with karyotype measurements for whole genome

Many of the above outputs are optional.

Instructions for use

To get started, I advise using RStudio: create a new project named 'AneufinderFileFilter', then download both scripts from GitHub and place them in the project folder.

In general there is no need to open or adjust the FUNC script; this is only needed if you want to change the code that performs the actual filtering or the code that generates the plots. You only need to make sure that the RUN script contains the correct source path to the FUNC script.
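As a minimal sketch of that path check, the RUN script could verify that the FUNC script exists before sourcing it. The helper name `load_func_script` and the file name `FUNC_AneufinderFileFilter.R` are assumptions for illustration; use the name and location of the file as downloaded.

```r
# Hypothetical helper (not part of the repository): source the FUNC script
# only if it exists, and report whether it was loaded.
load_func_script <- function(path) {
  if (!file.exists(path)) {
    message("FUNC script not found at: ", path)
    return(FALSE)
  }
  source(path)  # defines the filtering functions in the workspace
  TRUE
}

# Assumed file name, relative to the RStudio project folder.
load_func_script("FUNC_AneufinderFileFilter.R")
```

A relative path like this resolves against the working directory, which RStudio sets to the project folder when the project is opened.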

The 'RUN_AneufinderFileFilter' script is subdivided into multiple sections to give a clear overview of the different settings. Before each run you will probably want to give your project a new name, assign the correct input folder, and check the filtering and plotting settings.

After making all required adjustments, run the code line by line. The actual filtering starts at the end of the RUN script when you run AneufinderFileFilter(sampleIDs). Shortly afterwards you will be prompted to check the filter settings; if they are correct, enter 'Y' to continue the script.
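For readers unfamiliar with console prompts in R, a confirmation step of this kind can be sketched as follows. This is an illustration only, not the code used in the FUNC script; the helper name `confirm_settings` and the prompt wording are hypothetical.

```r
# Hypothetical helper: TRUE only when the answer is exactly "Y"
# (surrounding whitespace is ignored). Not the script's actual prompt code.
confirm_settings <- function(
    answer = readline("Settings correct? Enter 'Y' to continue: ")) {
  identical(trimws(answer), "Y")
}

# Only ask when R runs interactively (e.g. in the RStudio console).
if (interactive()) {
  if (!confirm_settings()) {
    stop("Run cancelled; adjust the settings in the RUN script first.")
  }
}
```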

Options

Available filtering options

  1. Filter Aneufinder model files generated via edivisive, dnaCopy or hmm.
  2. Filter files based on total read count per cell, number of chromosome segments, spikiness and/or Bhattacharyya distance.
  3. Exclude model files whose weighted average copy number is too high.
  4. Exclude model files with a perfect diploid genome.
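The filtering options above can be illustrated with a small base-R sketch on toy data. The column names, threshold values and threshold directions (minimum read count and Bhattacharyya distance, maximum segment count and spikiness) are assumptions for illustration, as is the definition of "perfect diploid" as every segment having copy number 2. The actual script may define these differently.

```r
# Toy QC table, one row per model file; all values are made up.
qc <- data.frame(
  file             = c("cell1.RData", "cell2.RData", "cell3.RData", "cell4.RData"),
  total.read.count = c(250000, 40000, 300000, 280000),
  num.segments     = c(30, 25, 140, 23),
  spikiness        = c(0.12, 0.10, 0.15, 0.30),
  bhattacharyya    = c(1.8, 1.6, 1.7, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical thresholds; choose values that suit your own data.
min_reads         <- 100000
max_segments      <- 100
max_spikiness     <- 0.2
min_bhattacharyya <- 1.0

# A file is kept only if it passes every QC criterion.
keep <- qc$total.read.count >= min_reads &
  qc$num.segments <= max_segments &
  qc$spikiness <= max_spikiness &
  qc$bhattacharyya >= min_bhattacharyya

selected <- qc$file[keep]
excluded <- qc$file[!keep]

# Toy segments for a single cell: width in bp and copy number per segment.
segments <- data.frame(
  width       = c(100e6, 50e6, 50e6),
  copy.number = c(2, 3, 2)
)

# Average copy number, weighted by segment width.
weighted_cn <- weighted.mean(segments$copy.number, w = segments$width)

# Assumed definition: perfectly diploid if every segment has copy number 2.
is_perfect_diploid <- all(segments$copy.number == 2)
```

In this toy data only `cell1.RData` passes all four criteria. The example cell has a weighted average copy number of 2.25 and is not perfectly diploid.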

Obtain selected Aneufinder model files

  1. Copy selected model files to new folder
  2. Copy model files from perfect diploid cells to new folder
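Copying selected files while leaving the originals untouched can be sketched with base R's `file.copy`. The helper name `copy_model_files`, the folder names and the placeholder files are hypothetical; the demo writes to a temporary directory so it can be run safely.

```r
# Hypothetical helper: copy files into dest_dir, creating it if needed.
# Source files are never modified or removed.
copy_model_files <- function(files, dest_dir) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  copied <- file.copy(from = files, to = dest_dir, overwrite = FALSE)
  setNames(copied, basename(files))  # TRUE/FALSE per file
}

# Demo with placeholder files in a temporary directory.
src_dir <- file.path(tempdir(), "aneufinder_models_demo")
dir.create(src_dir, showWarnings = FALSE)
demo_files <- file.path(src_dir, c("cell1.RData", "cell2.RData"))
for (f in demo_files) writeLines("placeholder", f)

result <- copy_model_files(demo_files, file.path(src_dir, "selected"))
```

With `overwrite = FALSE`, files that already exist in the destination are skipped and reported as `FALSE`, so a repeated run does not replace earlier copies.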

Plots

  1. PDF with summary statistics for included and excluded files
  2. PDF with genomewide profile for selected files
  3. PDF with single cell karyotype profiles for included or excluded files
  4. PDF with heterogeneity profiles for selected model files
  5. CSV file with measurement statistics for each model file

Final comments

This script was one of my first builds, hence the coding could probably have been more efficient. Nevertheless I hope it can be used to your benefit. If you have any questions or need help with running the script, please don't hesitate to send me a message.

Thomas van Ravesteyn
