Support for homologous recombination deficiency scores in Dx.R #159

lima1 · 2020-12-19T18:03:53Z

Probably a wrapper around https://github.com/sztup/scarHRD

lbeltrame · 2021-01-14T13:36:00Z

I actually "reverse-engineered" the code (because it is really hard to read and to understand what it does) when implementing it internally, using PureCN data as a source to calculate HRD scores. If you're interested I can write down my notes (I'd have put up a PR, but I rewrote the implementation in Python as I was more familiar with it).

lima1 · 2021-01-14T23:55:05Z

Hi @lbeltrame, that would be awesome. More than happy to add it as Python code for now.

lbeltrame · 2021-01-19T10:52:57Z

Here are some notes on how scarHRD works and how I adapted it:

Preprocessing (before the HRD calculation)

1. Get the variants from PureCN and remove flagged variants (FLAGGED == TRUE)
2. Construct the "major allele CN" by subtracting ML.M.SEGMENT from ML.C (ML.CN.MAJOR = ML.C - ML.M.SEGMENT)
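
A minimal sketch of this preprocessing, in plain Python so it has no dependencies. The column names `FLAGGED`, `ML.C` and `ML.M.SEGMENT` come from the notes above; the `chr`, `start` and `end` keys and the use of real booleans for `FLAGGED` are assumptions about how the PureCN variants table was loaded.

```python
def preprocess(variants):
    """Drop flagged variants and derive the major allele copy number."""
    kept = []
    for row in variants:
        if row["FLAGGED"]:
            continue  # flagged variants are excluded from the HRD calculation
        out = dict(row)  # copy so the input rows are left untouched
        out["ML.CN.MAJOR"] = row["ML.C"] - row["ML.M.SEGMENT"]
        kept.append(out)
    return kept


# Hypothetical input rows, for illustration only.
variants = [
    {"chr": "chr1", "start": 100, "end": 100, "ML.C": 3, "ML.M.SEGMENT": 1, "FLAGGED": False},
    {"chr": "chr1", "start": 200, "end": 200, "ML.C": 2, "ML.M.SEGMENT": 0, "FLAGGED": True},
]
clean = preprocess(variants)
```

If the table is read from CSV, `FLAGGED` will likely arrive as text and needs converting to a boolean first.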

Segment generation

Contrary to "popular" belief, this does not use the segmentation output. It generates "segments" of identical major and minor CN. It works like this:

1. Group data by chromosome
2. Group consecutive stretches of identical (major, minor) CN together
3. Aggregate each stretch into a single segment with the same sample ID, the same chromosome, the first occurrence of the start coordinate, the last occurrence of the end coordinate, and the median of the log.ratio
4. Calculate the length of the segment (will be used later)
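
The aggregation above can be sketched with `itertools.groupby`, which groups consecutive items with equal keys. This is an illustration, not the original script: the dictionary keys (`sample`, `chrom`, `major`, `minor`, `log_ratio`) are hypothetical names, the rows are assumed to be sorted by chromosome and position, and the length is taken as `end - start`.

```python
from itertools import groupby
from statistics import median


def aggregate_segments(rows):
    """Collapse consecutive rows sharing (sample, chrom, major, minor) into one segment.

    `rows` must already be sorted by chromosome and start position.
    """
    def key(row):
        return (row["sample"], row["chrom"], row["major"], row["minor"])

    segments = []
    for (sample, chrom, major, minor), run in groupby(rows, key=key):
        run = list(run)
        start = run[0]["start"]  # first occurrence of the start coordinate
        end = run[-1]["end"]     # last occurrence of the end coordinate
        segments.append({
            "sample": sample, "chrom": chrom, "start": start, "end": end,
            "major": major, "minor": minor,
            "log_ratio": median(r["log_ratio"] for r in run),
            "length": end - start,
        })
    return segments


# Hypothetical rows, for illustration only.
rows = [
    {"sample": "S1", "chrom": "chr1", "start": 100, "end": 100, "major": 2, "minor": 1, "log_ratio": 0.1},
    {"sample": "S1", "chrom": "chr1", "start": 500, "end": 500, "major": 2, "minor": 1, "log_ratio": 0.3},
    {"sample": "S1", "chrom": "chr1", "start": 900, "end": 900, "major": 1, "minor": 0, "log_ratio": -0.5},
    {"sample": "S1", "chrom": "chr2", "start": 50, "end": 50, "major": 1, "minor": 0, "log_ratio": -0.4},
]
segments = aggregate_segments(rows)
```

Because the chromosome is part of the grouping key, a stretch never continues across a chromosome boundary even when the copy numbers match.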

LOH score

Start with a LOH score of 0

1. Group segments (those generated before) by chromosome (autosomes only)
2. Exclude chromosomes where minor CN is consistently 0 (LOH of the whole chromosome)
3. Set major allele CN to 1 for all segments where it is > 1 (they all count the same)
4. Re-aggregate the segments (see point 3 of Segment generation)
5. Select those segments where major CN is 1 and minor CN is 0
6. Select those segments whose length is at least 15 Mbp
7. Add the number of segments which are left in that chromosome after the selection in 6. to the LOH score
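
A sketch of the LOH count as described above, under some assumptions: segments are dictionaries with hypothetical keys `chrom`, `start`, `end`, `major` and `minor`, they are sorted by chromosome and position, and autosomes are named `chr1` to `chr22`. The `merge_runs` helper is a simplified stand-in for the re-aggregation step.

```python
from itertools import groupby

AUTOSOMES = {f"chr{i}" for i in range(1, 23)}  # assumed naming convention
MIN_LOH_LENGTH = 15_000_000  # 15 Mbp


def merge_runs(segments):
    """Merge adjacent segments of one chromosome that share (major, minor)."""
    merged = []
    for seg in segments:
        if merged and (merged[-1]["major"], merged[-1]["minor"]) == (seg["major"], seg["minor"]):
            merged[-1]["end"] = seg["end"]  # extend the previous segment
        else:
            merged.append(dict(seg))
    return merged


def loh_score(segments):
    """Count long LOH segments over all autosomes."""
    score = 0
    for chrom, group in groupby(segments, key=lambda s: s["chrom"]):
        group = list(group)
        if chrom not in AUTOSOMES:
            continue
        if all(s["minor"] == 0 for s in group):
            continue  # LOH of the whole chromosome is not counted
        capped = [dict(s, major=min(s["major"], 1)) for s in group]
        for seg in merge_runs(capped):
            is_loh = seg["major"] == 1 and seg["minor"] == 0
            if is_loh and seg["end"] - seg["start"] >= MIN_LOH_LENGTH:
                score += 1
    return score


# Hypothetical segments, for illustration only.
example = [
    {"chrom": "chr1", "start": 0, "end": 20_000_000, "major": 2, "minor": 0},
    {"chrom": "chr1", "start": 20_000_000, "end": 40_000_000, "major": 3, "minor": 0},
    {"chrom": "chr1", "start": 40_000_000, "end": 100_000_000, "major": 1, "minor": 1},
    {"chrom": "chr2", "start": 0, "end": 90_000_000, "major": 1, "minor": 0},
    {"chrom": "chrX", "start": 0, "end": 50_000_000, "major": 1, "minor": 0},
]
score = loh_score(example)
```

In the example the two chr1 segments with minor CN 0 collapse into one 40 Mbp segment once the major CN is capped, so they count once; chr2 is skipped as whole-chromosome LOH and chrX is not an autosome.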

LST score (large scale transitions)

This was pretty hard to figure out.

Start with LST score of 0.

1. Group segments (those generated before) by chromosome (autosomes only)
2. Keep only the segments that fall wholly into the p or the q arm (you'll need centromere locations for this), not those spanning the centromere
3. Handle p and q data separately (because the following steps change start or end depending on the arm)
4. Re-aggregate the segments (see point 3 of Segment generation) for each arm
5. Set the centromere-facing boundary of the last segment (p arm) or of the first segment (q arm) to the corresponding centromere coordinate (centromere start and end, respectively)
6. Recompute segment lengths
7. Identify all segments that are shorter than 3 Mbp
8. Iterate through these segments:
   1. Remove the segment from the data (and from the list at point 7)
   2. Re-aggregate the segments (see point 3 of Segment generation)
   3. Recompute segment lengths
   4. Repeat from step 1 until all the segments < 3 Mbp have been removed
9. Select all pairs of adjacent segments where both are >= 10 Mbp in length and spaced less than 3 Mbp apart
10. Count these pairs and add the value to the LST score
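
The small-segment removal and the final counting are the least obvious parts, so here is a sketch of just those steps for a single chromosome arm. It leaves out the arm splitting and the centromere adjustment, assumes hypothetical keys `start`, `end`, `major` and `minor`, and removes the first remaining short segment on each pass, which is one possible reading of the iteration described above.

```python
SMALL_SEGMENT = 3_000_000   # segments shorter than this are removed
LARGE_SEGMENT = 10_000_000  # both neighbours must be at least this long


def length(seg):
    return seg["end"] - seg["start"]


def merge_runs(segments):
    """Merge adjacent segments of one arm that share (major, minor)."""
    merged = []
    for seg in segments:
        if merged and (merged[-1]["major"], merged[-1]["minor"]) == (seg["major"], seg["minor"]):
            merged[-1]["end"] = seg["end"]
        else:
            merged.append(dict(seg))
    return merged


def drop_small_segments(arm):
    """Remove short segments one at a time, re-merging after each removal."""
    arm = merge_runs(arm)
    while True:
        small = [i for i, seg in enumerate(arm) if length(seg) < SMALL_SEGMENT]
        if not small:
            return arm
        del arm[small[0]]
        arm = merge_runs(arm)  # neighbours may now share the same copy numbers


def count_large_transitions(arm):
    """Count adjacent pairs of long segments separated by less than 3 Mbp."""
    arm = drop_small_segments(arm)
    count = 0
    for left, right in zip(arm, arm[1:]):
        gap = right["start"] - left["end"]
        if length(left) >= LARGE_SEGMENT and length(right) >= LARGE_SEGMENT and gap < SMALL_SEGMENT:
            count += 1
    return count


# Hypothetical arm, for illustration only.
arm = [
    {"start": 0, "end": 12_000_000, "major": 2, "minor": 1},
    {"start": 12_000_000, "end": 13_000_000, "major": 1, "minor": 1},
    {"start": 13_000_000, "end": 25_000_000, "major": 2, "minor": 1},
    {"start": 25_000_000, "end": 40_000_000, "major": 1, "minor": 0},
]
transitions = count_large_transitions(arm)
```

Re-merging after every removal matters: in the example, dropping the 1 Mbp segment lets its two neighbours fuse into one 25 Mbp segment, which leaves a single transition instead of two.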

TAI (Telomeric allelic imbalance)

This is another confusing one, and I'm not sure I understood the logic completely (but my implementation produces identical results to scarHRD).

Start with a TAI score of 0.

1. Group segments (those generated before) by chromosome (autosomes only; the following operations are applied to each group)
2. Discard all segments smaller than 1 Mbp (to reduce noise)
3. Re-aggregate the segments (see point 3 of Segment generation)
4. If there is only one segment covering the entire chromosome, move on to the next chromosome
5. Get the absolute minimum major copy number observed in the segments for that chromosome
6. For each segment, consider the major and minor CN:
   1. If the minimum major CN is equal to 1 or is even, check if there is a difference between major and minor CN: if there is, mark the segment as having "Interstitial" allelic imbalance
   2. If the condition at point 1 is not satisfied (the minimum major CN is odd and greater than 1), consider the contribution of the minor allele to the imbalance: if the minor allele CN in that segment is not 0 and the sum of major and minor allele CN for that segment is equal to the minimum copy number observed, mark the segment as having "Interstitial" allelic imbalance
   3. If neither 1 nor 2 holds, mark the segment as having no imbalance
7. Get the tagged segment from point 6 closest to the 5' telomere and the segment closest to the 3' telomere
   1. If the segment closest to the 5' telomere ends before the centromere, mark the segment as having "telomeric allelic imbalance" and add 1 to the TAI score
   2. If the segment closest to the 3' telomere starts after the centromere, mark the segment as having "telomeric allelic imbalance" and add 1 to the TAI score
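
A sketch of the final telomeric check only, for one chromosome whose segments have already been tagged. It rests on an assumption: a telomeric segment must lie entirely on its own side of the centromere, so the segment nearest the 5' telomere has to end before the centromere start and the one nearest the 3' telomere has to start after the centromere end. The `imbalanced` flag and the other key names are hypothetical.

```python
def telomeric_imbalance_count(segments, centromere_start, centromere_end):
    """Return how many chromosome ends (0, 1 or 2) show telomeric allelic imbalance.

    `segments` must be ordered by start position; each one carries "start",
    "end" and a boolean "imbalanced" flag set by the tagging step.
    """
    if len(segments) < 2:
        return 0  # a single segment covering the chromosome is skipped

    first, last = segments[0], segments[-1]
    count = 0
    # Segment nearest the 5' telomere: must not reach the centromere.
    if first["imbalanced"] and first["end"] < centromere_start:
        count += 1
    # Segment nearest the 3' telomere: must begin past the centromere (assumption).
    if last["imbalanced"] and last["start"] > centromere_end:
        count += 1
    return count


# Hypothetical chromosome, for illustration only.
chromosome = [
    {"start": 0, "end": 30_000_000, "imbalanced": True},
    {"start": 30_000_000, "end": 90_000_000, "imbalanced": False},
    {"start": 90_000_000, "end": 150_000_000, "imbalanced": True},
]
tai = telomeric_imbalance_count(chromosome, 58_000_000, 62_000_000)
```

The per-segment tagging is deliberately left out, since the notes above are themselves unsure about that part of the logic.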

Wrapping up

Collect LST, TAI and LOH scores
Calculate HRD score as LST+TAI+LOH (this is the latest version described in the papers)

Currently the script I use depends on pandas (which pulls in numpy as well) and requires a file with centromere coordinates. What would be the minimum Python version you consider acceptable (consider it will likely only work with Python 3.4 or later)?

lima1 · 2021-02-10T16:54:07Z

Thanks so much Luca, that's super helpful. And sorry for the delayed response. I think for now I'll take whatever you have. At some point when I need it I might reimplement in R to make the installation easier, but as a prototype that's awesome for now.

lbeltrame · 2021-02-10T21:18:45Z

Super! I'll clean it up tomorrow and attach it here. Do you have a data file with centromere information already inside PureCN? That's needed for the TAI calculation (you need to know you're not crossing the centromere).

lima1 · 2021-02-18T00:23:43Z

Yes, I have the centromeres. Currently as a serialized RDA file in https://github.com/lima1/PureCN/tree/master/data though.

lbeltrame · 2021-02-18T05:55:42Z

I'll see whether https://github.com/ofajardo/pyreadr can be used to handle this (I wanted to get rid of a dependency on a random file that might or might not be on some person's disk anyway).

ShvartsmanIrina · 2024-02-29T09:40:39Z

Hi @lbeltrame, do you have a script for HRD scoring? I would be very grateful for help!

lbeltrame · 2024-02-29T10:44:14Z

No, not at the moment, unfortunately (I switched institutions in the meantime).

ShvartsmanIrina · 2024-02-29T14:29:49Z

@lbeltrame, thank you very much for your answer! I understand that you are a professional in this field; I come from the microbiome field. Maybe, as an experienced professional, you can advise me on a tool or workflow for calculating the HRD score from the results of unpaired tumor samples?

lima1 added the enhancement label Dec 19, 2020

lima1 mentioned this issue Mar 4, 2024

HRD score from PureCN outputs #352

Open
