PureCN segmentation and VCF do not overlap #316

Open
pavamateo opened this issue Aug 22, 2023 · 1 comment

@pavamateo

Describe the issue
When running PureCN with my segmentation and VCF files, I encounter an error that says "Segmentation and VCF do not overlap." This stops the execution of the program.

To Reproduce
I redid everything from scratch and still get the same error.

Starting with IntervalFile:

Rscript $PURECN/IntervalFile.R --in-file /Users/mateopava/Desktop/KAPA_HyperExome_primary_targets.bed \
--fasta /Users/mateopava/Desktop/hg38/hg38.fa --out-file $OUT_REF/baits_hg38_intervals.txt \
--off-target --genome hg38 \
--export $OUT_REF/baits_optimized_hg38.bed \
--mappability /Users/mateopava/Desktop/hg38/GCA_000001405.15_GRCh38_no_alt_analysis_set_100.bw

Then Coverage:

Rscript $PURECN/Coverage.R --out-dir $OUT/PRO001 \
--bam /Users/mateopava/Desktop/PRO001_18-27110_DNAFFPE_NA_C_HN00131668.recal.bam \
--intervals $OUT_REF/baits_hg38_intervals.txt

Rscript $PURECN/Coverage.R --out-dir $OUT/normals \
--bam normals.list \
--intervals $OUT_REF/baits_hg38_intervals.txt \
--cores 4

Then NormalDB:

Rscript $PURECN/NormalDB.R --out-dir $OUT_REF \
--coverage-files example_normal_coverages.list \
--genome hg38

Then PureCN.R:

Rscript $PURECN/PureCN.R --out $OUT/PRO001 \
--tumor $OUT/PRO001_18-27110_DNAFFPE_NA_C_HN00131668.recal_coverage_loess.txt.gz \
--sampleid PRO001 \
--vcf /Users/mateopava/Desktop/PRO001_18-27110_DNAFFPE_NA_C_HN00131668_vs_PRO001_NA_DNASAL_NA_C_HN00131668.mutect2.filtered.vcf.gz \
--normaldb $OUT_REF/normalDB_hg38.rds \
--intervals $OUT_REF/baits_hg38_intervals.txt \
--genome hg38


I also tried using a third-party segmentation file (DNAcopy format):

Rscript $PURECN/PureCN.R --out $OUT/PRO001 \
--sampleid PRO001 \
--seg-file /Users/mateopava/Desktop/PRO001.seg \
--vcf /Users/mateopava/Desktop/PRO001xxxmutect2.filtered.vcf.gz \
--intervals $OUT_REF/baits_hg38_intervals.txt \
--genome hg38

Expected behavior
I expected PureCN to process the files and produce output without errors.

Log file following the PureCN best practices

(base) mateopava@Mateos-MacBook-Pro Desktop % Rscript $PURECN/PureCN.R --out $OUT/PRO001 \
--tumor $OUT/PRO001_18-27110_DNAFFPE_NA_C_HN00131668.recal_coverage_loess.txt.gz \
--sampleid PRO001 \
--vcf /Users/mateopava/Desktop/PRO001xxxmutect2.filtered.vcf.gz \
--normaldb $OUT_REF/normalDB_hg38.rds \
--intervals $OUT_REF/baits_hg38_intervals.txt \
--genome hg38
[1] "/Users/mateopava/Desktop/PRO001"
INFO [2023-08-22 16:53:35] Loading PureCN 2.6.4...
INFO [2023-08-22 16:53:39] Mean coverages: chrX: 18.53, chrY: 20.37, chr1-22: 32.40.
INFO [2023-08-22 16:53:39] Sample sex: M
WARN [2023-08-22 16:53:39] Recommended to provide --fun-segmentation PSCBS.
INFO [2023-08-22 16:53:39] ------------------------------------------------------------
INFO [2023-08-22 16:53:39] PureCN 2.6.4
INFO [2023-08-22 16:53:39] ------------------------------------------------------------
INFO [2023-08-22 16:53:39] Arguments: -tumor.coverage.file /Users/mateopava/Desktop/PRO001_18-27110_DNAFFPE_NA_C_HN00131668.recal_coverage_loess.txt.gz -log.ratio -seg.file -vcf.file /Users/mateopava/Desktop/PRO001xxxmutect2.filtered.vcf.gz -genome hg38 -sex ? -args.setPriorVcf 6 -args.setMappingBiasVcf NULL -args.filterIntervals 100,0.05 -args.segmentation 0.005,NULL, -sampleid PRO001 -min.ploidy 1.4 -max.ploidy 6 -max.non.clonal 0.2 -max.homozygous.loss 0.05,1e+07 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -interval.file /Users/mateopava/Desktop/reference_files/baits_hg38_intervals.txt -min.logr.sdev 0.15 -max.segments 300 -plot.cnv TRUE -vcf.field.prefix PureCN. -cosmic.vcf.file -DB.info.flag DB -POPAF.info.field POP_AF -Cosmic.CNT.info.field Cosmic.CNT -model beta -post.optimize FALSE -BPPARAM -log.file /Users/mateopava/Desktop/PRO001/PRO001.log -normal.coverage.file -normalDB -args.filterVcf -fun.segmentation -test.num.copy -test.purity -speedup.heuristics
INFO [2023-08-22 16:53:39] Loading coverage files...
INFO [2023-08-22 16:53:41] Mean target coverages: 32X (tumor) 32X (normal).
INFO [2023-08-22 16:53:42] Mean coverages: chrX: 18.53, chrY: 20.37, chr1-22: 32.40.
INFO [2023-08-22 16:53:42] Mean coverages: chrX: 18.40, chrY: 17.87, chr1-22: 31.68.
INFO [2023-08-22 16:53:49] Removing 4993 intervals with missing log.ratio.
INFO [2023-08-22 16:53:49] Removing 11982 intervals excluded in normalDB.
INFO [2023-08-22 16:53:49] normalDB provided. Setting minimum coverage for segmentation to 0.0015X.
INFO [2023-08-22 16:53:49] Removing 81022 low count (< 100 total reads) intervals.
INFO [2023-08-22 16:53:49] Removing 3262 low coverage (< 0.0015X) intervals.
INFO [2023-08-22 16:53:49] Using 151376 intervals (130716 on-target, 20660 off-target).
INFO [2023-08-22 16:53:49] Ratio of mean on-target vs. off-target read counts: 0.10
INFO [2023-08-22 16:53:49] Mean off-target bin size: 97601
INFO [2023-08-22 16:53:49] AT/GC dropout: 0.97 (tumor), 0.98 (normal), 1.01 (coverage log-ratio).
INFO [2023-08-22 16:53:49] Loading VCF...
INFO [2023-08-22 16:53:51] Found 71062 variants in VCF file.
INFO [2023-08-22 16:53:51] Removing 560 triallelic sites.
WARN [2023-08-22 16:53:51] Found GERMQ info field with Phred scaled germline probabilities.
WARN [2023-08-22 16:53:51] vcf.file has no DB info field for membership in germline databases. Found and used somatic status instead.
INFO [2023-08-22 16:53:51] 520 (0.7%) variants annotated as likely germline (DB INFO flag).
INFO [2023-08-22 16:53:52] 1_PRO001_18-27110_DNAFFPE_NA_C_HN00131668 is tumor in VCF file.
INFO [2023-08-22 16:53:53] No homozygous variants in VCF, provide unfiltered VCF.
INFO [2023-08-22 16:53:53] Detected MuTect2 VCF.
INFO [2023-08-22 16:53:54] Removing 59300 Mutect2 calls due to blacklisted failure reasons.
INFO [2023-08-22 16:53:54] Removing 45 non heterozygous (in matched normal) germline SNPs.
INFO [2023-08-22 16:53:54] Removing 5210 low quality variants with non-offset BQ < 25.
INFO [2023-08-22 16:53:54] Base quality scores range from 24 to 38 (offset by 1)
INFO [2023-08-22 16:53:54] Minimum number of supporting reads ranges from 2 to 9, depending on coverage and BQS.
INFO [2023-08-22 16:53:55] Removing 1481 variants with AF < 0.030 or AF >= 1.000 or insufficient supporting reads or depth < 15.
INFO [2023-08-22 16:53:55] Total size of targeted genomic region: 25.30Mb (36.42Mb with 50bp padding).
INFO [2023-08-22 16:53:55] 1.6% of targets contain variants.
INFO [2023-08-22 16:53:55] Removing 2125 variants outside intervals.
WARN [2023-08-22 16:53:55] Less than half of variants in dbSNP. Make sure that VCF contains both germline and somatic variants.
INFO [2023-08-22 16:53:55] Found SOMATIC annotation in VCF.
INFO [2023-08-22 16:53:55] Setting somatic prior probabilities for somatic variants to 0.999000 or to 0.000100 otherwise.
WARN [2023-08-22 16:53:55] Calculated mapping bias from somatic SNVs is not a number. Setting it to 0.49 but there is likely an issue with your input VCF.
INFO [2023-08-22 16:53:55] Found SOMATIC annotation in VCF. Setting mapping bias to 0.490.
INFO [2023-08-22 16:53:55] Excluding 2341 novel or poor quality variants from segmentation.
INFO [2023-08-22 16:53:55] Sample sex: M
INFO [2023-08-22 16:53:55] Segmenting data...
INFO [2023-08-22 16:53:55] Interval weights found, will use weighted CBS.
INFO [2023-08-22 16:53:55] Loading pre-computed boundaries for DNAcopy...
INFO [2023-08-22 16:53:55] Setting undo.SD parameter to 1.000000.
Setting multi-figure configuration
FATAL [2023-08-22 16:53:59] Segmentation and VCF do not overlap.

FATAL [2023-08-22 16:53:59]

FATAL [2023-08-22 16:53:59] This is most likely a user error due to invalid input data or

FATAL [2023-08-22 16:53:59] parameters (PureCN 2.6.4).

Error: Segmentation and VCF do not overlap.

This is most likely a user error due to invalid input data or
parameters (PureCN 2.6.4).
In addition: Warning message:
In .bcfHeaderAsSimpleList(header) :
duplicate keys in header will be forced to unique rownames
Execution halted
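
The filter counts in the log above can be tallied to see how many variants are left by the time segmentation starts. The following is a minimal sketch, not part of PureCN: `tally_variants` is a hypothetical helper, and it assumes that every "Removing N" line printed after "Found N variants in VCF file" refers to variants rather than intervals. The `LOG` string holds the relevant lines copied from the run above.

```python
import re

# Variant-related lines copied from the PureCN log above.
LOG = """\
INFO [2023-08-22 16:53:51] Found 71062 variants in VCF file.
INFO [2023-08-22 16:53:51] Removing 560 triallelic sites.
INFO [2023-08-22 16:53:54] Removing 59300 Mutect2 calls due to blacklisted failure reasons.
INFO [2023-08-22 16:53:54] Removing 45 non heterozygous (in matched normal) germline SNPs.
INFO [2023-08-22 16:53:54] Removing 5210 low quality variants with non-offset BQ < 25.
INFO [2023-08-22 16:53:55] Removing 1481 variants with AF < 0.030 or AF >= 1.000 or insufficient supporting reads or depth < 15.
INFO [2023-08-22 16:53:55] Removing 2125 variants outside intervals.
INFO [2023-08-22 16:53:55] Excluding 2341 novel or poor quality variants from segmentation.
"""


def tally_variants(log_text):
    """Return (found, removed, remaining, excluded) parsed from a PureCN log.

    Assumption: "Removing N ..." lines that follow the "Found N variants"
    line all describe variant filters.
    """
    found = None
    removed = 0
    excluded = None
    for line in log_text.splitlines():
        match = re.search(r"Found (\d+) variants in VCF file", line)
        if match:
            found = int(match.group(1))
            removed = 0
            continue
        if found is None:
            continue  # still in the interval-filtering part of the log
        match = re.search(r"Removing (\d+) ", line)
        if match:
            removed += int(match.group(1))
            continue
        match = re.search(r"Excluding (\d+) novel or poor quality variants", line)
        if match:
            excluded = int(match.group(1))
    if found is None:
        raise ValueError("no 'Found N variants in VCF file' line in log")
    return found, removed, found - removed, excluded


found, removed, remaining, excluded = tally_variants(LOG)
print(f"found={found} removed={removed} remaining={remaining} excluded={excluded}")
# prints: found=71062 removed=68721 remaining=2341 excluded=2341
```

For this run, the number of variants remaining after filtering equals the number excluded from segmentation, so it looks as if no variants are left to overlap with the segments. Whether that is the actual cause of the error is not confirmed here.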

Log file for third-party segmentation

Rscript $PURECN/PureCN.R --out $OUT/PRO001 \
--sampleid PRO001 \
--seg-file /Users/mateopava/Desktop/PRO001.seg \
--vcf /Users/mateopava/Desktop/PRO001xxxmutect2.filtered.vcf.gz \
--intervals $OUT_REF/baits_hg38_intervals.txt \
--genome hg38
[1] "/Users/mateopava/Desktop/PRO001"
INFO [2023-08-22 17:15:14] Loading PureCN 2.6.4...
WARN [2023-08-22 17:15:14] Recommended to provide --fun-segmentation Hclust.
INFO [2023-08-22 17:15:14] ------------------------------------------------------------
INFO [2023-08-22 17:15:14] PureCN 2.6.4
INFO [2023-08-22 17:15:14] ------------------------------------------------------------
INFO [2023-08-22 17:15:14] Arguments: -normal.coverage.file -tumor.coverage.file -log.ratio -seg.file /Users/mateopava/Desktop/PRO001.seg -vcf.file /Users/mateopava/Desktop/PRO001xxxmutect2.filtered.vcf.gz -normalDB -genome hg38 -sex ? -args.setPriorVcf 6 -args.setMappingBiasVcf NULL -args.filterIntervals 100,0.05 -args.segmentation 0.005,NULL, -sampleid PRO001 -min.ploidy 1.4 -max.ploidy 6 -max.non.clonal 0.2 -max.homozygous.loss 0.05,1e+07 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -interval.file /Users/mateopava/Desktop/reference_files/baits_hg38_intervals.txt -min.logr.sdev 0.15 -max.segments 300 -plot.cnv TRUE -vcf.field.prefix PureCN. -cosmic.vcf.file -DB.info.flag DB -POPAF.info.field POP_AF -Cosmic.CNT.info.field Cosmic.CNT -model beta -post.optimize FALSE -BPPARAM -log.file /Users/mateopava/Desktop/PRO001/PRO001.log -args.filterVcf -fun.segmentation -test.num.copy -test.purity -speedup.heuristics
INFO [2023-08-22 17:15:14] Loading coverage files...
WARN [2023-08-22 17:15:17] Expecting numeric chromosome names in seg.file, assuming file is properly sorted.
INFO [2023-08-22 17:15:18] Mean coverages: chrX: 16.00, chrY: 16.00, chr1-22: 16.00.
INFO [2023-08-22 17:15:18] Mean coverages: chrX: 16.00, chrY: 16.00, chr1-22: 16.00.
INFO [2023-08-22 17:15:25] Removing 2351 intervals with missing log.ratio.
INFO [2023-08-22 17:15:25] Using 250284 intervals (227382 on-target, 22902 off-target).
INFO [2023-08-22 17:15:25] Ratio of mean on-target vs. off-target read counts: NaN
INFO [2023-08-22 17:15:25] Mean off-target bin size: 90427
INFO [2023-08-22 17:15:25] Loading VCF...
INFO [2023-08-22 17:15:26] Found 71062 variants in VCF file.
INFO [2023-08-22 17:15:27] Removing 560 triallelic sites.
WARN [2023-08-22 17:15:27] Found GERMQ info field with Phred scaled germline probabilities.
WARN [2023-08-22 17:15:28] vcf.file has no DB info field for membership in germline databases. Found and used somatic status instead.
INFO [2023-08-22 17:15:28] 520 (0.7%) variants annotated as likely germline (DB INFO flag).
INFO [2023-08-22 17:15:29] 1_PRO001_18-27110_DNAFFPE_NA_C_HN00131668 is tumor in VCF file.
INFO [2023-08-22 17:15:29] No homozygous variants in VCF, provide unfiltered VCF.
INFO [2023-08-22 17:15:29] Detected MuTect2 VCF.
INFO [2023-08-22 17:15:29] Removing 59300 Mutect2 calls due to blacklisted failure reasons.
INFO [2023-08-22 17:15:29] Removing 45 non heterozygous (in matched normal) germline SNPs.
INFO [2023-08-22 17:15:29] Removing 5210 low quality variants with non-offset BQ < 25.
INFO [2023-08-22 17:15:29] Base quality scores range from 24 to 38 (offset by 1)
INFO [2023-08-22 17:15:29] Minimum number of supporting reads ranges from 2 to 9, depending on coverage and BQS.
INFO [2023-08-22 17:15:30] Removing 1481 variants with AF < 0.030 or AF >= 1.000 or insufficient supporting reads or depth < 15.
INFO [2023-08-22 17:15:30] Total size of targeted genomic region: 38.30Mb (58.38Mb with 50bp padding).
INFO [2023-08-22 17:15:30] 1.3% of targets contain variants.
INFO [2023-08-22 17:15:30] Removing 1274 variants outside intervals.
WARN [2023-08-22 17:15:30] Less than half of variants in dbSNP. Make sure that VCF contains both germline and somatic variants.
INFO [2023-08-22 17:15:30] Found SOMATIC annotation in VCF.
INFO [2023-08-22 17:15:30] Setting somatic prior probabilities for somatic variants to 0.999000 or to 0.000100 otherwise.
WARN [2023-08-22 17:15:30] Calculated mapping bias from somatic SNVs is not a number. Setting it to 0.49 but there is likely an issue with your input VCF.
INFO [2023-08-22 17:15:30] Found SOMATIC annotation in VCF. Setting mapping bias to 0.490.
INFO [2023-08-22 17:15:30] Excluding 3192 novel or poor quality variants from segmentation.
INFO [2023-08-22 17:15:30] Sample sex: M
INFO [2023-08-22 17:15:30] Segmenting data...
INFO [2023-08-22 17:15:30] Loaded provided segmentation file PRO001.seg (format DNAcopy).
WARN [2023-08-22 17:15:30] Expecting numeric chromosome names in seg.file, assuming file is properly sorted.
INFO [2023-08-22 17:15:30] Re-centering provided segment means (offset -2.8481).
INFO [2023-08-22 17:15:30] Loading pre-computed boundaries for DNAcopy...
INFO [2023-08-22 17:15:30] Setting undo.SD parameter to 0.000000.
Setting multi-figure configuration
FATAL [2023-08-22 17:15:38] Segmentation and VCF do not overlap.

FATAL [2023-08-22 17:15:38]

FATAL [2023-08-22 17:15:38] This is most likely a user error due to invalid input data or

FATAL [2023-08-22 17:15:38] parameters (PureCN 2.6.4).

Error: Segmentation and VCF do not overlap.

This is most likely a user error due to invalid input data or
parameters (PureCN 2.6.4).
In addition: Warning message:
In .bcfHeaderAsSimpleList(header) :
duplicate keys in header will be forced to unique rownames
Execution halted
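
Since the log warns about chromosome names in the seg file, one thing worth ruling out is a naming mismatch (for example `1` versus `chr1`) between the segmentation and the VCF. The following is a minimal sketch under stated assumptions, not PureCN code: the function names are hypothetical, the seg file is assumed to be tab-separated with a header row and the chromosome in the second column (the usual DNAcopy layout), and the VCF chromosome is taken from the first column of non-header lines.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def seg_chromosomes(lines):
    """Chromosome names from DNAcopy-style seg lines.

    Assumption: tab-separated, first row is a header, chromosome in column 2.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    return {row[1] for row in rows[1:]}


def vcf_chromosomes(lines):
    """Chromosome names from VCF records, skipping '#' header lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def compare_chromosomes(seg_lines, vcf_lines):
    """Report which chromosome names are shared and which appear on one side only."""
    seg = seg_chromosomes(seg_lines)
    vcf = vcf_chromosomes(vcf_lines)
    return {
        "shared": sorted(seg & vcf),
        "seg_only": sorted(seg - vcf),
        "vcf_only": sorted(vcf - seg),
    }


# Tiny made-up inputs to show the shape of the result (not real data).
example_seg = [
    "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n",
    "S1\t1\t100\t200\t10\t0.1\n",
]
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t150\t.\tA\tG\n",
]
print(compare_chromosomes(example_seg, example_vcf))
```

On real files, the same comparison could be run as `compare_chromosomes(open_text(seg_path), open_text(vcf_path))`, where both paths are placeholders. An empty `shared` list would point to a naming mismatch; a non-empty one would suggest the cause lies elsewhere.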

@lima1
Owner

lima1 commented Sep 6, 2023

Apologies, I was out of office and will get back to you soon.
