
error in evaluating the argument 'subject' in selecting a method for function 'findOverlaps': 'seqnames' cannot contain NAs #165

Open
yulijia opened this issue Feb 17, 2024 · 5 comments

@yulijia

yulijia commented Feb 17, 2024

Hi,

I ran Numbat with CNV calls from WGS as input, but it always fails with the error message 'seqnames' cannot contain NAs.

I don't have any NAs in count_mat_ATC2, df_allele_ATC2, or the segs_consensus file.

Can somebody help me fix this problem?

  out = run_numbat(
    count_mat_ATC2, # gene x cell integer UMI count matrix
    ref_hca, # reference expression profile, a gene x cell type normalized expression level matrix
    df_allele_ATC2, # allele dataframe generated by pileup_and_phase script
    genome = "hg38",
    t = 1e-5,
    ncores = 24,
    plot = TRUE,
    segs_consensus_fix = segs_consensus, # existing CNV calls from WGS
    out_dir = paste0('../out/numbat_with_segs/', scid)
  )
Numbat version: 1.3.3
Scistreer version: 1.2.0
Running under parameters:
t = 1e-05
alpha = 1e-04
gamma = 20
min_cells = 50
init_k = 3
max_cost = 1714.8
n_cut = 0
max_iter = 2
max_nni = 100
min_depth = 0
use_loh = auto
segs_loh = None
call_clonal_loh = FALSE
segs_consensus_fix = Given
multi_allelic = TRUE
min_LLR = 5
min_overlap = 0.45
max_entropy = 0.5
skip_nj = FALSE
diploid_chroms = None
ncores = 24
ncores_nni = 24
common_diploid = TRUE
tau = 0.3
check_convergence = FALSE
plot = TRUE
genome = hg38
Input metrics:
5716 cells
Mem used: 53.7Gb
Approximating initial clusters using smoothed expression ..
Mem used: 53.7Gb
number of genes left: 9575
running hclust...
! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.                                                                         
! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
Iteration 1
Mem used: 53.7Gb
Expression noise level (MSE): high (2.7). Consider using a custom expression reference profile.
Using fixed consensus CNVs
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'subject' in selecting a method for function 'findOverlaps': 'seqnames' cannot contain NAs

Thanks in advance.

Lijia

@teng-gao
Collaborator

Can you print out segs_consensus?
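A plain print hides column types, so it can help to inspect the classes and NA counts alongside the values. The sketch below uses only base R; `segs_consensus` here is a hypothetical stand-in built from a few rows of the table shared in this thread, not the real file.

```r
# Hypothetical stand-in for segs_consensus, using rows shared in this thread
segs_consensus <- data.frame(
  CHROM     = c("chr1", "chr1", "chr1"),
  seg       = c("seg1", "seg2", "seg3"),
  seg_start = c(1, 820001, 1067001),
  seg_end   = c(820000, 1067000, 1688000),
  cnv_state = c("bdel", "neu", "loh"),
  stringsAsFactors = FALSE
)

# Class of every column (a print() alone does not show these)
col_classes <- sapply(segs_consensus, class)
print(col_classes)

# Number of NA values per column
na_counts <- colSums(is.na(segs_consensus))
print(na_counts)
```

With this stand-in, every column is NA-free, yet `CHROM` reports class `"character"`.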

@yulijia
Author

yulijia commented Feb 20, 2024

Sure, the data frame is:

CHROM  seg    seg_start  seg_end   cnv_state
chr1   seg1   1          820000    bdel
chr1   seg2   820001     1067000   neu
chr1   seg3   1067001    1688000   loh
chr1   seg4   1688001    1695000   bdel
chr1   seg5   1695001    1721000   bdel
chr1   seg6   1721001    1733000   bdel
chr1   seg7   1733001    1936000   loh
chr1   seg8   1936001    4063000   loh
chr1   seg9   4063001    4065000   bdel
chr1   seg10  4065001    4068000   del
chr1   seg11  4068001    6561000   loh
chr1   seg12  6561001    6567000   neu
chr1   seg13  6567001    7941000   loh
chr1   seg14  7941001    8947000   loh
chr1   seg15  8947001    9536000   loh
chr1   seg16  9536001    9537000   bdel
chr1   seg17  9537001    10894000  loh
chr1   seg18  10894001   10895000  bdel
chr1   seg19  10895001   10896000  bdel
chr1   seg20  10896001   15799000  loh
chr1   seg21  15799001   15953000  neu
chr1   seg22  15953001   16506000  loh
chr1   seg23  16506001   16728000  amp
chr1   seg24  16728001   17158000  loh

@teng-gao
Collaborator

Hi, do you only expect aberrations on chr1? Why not include all chromosomes?
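A quick way to see which autosomes a CNV profile leaves out is to compare its chromosomes against 1-22. This is a minimal base-R sketch; `segs_consensus` is a hypothetical stand-in covering only chr1, like the excerpt shared in this thread, and the "chr" prefix is stripped before comparing.

```r
# Hypothetical segs_consensus covering only chr1, like the excerpt in this thread
segs_consensus <- data.frame(
  CHROM     = c("chr1", "chr1"),
  seg_start = c(1, 820001),
  seg_end   = c(820000, 1067000),
  stringsAsFactors = FALSE
)

# Autosomes 1-22, the CHROM range given in the Numbat documentation
expected <- 1:22

# Turn "chr1" into 1; names that are not numeric (e.g. "chrX") become NA
present <- suppressWarnings(as.integer(sub("^chr", "", segs_consensus$CHROM)))

# Autosomes with no segment at all in the profile
missing_chroms <- setdiff(expected, present)
print(missing_chroms)
```

For this stand-in, chromosomes 2 through 22 are reported as missing.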

@yulijia
Author

yulijia commented Feb 22, 2024

Hi @teng-gao,

This is only a small portion of my entire file. The dataset is in-house patient data, which I cannot share publicly. Do you need the entire file? If so, please let me know, and I will try to obtain permission to share it with you.

@quail768

Hey @yulijia, as per the Numbat documentation found here:
Using existing CNV calls
Sometimes users already have CNV calls from bulk WGS, WES, or array analysis. In this case, you can supply the existing CNV profile via segs_consensus_fix parameter to fix the CNV boundaries and states. To do so, you may provide a dataframe with the following columns:

CHROM: integer; chromosome (1-22)
seg: character; segment ID (e.g. 1a, 1b, 2a, 2b, etc.)
seg_start: integer; segment start position
seg_end: integer; segment end position
cnv_state: character; copy number state (neu, del, amp, loh, bamp, bdel)
Please note that diploid segments (cnv_state = "neu") should also be included (i.e. segs_consensus_fix should be a complete copy number profile including all chromosomes).

CHROM needs to be an integer, but your column contains character values such as "chr1". Maybe try stripping the "chr" prefix and converting the column to integer?
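A minimal base-R sketch of that suggestion follows. It is not confirmed in this thread that the character `CHROM` column is what produces the NA seqnames, so treat this as something to try, not a verified fix. `segs_consensus` is a hypothetical stand-in (the chrX row is added only to illustrate filtering), and `segs_fixed` is a hypothetical name. Rows outside chromosomes 1-22 are dropped because the documentation quoted above lists CHROM as 1-22.

```r
# Hypothetical segs_consensus with "chr"-prefixed names, plus an illustrative chrX row
segs_consensus <- data.frame(
  CHROM     = c("chr1", "chr1", "chrX"),
  seg       = c("seg1", "seg2", "seg3"),
  seg_start = c(1, 820001, 1),
  seg_end   = c(820000, 1067000, 500000),
  cnv_state = c("bdel", "neu", "neu"),
  stringsAsFactors = FALSE
)

# Keep only chromosomes 1-22, with or without the "chr" prefix
is_autosome <- grepl("^(chr)?([1-9]|1[0-9]|2[0-2])$", segs_consensus$CHROM)
segs_fixed <- segs_consensus[is_autosome, ]

# Strip the prefix and coerce the columns the docs describe as integers
segs_fixed$CHROM     <- as.integer(sub("^chr", "", segs_fixed$CHROM))
segs_fixed$seg_start <- as.integer(segs_fixed$seg_start)
segs_fixed$seg_end   <- as.integer(segs_fixed$seg_end)

str(segs_fixed)
```

The result would then be passed as `segs_consensus_fix = segs_fixed` in the `run_numbat()` call. Note that this only fixes the types: per the same documentation, the profile should still cover all chromosomes, including `neu` segments.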
