ValueError: math domain error in PC with missing data #138

Open
priamai opened this issue Oct 8, 2023 · 3 comments

Comments

priamai commented Oct 8, 2023

Hi there,
my input data is like this:

[image of the input data]

I then want to run causal discovery on it with missing values:

from causallearn.search.ConstraintBased.PC import pc
dataset= X.to_numpy()
sub_cols = X.columns
# default parameters
cg = pc(dataset,alpha=0.05,indep_test='mv_fisherz',mvpc=True)

Full error:


ValueError                                Traceback (most recent call last)
Cell In[206], line 5
      3 sub_cols = X.columns
      4 # default parameters
----> 5 cg = pc(dataset,alpha=0.05,indep_test='mv_fisherz',mvpc=True)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:41, in pc(data, alpha, indep_test, stable, uc_rule, uc_priority, mvpc, correction_name, background_knowledge, verbose, show_progress, node_names, **kwargs)
     39     if indep_test == fisherz:
     40         indep_test = mv_fisherz
---> 41     return mvpc_alg(data=data, node_names=node_names, alpha=alpha, indep_test=indep_test, correction_name=correction_name, stable=stable,
     42                     uc_rule=uc_rule, uc_priority=uc_priority, background_knowledge=background_knowledge,
     43                     verbose=verbose,
     44                     show_progress=show_progress, **kwargs)
     45 else:
     46     return pc_alg(data=data, node_names=node_names, alpha=alpha, indep_test=indep_test, stable=stable, uc_rule=uc_rule,
     47                   uc_priority=uc_priority, background_knowledge=background_knowledge, verbose=verbose,
     48                   show_progress=show_progress, **kwargs)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:200, in mvpc_alg(data, node_names, alpha, indep_test, correction_name, stable, uc_rule, uc_priority, background_knowledge, verbose, show_progress, **kwargs)
    198 indep_test = CIT(data, indep_test, **kwargs)
    199 ## Step 1: detect the direct causes of missingness indicators
--> 200 prt_m = get_parent_missingness_pairs(data, alpha, indep_test, stable)
    201 # print('Finish detecting the parents of missingness indicators.  ')
    202 
    203 ## Step 2:
    204 ## a) Run PC algorithm with the 1st step skeleton;
    205 cg_pre = SkeletonDiscovery.skeleton_discovery(data, alpha, indep_test, stable,
    206                                               background_knowledge=background_knowledge,
    207                                               verbose=verbose, show_progress=show_progress, node_names=node_names)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:275, in get_parent_missingness_pairs(data, alpha, indep_test, stable)
    272 ## Get the index of parents of missingness indicators
    273 # If the missingness indicator has no parent, then it will not be collected in prt_m
    274 for missingness_i in missingness_index:
--> 275     parent_of_missingness_i = detect_parent(missingness_i, data, alpha, indep_test, stable)
    276     if not isempty(parent_of_missingness_i):
    277         parent_missingness_pairs['prt'].append(parent_of_missingness_i)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:363, in detect_parent(r, data_, alpha, indep_test, stable)
    361 if len(Neigh_x) >= depth:
    362     for S in combinations(Neigh_x, depth):
--> 363         p = cg.ci_test(x, y, S)
    364         if p > alpha:
    365             if not stable:  # Unstable: Remove x---y right away

File /opt/conda/lib/python3.10/site-packages/causallearn/graph/GraphClass.py:58, in CausalGraph.ci_test(self, i, j, S)
     56 # assert i != j and not i in S and not j in S
     57 if self.test.method == 'mc_fisherz': return self.test(i, j, S, self.nx_skel, self.prt_m)
---> 58 return self.test(i, j, S)

File /opt/conda/lib/python3.10/site-packages/causallearn/utils/cit.py:388, in MV_FisherZ.__call__(self, X, Y, condition_set)
    386 if abs(r) >= 1: r = (1. - np.finfo(float).eps) * np.sign(r) # may happen when samplesize is very small or relation is deterministic
    387 Z = 0.5 * log((1 + r) / (1 - r))
--> 388 X = sqrt(len(test_wise_deletion_XYcond_rows_index) - len(condition_set) - 3) * abs(Z)
    389 p = 2 * (1 - norm.cdf(abs(X)))
    390 self.pvalue_cache[cache_key] = p

ValueError: math domain error
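
Note on the failing line: the traceback ends at the `sqrt(...)` call in `MV_FisherZ.__call__`. Assuming `sqrt` there is Python's `math.sqrt`, it raises a `ValueError` when its argument is negative, which would happen if fewer than `len(condition_set) + 3` rows survive test-wise deletion. The sketch below counts complete rows for a given test on toy data. `complete_row_count` and `fisherz_dof` are hypothetical helpers, not part of causal-learn, and the sketch assumes missing values are encoded as NaN.

```python
import math
import numpy as np

def complete_row_count(data, columns):
    """Count rows with no NaN in the given columns (hypothetical helper)."""
    sub = data[:, list(columns)]
    return int((~np.isnan(sub).any(axis=1)).sum())

def fisherz_dof(data, x, y, condition_set=()):
    """Quantity under the square root in the Fisher-Z statistic: n - |S| - 3."""
    n = complete_row_count(data, [x, y, *condition_set])
    return n - len(condition_set) - 3

# Toy data (made up): columns 0 and 1 are rarely observed together.
toy = np.array([
    [1.0, np.nan, 0.5],
    [np.nan, 2.0, 0.1],
    [3.0, 4.0, 0.7],
    [np.nan, 1.0, 0.2],
])

dof = fisherz_dof(toy, 0, 1, condition_set=(2,))
print("n - |S| - 3 =", dof)

try:
    math.sqrt(dof)
except ValueError as err:
    # A negative argument makes math.sqrt raise ValueError.
    print("ValueError:", err)
```

Running `fisherz_dof` over the variable pairs of the real dataset would show whether any test is left with too few complete rows.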

kunwuz (Collaborator) commented Oct 12, 2023

Hi, it seems that #119 and #29 are related to this issue. Could you please try adding some random noise to the data and see whether the error persists? I suspect the data violates some assumption, for example a singularity somewhere.
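
One way to try the noise suggestion is sketched here. `add_jitter` is a hypothetical helper, not a causal-learn function; the noise scale of `1e-6` times each column's standard deviation is an arbitrary assumption, and missing values are assumed to be NaN.

```python
import numpy as np

def add_jitter(data, scale=1e-6, seed=0):
    """Add small Gaussian noise to observed entries; NaN entries stay NaN."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Scale the noise per column, ignoring missing values.
    col_std = np.nanstd(data, axis=0)
    # Fall back to 1.0 for constant (or otherwise degenerate) columns.
    col_std = np.where(np.isfinite(col_std) & (col_std > 0), col_std, 1.0)
    noise = rng.normal(size=data.shape)
    return data + scale * col_std * noise

# Toy data (made up): column 1 is exactly 2 * column 0 where both are observed.
toy = np.array([
    [1.0, 2.0, np.nan],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.5],
    [4.0, np.nan, 2.0],
])
jittered = add_jitter(toy)

# Same call as in the report, with the jittered array:
# cg = pc(add_jitter(dataset), alpha=0.05, indep_test='mv_fisherz', mvpc=True)
```

Jitter leaves missing entries missing, so it targets deterministic relations between columns rather than a shortage of complete rows.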

priamai (Author) commented Oct 13, 2023

Hi there, it sounds like it, but why doesn't it raise the singularity exception that was discussed in that thread?
Maybe it has not been implemented, even though the issue was closed with the suggestion that it would produce a meaningful error?

kunwuz (Collaborator) commented Oct 13, 2023

We updated the code, but perhaps your case was not covered (#58). Would you mind providing us (perhaps via email: yujiazh@cmu.edu) with a minimal reproducible example of your issue?
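
A starting point for such an example could look like the sketch below. It builds a small synthetic array with heavy missingness; `make_mv_dataset` and its parameters are hypothetical, the data is random rather than taken from the report, and whether it triggers the same error is untested. The `pc` call is the one from the report, left commented so the snippet runs without causal-learn installed.

```python
import numpy as np

def make_mv_dataset(n_rows=50, n_cols=4, missing_rate=0.6, seed=0):
    """Random Gaussian data with entries set to NaN at the given rate."""
    rng = np.random.default_rng(seed)
    data = rng.normal(size=(n_rows, n_cols))
    mask = rng.random(size=data.shape) < missing_rate
    data[mask] = np.nan
    return data

dataset = make_mv_dataset()
print(dataset.shape, "missing entries:", int(np.isnan(dataset).sum()))

# from causallearn.search.ConstraintBased.PC import pc
# cg = pc(dataset, alpha=0.05, indep_test='mv_fisherz', mvpc=True)
```

A fixed seed keeps the array identical between runs, so the same input can be shared along with the script.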
