Data correlation matrix is singular #155

Open
asha24choudhary opened this issue Nov 24, 2023 · 5 comments

Comments

@asha24choudhary

I might be missing some theoretical concept, and I would like to clear it up now.

I have a dataset for the fork scenario.
My data is generated as follows:

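# c) Fork
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from dowhy import CausalModel  # assumed: CausalModel here is DoWhy's

# Create the graph describing the causal structure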
graph = """graph[directed 1 node[id "W" label "W"]
                    node[id "X" label "X"]
                    node[id "Y" label "Y"]
                    edge[source "X" target "Y"]
                    edge[source "X" target "W"]]""".replace('\n', '')


# Generate the data
N_SAMPLES = 1000  # assumed value; not defined in the original post
X = np.random.randn(N_SAMPLES)
W = 0.5 * X
Y = 0.8 * X

# Data to df
df = pd.DataFrame(np.vstack([X, W, Y]).T, columns=['X', 'W', 'Y'])
print(df.head(10))
# Create a model
model = CausalModel(
    data=df,
    treatment=['X'],
    outcome=['Y'],
#     common_causes=['Z'],
    graph=graph
)
plt.figure(figsize=(5,5))
model.view_model()
plt.show()

Clearly, the data matrix has rank 1, as you can see in the figure below:

image

When I perform causal discovery using PC, I get: 'ValueError: Data correlation matrix is singular. Cannot run fisherz test. Please check your data.'

Below you can find the code which I'm using to perform causal discovery using PC.

from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.cit import fisherz
from causallearn.utils.GraphUtils import GraphUtils

# default parameters
cg = pc(df.to_numpy(), 0.05, fisherz)

# visualization using pydot
cg.draw_pydot_graph(labels=df.columns)

# or save the graph
pyd = GraphUtils.to_pydot(cg.G, labels=df.columns)
pyd.write_png('pc_fork.png')

I need help understanding this. I think I get the error because the variables are perfectly correlated, which makes the correlation matrix singular. How can I resolve it without adding random noise to the variables W and Y? Is causal discovery not possible in my case?
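To make the cause of the error concrete, here is a minimal NumPy sketch that rebuilds the fork data and inspects its correlation matrix. The sample size, the seed, and the rank tolerance are assumptions chosen for illustration; they are not taken from the post.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
n_samples = 1000                # assumed sample size

# Same deterministic fork as in the post: W and Y are exact multiples of X.
X = rng.standard_normal(n_samples)
W = 0.5 * X
Y = 0.8 * X
data = np.column_stack([X, W, Y])

# Every pairwise correlation is 1 (up to rounding), so the matrix has rank 1.
corr = np.corrcoef(data, rowvar=False)
rank = np.linalg.matrix_rank(corr, tol=1e-8)  # explicit tolerance absorbs rounding error

print(corr.round(6))
print("rank of correlation matrix:", rank)
```

Because all three columns are exact linear functions of each other, any sub-block of this matrix is also singular, which is why a test that needs to invert such a sub-block cannot proceed.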

@asha24choudhary
Author

asha24choudhary commented Nov 24, 2023

Do you think it would be a good idea to compute the pseudo-inverse when inverting sub_corr_matrix raises an error?

@WilliamsToTo

I have the same issue when I use causallearn.search.ScoreBased.GES. I guess it is caused by the input data, but I don't know what requirements the data should meet. It would be good if the developers could list the requirements for input data.

@kunwuz
Collaborator

kunwuz commented Dec 5, 2023

Yeah, this is due to a violation of the assumptions on the data-generating process, e.g., a violation of faithfulness. I don't know whether any strategy exists to detect this from an observed dataset. The pseudo-inverse could be a good solution in practice, but we need to investigate further to see whether it would introduce any issue with the asymptotic guarantees.
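As a rough illustration of the pseudo-inverse idea, the sketch below computes a partial correlation from the Moore-Penrose pseudo-inverse of a correlation sub-matrix. The helper `partial_corr_pinv`, its `rcond` cutoff, and the test data are all hypothetical; this is not causal-learn's implementation, only a stand-alone NumPy example.

```python
import numpy as np

def partial_corr_pinv(data, i, j, cond=(), rcond=1e-10):
    """Partial correlation of columns i and j given the columns in cond.

    Hypothetical helper: uses the pseudo-inverse in place of the ordinary
    inverse, so a singular correlation sub-matrix does not raise.
    """
    idx = [i, j, *cond]
    corr = np.corrcoef(data, rowvar=False)
    sub = corr[np.ix_(idx, idx)]
    prec = np.linalg.pinv(sub, rcond=rcond)  # singular values below rcond * max are dropped
    r = -prec[0, 1] / np.sqrt(abs(prec[0, 0] * prec[1, 1]))
    return float(np.clip(r, -1.0, 1.0))

rng = np.random.default_rng(0)  # assumed seed
X = rng.standard_normal(1000)   # assumed sample size
singular = np.column_stack([X, 0.5 * X, 0.8 * X])             # columns: X, W, Y
noisy = singular + 0.1 * rng.standard_normal(singular.shape)  # full-rank comparison data

r_singular = partial_corr_pinv(singular, 0, 2, cond=(1,))  # X vs Y given W
r_noisy = partial_corr_pinv(noisy, 0, 2, cond=(1,))
print(r_singular, r_noisy)
```

On full-rank data the pseudo-inverse coincides with the ordinary inverse, so the result is the usual partial correlation. On the deterministic data the helper returns a number instead of raising, but that value should be treated with caution: whether a test built on it remains statistically valid is exactly the open question raised in the comment above.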

@kunwuz
Collaborator

kunwuz commented Dec 5, 2023

Perhaps adding some small random noise could help?
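A minimal sketch of the noise suggestion, assuming independent Gaussian noise on W and Y. The noise scale of 0.1, the seed, and the sample size are arbitrary assumptions, and the right scale for a real dataset would need to be chosen with care.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed
n_samples = 1000                # assumed sample size
noise_scale = 0.1               # assumed noise level, for illustration only

# Fork from the original post, with independent noise added to W and Y.
X = rng.standard_normal(n_samples)
W = 0.5 * X + noise_scale * rng.standard_normal(n_samples)
Y = 0.8 * X + noise_scale * rng.standard_normal(n_samples)
data = np.column_stack([X, W, Y])

corr = np.corrcoef(data, rowvar=False)
rank_after_noise = np.linalg.matrix_rank(corr, tol=1e-8)
cond_number = np.linalg.cond(corr)

print("rank:", rank_after_noise)
print("condition number:", cond_number)
```

With the noise in place the correlation matrix is full rank and can be inverted, so an array like `data` could be passed to `pc` in the same way as `df.to_numpy()` in the original post. A very small noise scale leaves the matrix nearly singular, which the condition number makes visible.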

@priamai

priamai commented Dec 18, 2023

Yes, and can you check two things:
a) the distinct count per column
b) the count of identical rows

In my experience, repeated data does produce a singular matrix.
I'm also interested to learn more!
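The two checks above can be sketched with pandas as follows. The DataFrame is rebuilt here with an assumed seed and sample size so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed
X = rng.standard_normal(1000)   # assumed sample size
df = pd.DataFrame({"X": X, "W": 0.5 * X, "Y": 0.8 * X})

# a) distinct count per column
distinct_per_column = df.nunique()

# b) number of rows that repeat an earlier row exactly
duplicate_rows = int(df.duplicated().sum())

print(distinct_per_column)
print("duplicate rows:", duplicate_rows)
```

For data generated like the fork in this issue, both checks are expected to come back clean, since the values are continuous. That would suggest the singularity here stems from the exact linear dependence between columns rather than from repeated values.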
