Kernel Crash and errors running Tutorial with custom data #175

Open
instabaines opened this issue Oct 6, 2022 · 1 comment

Comments

@instabaines

Description

I am following the 'A first CausalNex tutorial' notebook in JupyterLab with a custom dataset. I ran into several issues and was able to solve some of them:

  • Error: ValueError: The given structure is not acyclic. Please review the following cycle: [('A0', 'A1'), ('A1', 'A0')]
  • Error: KeyError: 'A0'
  • Kernel crashes when running fit_cpds on the training data
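BayesianNetwork only accepts a directed acyclic graph, which is why a two-edge cycle such as ('A0', 'A1'), ('A1', 'A0') is rejected. Since StructureModel subclasses networkx.DiGraph, networkx.find_cycle can locate such cycles directly; the standard-library sketch here shows the same idea as a plain depth-first search. The helper name locate_cycle is hypothetical and not part of CausalNex.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def locate_cycle(edges: List[Edge]) -> Optional[List[Edge]]:
    """Return one directed cycle as a list of edges, or None if the graph is acyclic."""
    graph: Dict[Hashable, List[Hashable]] = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {node: WHITE for node in graph}
    parent: Dict[Hashable, Hashable] = {}

    def dfs(node: Hashable) -> Optional[List[Edge]]:
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                # Back edge to a node on the current path: rebuild the cycle
                # by walking the parent chain from `node` back to `nxt`.
                cycle = [(node, nxt)]
                cur = node
                while cur != nxt:
                    cycle.append((parent[cur], cur))
                    cur = parent[cur]
                return list(reversed(cycle))
            if colour[nxt] == WHITE:
                parent[nxt] = node
                found = dfs(nxt)
                if found:
                    return found
        colour[node] = BLACK
        return None

    for start in graph:
        if colour[start] == WHITE:
            found = dfs(start)
            if found:
                return found
    return None
```

Running it on the edges from the error message returns [('A0', 'A1'), ('A1', 'A0')]; removing either edge makes it return None.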

Context

I am trying to adapt the tutorial to my own project.

Steps to Reproduce

data:

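import numpy as np
import pandas as pd
from causalnex.structure.pytorch import from_pandas  # assumed import: use_gpu belongs to the PyTorch NOTEARS from_pandas

np.random.seed(123)
# 1000 rows x 181 independent binary columns named A0..A180
data = pd.DataFrame({"A" + str(key): np.random.choice([0, 1], size=(1000,)) for key in range(181)})
sm = from_pandas(data, use_gpu=True)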

Process
from causalnex.network import BayesianNetwork

bn = BayesianNetwork(sm)  # raised the first error (ValueError); fixed by removing the edges that formed the cycle
bn = bn.fit_cpds(test, method="BayesianEstimator", bayes_prior="K2")  # raised the second error (KeyError); fixed by running fit_node_states on the data before this step
Running fit_cpds again crashed the kernel.
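To make the fit_node_states / fit_cpds order concrete, here is a stdlib sketch of what estimating one conditional probability table with a K2 prior involves. As I understand it, CausalNex delegates this to pgmpy's BayesianEstimator, where the K2 prior is a pseudo-count of 1 added to every cell before normalising. The helpers fit_states and k2_cpd are hypothetical, not CausalNex functions. In this sketch, asking for a parent whose states were never collected raises a KeyError, which illustrates why states must be known before CPDs can be built.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def fit_states(rows: List[Dict[str, int]]) -> Dict[str, List[int]]:
    """Collect the sorted observed values of every column (analogous to fitting node states)."""
    states: Dict[str, set] = {}
    for row in rows:
        for col, val in row.items():
            states.setdefault(col, set()).add(val)
    return {col: sorted(vals) for col, vals in states.items()}


def k2_cpd(
    rows: List[Dict[str, int]],
    node: str,
    parents: List[str],
    states: Dict[str, List[int]],
) -> Dict[Tuple[int, ...], Dict[int, float]]:
    """P(node | parents): add a pseudo-count of 1 to each cell, then normalise each parent configuration."""
    counts = Counter((row[node], tuple(row[p] for p in parents)) for row in rows)
    cpd: Dict[Tuple[int, ...], Dict[int, float]] = {}
    # One column per parent configuration; with no parents, product() yields a single empty tuple.
    for config in product(*(states[p] for p in parents)):
        column = {s: counts[(s, config)] + 1 for s in states[node]}
        total = sum(column.values())
        cpd[config] = {s: c / total for s, c in column.items()}
    return cpd
```

For the rows {A0: 0, A1: 0}, {A0: 0, A1: 1}, {A0: 1, A1: 1}, the estimate of P(A1=1 | A0=1) is 2/3 rather than the raw 1/1 because of the pseudo-counts. Note that the table has one column per parent configuration, 2**k for k binary parents, so it grows quickly as nodes gain parents.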

Expected Result

Expected a similar output to the tutorial

Actual Result

Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.

Your Environment

  • CausalNex version used: 0.11.0
  • Python version used: 3.8.13
  • Operating system and version: Ubuntu 20.04.4 LTS, running in an Anaconda virtual environment
@ElisabethSesterHussQB
Contributor

Hi, thanks for reaching out!
We looked into your issue by running the code you provided.
One thing we noticed is that the data you are using may not be well suited to structure learning: it is completely random, so the learned edges all have very small weights. When removing edges with remove_edges_below_threshold, we needed a threshold of around 0.05 to remove enough edges to make the graph acyclic.
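A dictionary-based sketch of the thresholding idea: my understanding is that StructureModel.remove_edges_below_threshold drops every edge whose absolute weight is below the threshold, which is what this stand-in does. The helper prune_weak_edges and the example weights are hypothetical illustration values, not weights learned from this data.

```python
from typing import Dict, Tuple

WeightedEdges = Dict[Tuple[str, str], float]


def prune_weak_edges(edges: WeightedEdges, threshold: float) -> WeightedEdges:
    """Keep only edges whose absolute weight is at least `threshold`."""
    return {edge: w for edge, w in edges.items() if abs(w) >= threshold}


# Made-up weights: with random data, most learned weights are tiny,
# so a comparatively large threshold is needed to break cycles.
weights = {("A0", "A1"): 0.03, ("A1", "A0"): 0.02, ("A2", "A3"): -0.08}
pruned = prune_weak_edges(weights, 0.05)
```

Here both edges of the A0/A1 cycle fall below 0.05 and are dropped, while the negative but strong edge A2 -> A3 survives because the comparison uses the absolute weight.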
Another necessary step is calling get_largest_subgraph, because CausalNex does not currently support disconnected components.
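A stdlib sketch of what keeping the largest subgraph means: as I understand it, get_largest_subgraph keeps the largest weakly connected component (edge direction ignored). The helpers largest_component and largest_subgraph_edges are hypothetical, and nodes with no edges at all cannot appear in a plain edge list.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def largest_component(edges: List[Edge]) -> Set[str]:
    """Nodes of the largest weakly connected component, ignoring edge direction."""
    neighbours: Dict[str, Set[str]] = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    seen: Set[str] = set()
    best: Set[str] = set()
    for start in neighbours:
        if start in seen:
            continue
        # Iterative traversal collecting every node reachable from `start`.
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def largest_subgraph_edges(edges: List[Edge]) -> List[Edge]:
    """Keep only the edges inside the largest component."""
    keep = largest_component(edges)
    return [(u, v) for u, v in edges if u in keep]
```

For example, with edges A0 -> A1, A1 -> A2 and a separate A3 -> A4, only the first two edges are kept.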
The expected flow in CausalNex would then be:

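from causalnex.structure.pytorch import from_pandas  # assumed: same from_pandas as in the original report
from causalnex.network import BayesianNetwork

sm = from_pandas(data)
sm.remove_edges_below_threshold(0.05)  # or remove wrong edges manually, e.g. sm.remove_edge(u, v)
sm = sm.get_largest_subgraph()  # returns a new StructureModel; disconnected components are not supported
# Discretise the data. The example data is already binary (0/1), so it can be used as-is;
# continuous data would need discretising first (e.g. with causalnex.discretiser.Discretiser).
discretised_data = data
bn = BayesianNetwork(sm)
bn = bn.fit_node_states(discretised_data)
bn = bn.fit_cpds(discretised_data, method="BayesianEstimator", bayes_prior="K2")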

Unfortunately, we weren't able to reproduce your second error. We are happy to take a closer look if you are still facing the same issues. Any additional code you can share would also help us pin down the exact cause of the error.
