Integrate `pgmpy` for Bayesian networks capabilities #47

ceteri · 2020-12-22T09:48:02Z

Integrate pgmpy for statistical inference in Bayesian networks.

Depends on: #26


Ankush-Chander · 2021-04-25T14:43:04Z

Hey @ceteri,

I need some pointers to understand this requirement better.

Thanks in advance.

ceteri · 2021-04-25T21:09:28Z

Thank you @Ankush-Chander!
Here's an idea, if this seems reasonable as an approach?

pgmpy implements several kinds of modeling, sampling, and inference, although our shortest path is probably to focus on discrete Bayesian networks. This is also one of the top-requested features to add to kglab from our ongoing survey.

Next steps are:

1. Build an example discrete Bayesian model in pgmpy which produces known results, which we can use to verify the integration later
   - for example, using one of the examples given in their documentation
   - or, ideally, based on data in the recipe progressive example that we use
2. Represent the data from this model in an RDF graph
3. Develop a new class method for `kglab.KnowledgeGraph`, or probably even better for `kglab.Subgraph`, that loads the pgmpy model data from the KG
4. Verify results from above, to use as a unit test
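As a rough illustration of the first and last steps, here is a minimal sketch of a baseline with hand-checkable results. It deliberately uses plain Python rather than pgmpy, so the expected numbers can be verified independently of any library. The two-node Rain/WetGrass network, its probabilities, and the names `joint` and `query` are all assumptions made up for this example; none of them come from kglab or pgmpy.

```python
from itertools import product

# Hypothetical two-node network: Rain -> WetGrass. All numbers are made up.
STATES = {"Rain": ["no", "yes"], "WetGrass": ["no", "yes"]}
PARENTS = {"Rain": [], "WetGrass": ["Rain"]}

# Each CPD maps a tuple of parent states to a distribution over the node's states.
CPDS = {
    "Rain": {(): {"no": 0.8, "yes": 0.2}},
    "WetGrass": {
        ("no",): {"no": 0.9, "yes": 0.1},
        ("yes",): {"no": 0.1, "yes": 0.9},
    },
}


def joint(assignment):
    """Probability of one full assignment, as the product of the CPD entries."""
    p = 1.0
    for node, parents in PARENTS.items():
        key = tuple(assignment[parent] for parent in parents)
        p *= CPDS[node][key][assignment[node]]
    return p


def query(target, evidence):
    """Posterior over `target` given `evidence`, by brute-force enumeration."""
    nodes = list(STATES)
    scores = {state: 0.0 for state in STATES[target]}
    for values in product(*(STATES[n] for n in nodes)):
        assignment = dict(zip(nodes, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            scores[assignment[target]] += joint(assignment)
    total = sum(scores.values())
    return {state: score / total for state, score in scores.items()}
```

For this network, P(WetGrass=yes) = 0.2 × 0.9 + 0.8 × 0.1 = 0.26 and P(Rain=yes | WetGrass=yes) = 0.18 / 0.26 ≈ 0.692, so a pgmpy model built from the same tables should reproduce those values within floating-point tolerance.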

We can also decide whether to have some additional wrappers for pgmpy and its results. On the one hand, it's great to wrap results into pandas dataframes and other conveniences for data science workflows. On the other hand, it's probably better to allow people to simply use pgmpy operations on the model directly. The latter approach is how we've handled integration of PyTorch, PyVis, etc., i.e., not to intermediate unless there are pain points that need to be corrected (as in SPARQL queries).

How does that sound as an approach?

Ankush-Chander · 2021-04-30T15:36:43Z

Hey @ceteri

I tried to follow the trail above, but I could not find any widely accepted standard RDF representation of Bayesian networks. I will need your help with that.

Once we pinpoint that, we can give users a pathway from a standard BN RDF file to a KG to a pgmpy model. The rest of the operations can be done directly using pgmpy.
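To make the "KG to model" half of that pathway concrete, here is a minimal sketch of recovering a network structure from triples. It makes several assumptions: the predicate `bn:influences` is a hypothetical name (no standard vocabulary exists), plain tuples stand in for the result of a KG query, and `edges_from_triples` and `check_dag` are illustrative helpers, not kglab or pgmpy functions. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical predicate name; no standard RDF vocabulary for Bayesian networks is assumed.
INFLUENCES = "bn:influences"

# Plain (subject, predicate, object) tuples, standing in for a KG query result.
TRIPLES = [
    ("ex:Rain", INFLUENCES, "ex:WetGrass"),
    ("ex:Sprinkler", INFLUENCES, "ex:WetGrass"),
    ("ex:Rain", "rdfs:label", "Rain"),
]


def edges_from_triples(triples, predicate=INFLUENCES):
    """Return (parent, child) pairs, ignoring triples with other predicates."""
    return [(s, o) for s, p, o in triples if p == predicate]


def check_dag(edges):
    """Return the nodes in topological order; raise ValueError on a cycle."""
    sorter = TopologicalSorter()
    for parent, child in edges:
        sorter.add(child, parent)  # the child depends on its parent
    try:
        return list(sorter.static_order())
    except CycleError as err:
        raise ValueError(f"not a DAG, cycle found: {err.args[1]}") from err


EDGES = edges_from_triples(TRIPLES)
ORDER = check_dag(EDGES)
```

A list of `(parent, child)` pairs like `EDGES` is the kind of structure a Bayesian network constructor typically takes; checking for cycles first gives a clearer error than letting the downstream library reject the graph.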

Thanks

ceteri · 2021-05-07T06:02:31Z

Hi @Ankush-Chander, good point! The way I described it above, moving from RDF => pgmpy wouldn't work directly, and there's no standard representation.

What I should have described better:

1. Choose a simple example Bayesian network problem
2. Build a solution for it in pgmpy, so we have a known baseline to test against
3. At that point, I'll represent in RDF (as idiomatic as possible; this becomes simpler after RDF-star is available)
4. Then we can scope how best to use the Subgraph classes to transform into pgmpy
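One possible shape for the RDF step, sketched under heavy assumptions: without RDF-star, each conditional probability entry can be reified as its own node carrying the variable, state, parent condition, and value. The `bn:` predicates, the `"ex:Rain=yes"` condition strings, and the helpers `reify` and `cpd_tables` are all hypothetical, and plain tuples stand in for an RDF graph. The sketch round-trips a small table to show the encoding loses nothing.

```python
from collections import defaultdict

# Made-up CPD rows: (variable, state, parent conditions, probability).
ROWS = [
    ("ex:WetGrass", "yes", ["ex:Rain=yes"], 0.9),
    ("ex:WetGrass", "no", ["ex:Rain=yes"], 0.1),
    ("ex:WetGrass", "yes", ["ex:Rain=no"], 0.1),
    ("ex:WetGrass", "no", ["ex:Rain=no"], 0.9),
]


def reify(rows):
    """Encode each row as a blank-node-like entry with one triple per property."""
    triples = []
    for i, (variable, state, given, prob) in enumerate(rows, start=1):
        entry = f"_:e{i}"
        triples.append((entry, "bn:variable", variable))
        triples.append((entry, "bn:state", state))
        triples.append((entry, "bn:probability", repr(prob)))
        triples.extend((entry, "bn:given", condition) for condition in given)
    return triples


def cpd_tables(triples):
    """Rebuild {variable: {parent_conditions: {state: probability}}} from triples."""
    entries = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        entries[s][p].append(o)
    tables = defaultdict(lambda: defaultdict(dict))
    for props in entries.values():
        variable = props["bn:variable"][0]
        state = props["bn:state"][0]
        condition = frozenset(props.get("bn:given", []))
        tables[variable][condition][state] = float(props["bn:probability"][0])
    return tables


TABLES = cpd_tables(reify(ROWS))
```

Each entry costs four triples here, which is the reification overhead that RDF-star annotations would be expected to reduce.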

If the selected example problem can involve the "progressive example" of recipes used in the tutorial, that would be ideal, although it isn't necessary for us to start building out an integration. The priority is for the initial test case to be simple. We can always construct recipe examples later :)

Does that describe the problem better?

The intention for this is to illustrate how to use a completely different graph technology (Bayesian networks) on graph data, which can complement the other approaches we have with NetworkX, RDFlib, pslpython, PyTorch, etc.

Many thanks,
Paco

Ankush-Chander · 2021-05-16T14:31:50Z

Hey @ceteri,

Took a while to get my head around Bayesian inferencing.

Here's the test example.

P.S.: The original cancer model, although simple, made some very gloomy assumptions, so I had to choose something positive :)
I hope it's simple enough for our purpose.

> 3. At that point, I'll represent in RDF (as idiomatic as possible; this becomes simpler after RDF-star is available)

Any pointers on step 3 will be helpful for me to continue.

Thanks in advance,
Ankush

ceteri · 2021-05-19T00:34:49Z

Wonderful, thank you @Ankush-Chander !

Now I get to wrangle with some RDF representation, hopefully with not too much reification required :)

ceteri added the enhancement (New feature or request) label Dec 22, 2020

ceteri added this to the Release 0.2.x milestone Dec 22, 2020

ceteri added this to To do in kglab Dec 22, 2020

ceteri added the good first issue (Good for newcomers) label Mar 5, 2021

ceteri modified the milestones: Release 0.3.x, Release 0.4.x Mar 7, 2021

ceteri removed this from the Release 0.4.x milestone May 10, 2021

ceteri moved this from To do to In progress in kglab May 26, 2021
