A common frustration in industry, especially when trying to get business insights from tabular data, is that the most interesting questions (from a business perspective) often cannot be answered with observational data alone.
These questions can be similar to:
“What will happen if I halve the price of my product?”
“Which clients will pay their debts only if I call them?”
In statistics, econometrics, epidemiology, genetics and related disciplines, causal graphs are probabilistic graphical models used to encode assumptions about the data-generating process. Causal graphs can be used for communication and for inference.
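To make the idea of encoding assumptions concrete, here is a minimal sketch of a causal graph written as a plain Python mapping from each variable to its assumed direct causes. The variable names (`season`, `price`, `demand`, `revenue`) are hypothetical and are not taken from this repository; they only illustrate how such a graph can be queried.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical causal graph: each variable maps to its assumed direct causes.
causal_parents = {
    "season": set(),
    "price": {"season"},
    "demand": {"season", "price"},
    "revenue": {"price", "demand"},
}


def causal_order(parents):
    """Return the variables ordered so that every cause precedes its effects."""
    return list(TopologicalSorter(parents).static_order())


def ancestors(parents, node):
    """Collect every direct or indirect cause of `node`."""
    found = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


order = causal_order(causal_parents)
print("Causal order:", order)
print("Causes of revenue:", sorted(ancestors(causal_parents, "revenue")))
```

Writing the graph down this way makes the assumptions explicit: anyone reading it can see, for example, that `season` is assumed to influence both `price` and `demand`, which is the kind of structure that matters when estimating the effect of a price change.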
Causal models are mathematical models representing causal relationships within an individual system or population. They facilitate inferences about causal relationships from statistical data. They can teach us a good deal about the epistemology of causation, and about the relationship between causation and probability.
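The gap between observational data and causal questions can be illustrated with a small simulation. The sketch below is not part of this repository: it generates synthetic data in which a hypothetical confounder `z` influences both a treatment `t` and an outcome `y`, with a true treatment effect fixed at 1.0 by construction. Comparing treated and untreated rows directly overstates the effect, while averaging the comparison within each level of `z` should recover something close to the true value.

```python
import random
from statistics import mean


def simulate(n, seed=0):
    """Generate (z, t, y) rows where z confounds the effect of t on y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # Units with z == 1 are much more likely to receive the treatment.
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0
        # True effect of t on y is 1.0; z adds 2.0 on its own.
        y = 1.0 * t + 2.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, t, y))
    return rows


def naive_difference(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted_difference(rows):
    """Average the treated-vs-untreated difference within each level of z."""
    total = 0.0
    for z_value in (0, 1):
        stratum = [(t, y) for z, t, y in rows if z == z_value]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        weight = len(stratum) / len(rows)
        total += weight * (mean(treated) - mean(control))
    return total


rows = simulate(20_000)
naive = naive_difference(rows)
adjusted = adjusted_difference(rows)
print(f"Naive estimate:    {naive:.2f}")
print(f"Adjusted estimate: {adjusted:.2f}")
```

In this setup the naive comparison is biased toward roughly 2.2 in expectation, because treated rows are also more likely to have `z == 1`. The adjustment only works here because the confounder is known and measured, which is exactly the kind of assumption a causal graph is meant to state.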
The first step is to understand our data. This causal inference demo uses a breast cancer dataset, so we need to understand a bit about the data, breast cancer, and the diagnosis process. The first application to breast cancer diagnosis uses characteristics of individual cells obtained from a minimally invasive fine needle aspirate (FNA). This allows an accurate diagnosis and also constructs a surface that predicts when breast cancer is likely to recur.
Clone this repository
git clone https://github.com/Azariagmt/Causality/
Install requirements
pip install -r requirements.txt
Run experiment
experiment.py starts a new MLflow experiment that pulls data from the DVC gdrive remote, logs essential metrics, and draws causality graphs.
cd scripts
python experiment.py