Merge pull request #36 from chrisrijsdijk/Arduino3Dsimple

Arduino3 dsimple
chrisrijsdijk · Dec 8, 2023 · 22b31f5
2 parents 6e80d69 + 3007c77

Showing 6 changed files with 171 additions and 99 deletions.
diff --git a/notebook/Arduino_3Vars/Arduino_diagnostics_ensemble3Vars.ipynb b/notebook/Arduino_3Vars/Arduino_diagnostics_ensemble3Vars.ipynb
@@ -1083,7 +1083,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# connect with Arduino\n",
+    "# connect to Arduino\n",
     "\n",
     "ser = serial.Serial('COM3', 9600, timeout=1) #check whether the Arduino is really connected with COM3 or adjust the code to the correct COM\n",
     "                                             #check at settings and check whether Arduino is a connected device\n",
@@ -1181,7 +1181,7 @@
     "        except:\n",
     "            print('diagnostics doubtful, distance with nearest labelled system state exceeds ',afstand,' Volts')\n",
     "\n",
-    "    eerst=False\n",
+    "    eerst = False\n",
     "    \n",
     "    if keyboard.is_pressed('q'):              # if key 'q' is pressed \n",
     "        print('')\n",

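The message printed in the hunk above implies a threshold `afstand` on the distance between a measurement and the nearest labelled system state. A minimal standalone sketch of such a check follows, assuming Euclidean distance over the pair ($V_{0}$, $V_{1}$). The labelled states, the function name `diagnose` and the default threshold are hypothetical and are not taken from the notebook; only the healthy closed-switch voltages (3,4 V and 1,4 V) come from the site text changed in this commit.

```python
import math

# Hypothetical labelled states: (V0, V1) in volts -> label.
# The healthy closed-switch values follow the site text; the rest are placeholders.
LABELLED_STATES = {
    (0.0, 0.0): "switch open",
    (3.4, 1.4): "healthy, switch closed",
    (2.5, 2.5): "faulty, switch closed (illustrative)",
}


def diagnose(v0, v1, afstand=0.5):
    """Return (label, distance, doubtful) for the nearest labelled state.

    doubtful is True when the nearest state is further away than afstand volts.
    """
    nearest = min(LABELLED_STATES, key=lambda state: math.dist(state, (v0, v1)))
    distance = math.dist(nearest, (v0, v1))
    return LABELLED_STATES[nearest], distance, distance > afstand


label, distance, doubtful = diagnose(3.3, 1.5)
print(label, round(distance, 3), doubtful)
```

Returning the `doubtful` flag rather than relying on a bare `except:` keeps the "diagnostics doubtful" case separate from genuine errors such as a failed serial read.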
diff --git a/notebook/Arduino_3Vars/Arduino_diagnostics_ensemble3VarsVal.ipynb b/notebook/Arduino_3Vars/Arduino_diagnostics_ensemble3VarsVal.ipynb
diff --git a/sitetext/CausalInference.ipynb b/sitetext/CausalInference.ipynb
@@ -2,14 +2,20 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "161a1d7c-9f7e-41a3-b388-2cb4c177bc82",
+   "id": "db91549d-268f-4ddd-9c99-f42dd7978dbd",
    "metadata": {},
    "source": [
     "# Causal inference applied to an electric circuit\n",
     "This will be a case of causal inference applied to a simple electric circuit. The primary function of this circuit will be defined by a specific causal relationship, i.e. the position of the switch should *cause* the light to be on or off. Failure modes will *cause* a specific perturbation in the primary function. Statistical associations are generally insufficient to identify causalities. A Directed Acyclic Graph (DAG) will be introduced as a representation of the expert knowledge that should be presumed to identify a causality by a statistical association. This case will *apply* machine learning to detect faults using causal inference. The practical challenges rather than the theoretical foundations will be important here.\n",
     "\n",
-    "This case will firstly introduce the circuit. It will proceed by inferring a causality between the switch and the light. Then, it will introduce a single failure mode that *causes* a perturbation in the primary function. Finally, the case will be generalised to multiple failure modes. The script will revisit the case using extended, realistic time series using a random forest algorithm and k-means clustering.\n",
-    "\n",
+    "This case will first introduce the circuit. It will then infer a causality between the switch and the light. Next, it will introduce a single failure mode that *causes* a perturbation in the primary function. Finally, the case will be generalised to multiple failure modes. The script will revisit the case with extended, realistic time series, using a random forest algorithm and k-means clustering."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a79e7323-39ad-4857-85d3-c67df4b448d2",
+   "metadata": {},
+   "source": [
     "\n",
     "## Introduction to the electric circuit\n",
     "\n",
@@ -55,15 +61,40 @@
     "        </tr>\n",
     "    </tbody>    \n",
     "</table>\n",
-    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ddab433f-95b4-4b9d-85a0-b326c74c2de8",
+   "metadata": {},
+   "source": [
     "The electric circuit in Figure 1 consists of a light, two resistors, a ground connection and a switch. From this circuit, the following measurements have been recorded: \n",
-    "- the position of the switch (S<sub>1</sub>), \n",
-    "- the voltage V<sub>0</sub> over one resistor and the light,\n",
-    "- the voltage V<sub>1</sub> over one resistor.\n",
+    "- the position of the switch ($S_{1}$), \n",
+    "- the voltage $V_{0}$ over one resistor and the light,\n",
+    "- the voltage $V_{1}$ over one resistor.\n",
     "\n",
+    "When the circuit is healthy, the voltages are $V_{0}=0V$ and $V_{1}=0V$ with the switch open ($S_{1}=0$), and $V_{0}=3,4V$ and $V_{1}=1,4V$ with the switch closed ($S_{1}=1$). When the circuit is faulty, the voltages may take different values. Ideally, the health state of the circuit would be fully identified by the measurements of $S_{1}$, $V_{0}$, and $V_{1}$. Figure 2 shows that it is *not*: with an open switch ($S_{1}=0$, $V_{0}=0$, $V_{1}=0$), the circuit may be healthy, but it may also be faulty. Assigning a health label therefore requires additional knowledge, and in general the assignment of health labels is challenging."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dc4c459d-e9d6-4368-9ec4-e972a40c510c",
+   "metadata": {},
+   "source": [
     "Let us presume that the time series at the right hand side in Figure 1 have been collected by non-experimental research, i.e. the time series just represent part of the circuit's course of operations. \n",
     "\n",
-    "![image.png](figures/CausalInference02.png)\n",
+    "![image.png](figures/CausalInference02a.png) ![image.png](figures/CausalInference02b.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ea312b2b-5e2b-4eef-b3ac-7b491798a179",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "\n",
     "\n",
     "Figure 2 shows that these time series cluster. These clusters may indicate the state of the circuit. The state of the circuit will be defined by:\n",
     "- the position of the switch $S_{1}$;\n",
@@ -77,9 +108,33 @@
     "\n",
     "A causality is *not* a statistical association. For example, claiming the statistical independence $Pr(V_{0}|S_{1})=Pr(V_{0})$ is equivalent to claiming the statistical independence $Pr(S_{1}|V_{0})=Pr(S_{1})$. However, claiming that the voltage $V_{0}$ does not cause the switch position $S_{1}$ does not exclude the possibility that the switch position $S_{1}$ causes the voltage $V_{0}$. This example illustrates that a causality is *generally not* equal to a statistical dependency. Still, a causality is *sometimes* equal to a statistical dependency. In the specific case that the causal interactions can be represented in a DAG, Judea Pearl showed how to identify a causality from a statistical dependency. This is an important result because a statistical dependency can be estimated from observed frequencies.\n",
     "\n",
-    "A final technical note, only statistical expectations $E[.]$ rather than full probability distributions $Pr(.)$ will be estimated in the case below.\n",
+    "A final technical note: only statistical expectations $E[.]$ rather than full probability distributions $Pr(.)$ will be estimated in the case below."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2bf33dc6-43fe-4cd0-909d-dc2be03a14be",
+   "metadata": {},
+   "source": [
+    "Figure 2 shows that these time series cluster. These clusters may indicate the state of the circuit. The state of the circuit will be defined by:\n",
+    "- the position of the switch $S_{1}$;\n",
+    "- the presence or absence of four failure modes $F_{1},F_{2},F_{3},F_{4}$.\n",
+    "\n",
+    "These elements $S_{1},F_{1},F_{2},F_{3},F_{4}$ are thought to be *causes* of the voltages. So, a decision maker who *cannot* directly observe the failure modes $F_{1},F_{2},F_{3},F_{4}$ may still learn about them by observing their (voltage) effect. Therefore, the clusters in Figure 2 may indicate the hidden state of the circuit.\n",
     "\n",
+    "In the absence of any failure modes, the switch $S_{1}$ should *cause* a very specific voltage $V_{0}$. This causality will be represented:\n",
+    "- by the Directed Acyclic Graph (DAG) $S_{1} \\rightarrow V_{0}$, or \n",
+    "- by the difference in potential outcomes (Rubin) $V_{0}(S_{1}=1)- V_{0}(S_{1}=0)$. \n",
     "\n",
+    "A causality is *not* a statistical association. For example, claiming the statistical independence $Pr(V_{0}|S_{1})=Pr(V_{0})$ is equivalent to claiming the statistical independence $Pr(S_{1}|V_{0})=Pr(S_{1})$. However, claiming that the voltage $V_{0}$ does not cause the switch position $S_{1}$ does not exclude the possibility that the switch position $S_{1}$ causes the voltage $V_{0}$. This example illustrates that a causality is *generally not* equal to a statistical dependency. Still, a causality is *sometimes* equal to a statistical dependency. In the specific case that the causal interactions can be represented in a DAG, Judea Pearl showed how to identify a causality from a statistical dependency. This is an important result because a statistical dependency can be estimated from observed frequencies.\n",
+    "\n",
+    "A final technical note: only statistical expectations $E[.]$ rather than full probability distributions $Pr(.)$ will be estimated in the case below."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "19b2aa37-44c2-4ce9-b1e7-ede8f62c0f66",
+   "metadata": {},
+   "source": [
     "## Example of a causal inference\n",
     "\n",
     "This Section will infer a causality from the time series in Figure 1 together with the presumed DAG $V_{0} \\leftarrow S_{1} \\rightarrow V_{1}$. This DAG implies that the position of the switch $S_{1}$ *causes* the voltages. The direct assessment of a causality is problematic as the counterfactual reality remains unobserved. For example, the time series show that the switch $S_{1}$ closed at a time $t=5$, but they do not show what would have happened if the switch were *not* closed at the time $t=5$. Therefore, the effect of that specific closure of the switch is *not* directly observable. So, the time series just list the values that *occurred* but not the values that *could have occurred* had the switch been in another position. The challenge now is to infer the causal effect of an intervention on the switch. \n",
@@ -100,9 +155,14 @@
     "\n",
     "$ E[V_{0}|S_{1}=1]-E[V_{0}|S_{1}=0] = 3,2V \\;\\;\\;\\;\\;\\;\\;-^{DAG}\\rightarrow\\;\\;\\;\\;\\;\\;\\;  ATE= E[V_{0}(S_{1}=1)]-E[V_{0}(S_{1}=0)] = 3,2V $\n",
     "\n",
-    "The ATE of the switch $S_{1}$ on the voltage $V_{0}$ indicates a *causal* response to the switch. This move from a difference in expectations to an ATE is *not* trivial. For an in depth explanation on how to identify an ATE from statistical associations, a textbook on causal inference should be consulted. Here, it is just stated as a fact that the additional knowledge represented by the DAG $V_{0} \\leftarrow S_{1} \\rightarrow V_{1}$ is needed to identify the ATE of the switch $S_{1}$ on the voltage $V_{0}$. This knowledge may seem evident to those who took classes in electronics and one may design various experiments to substantiate this DAG, but the DAG does *not* follow from the observed time series in Figure 1. For example, the DAG $V_{1} \\leftarrow V_{0}\\rightarrow S_{1}$ may well produce the same time series but an *intervention* on the switch $S_{1}$ would not change the voltages then. Therefore, the time series by themselves do not suffice to identify the ATE of the switch. Presumptions represented in the DAG are essential.\n",
-    "\n",
-    "\n",
+    "The ATE of the switch $S_{1}$ on the voltage $V_{0}$ indicates a *causal* response to the switch. This move from a difference in expectations to an ATE is *not* trivial. For an in-depth explanation of how to identify an ATE from statistical associations, a textbook on causal inference should be consulted. Here, it is just stated as a fact that the additional knowledge represented by the DAG $V_{0} \\leftarrow S_{1} \\rightarrow V_{1}$ is needed to identify the ATE of the switch $S_{1}$ on the voltage $V_{0}$. This knowledge may seem evident to those who took classes in electronics, and one may design various experiments to substantiate this DAG, but the DAG does *not* follow from the observed time series in Figure 1. For example, the DAG $V_{1} \\leftarrow V_{0}\\rightarrow S_{1}$ may well produce the same time series, but an *intervention* on the switch $S_{1}$ would then not change the voltages. Therefore, the time series by themselves do not suffice to identify the ATE of the switch. Presumptions represented in the DAG are essential.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df04508b-1e91-4358-a933-397864b17252",
+   "metadata": {},
+   "source": [
     "## Example of causal inference extended with a single failure mode\n",
     "\n",
     "A more careful analysis of the time series in Figure 1 reveals that the voltages at a closed switch $S_{1}=1$ cluster differently. In particular, the voltages during the time interval $t=[41:49]$ differ from the other ones. A *causal explanation* is desirable for this anomalous behaviour. As constant causes cannot reveal their effects, this anomalous behaviour of the voltages cannot be explained by the switch position, there must be an unobserved background variable $B$ at work. This Section will extend the example with the background variable $B$ presuming a DAG that implies that both $S_{1}$ and $B$ are confounders of the voltages. So, the extended DAG is:\n",
@@ -177,8 +237,14 @@
     "$ E[V_{0}|S_{1}=1,B=0]-E[V_{0}|S_{1}=0,B=0] = 3,44V   \\;\\;\\;\\;\\;\\;\\;-^{DAG}\\rightarrow\\;\\;\\;\\;\\;\\;\\;   ATE_{@B=0} = E[V_{0}(S_{1}=1)|B=0]-E[V_{0}(S_{1}=0)|B=0]=3,44V  $\n",
     "$ E[V_{0}|S_{1}=1,B=1]-E[V_{0}|S_{1}=0,B=1] = 2,50V   \\;\\;\\;\\;\\;\\;\\;-^{DAG}\\rightarrow\\;\\;\\;\\;\\;\\;\\;   ATE_{@B=1} = E[V_{0}(S_{1}=1)|B=1]-E[V_{0}(S_{1}=0)|B=1]=2,50V  $\n",
     "\n",
-    "Up until now, the time series $B$ remained an unobserved hidden variable. But an engineer may identify the anomalous behaviour at $t=[41:49]$ as a short circuit of the light. Eventually, the occurrence of this hidden failure mode has been recorded which enables verification. Generally, any causal inference from observations is vulnerable for unobserved background variables as observations are incomplete. The presumed DAG may be highly controversial in some cases, but at least it is a precise description of the expert knowledge required to causaly explain this statistical association.\n",
-    "\n",
+    "Until now, the time series $B$ remained an unobserved hidden variable. But an engineer may identify the anomalous behaviour at $t=[41:49]$ as a short circuit of the light. Eventually, the occurrence of this hidden failure mode was recorded, which enables verification. Generally, any causal inference from observations is vulnerable to unobserved background variables, as observations are incomplete. The presumed DAG may be highly controversial in some cases, but at least it is a precise description of the expert knowledge required to causally explain this statistical association.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df581262-db18-41ab-9f6a-40d7b0ad8692",
+   "metadata": {},
+   "source": [
     "\n",
     "## Example of causal inference extended with multiple failure modes\n",
     "\n",
@@ -217,8 +283,14 @@
     "    <tr>\n",
     "        <td> $2,5$     <td> $2,5$     <td> $1$     <td> $0$           <td> $0$           <td> $1$           <td> $0$\n",
     "    </tr>\n",
-    "</table>\n",
-    "\n",
+    "</table>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0e3baaa9-8156-47dc-bbbb-c0feab8dcf36",
+   "metadata": {},
+   "source": [
     "Evidently, the Table above should comprise 32 rows to be complete implying that the time series in the example do not cover all states of the circuit. In practice, time series collected from a system's normal course of operations are similarly incomplete. This precludes a data driven estimation of an ATE. Still, To assess a specific conditional ATE from the first two columns of the Table, a DAG should be presumed. In particular, to assess the conditional ATE of the switch $S_{1}$ on a voltage:\n",
     "\n",
     "$E[V_{x}(S_{1}=1)|F_{1},F_{2},F_{3},F_{4}] - E[V_{x}(S_{1}=0)|F_{1},F_{2},F_{3},F_{4}]$\n",
@@ -232,7 +304,7 @@
     "The assessment of the fault state $F_{1}F_{2}F_{3}F_{4}$ in the Table above remained undiscussed up until now. If each fault state $F_{1}F_{2}F_{3}F_{4}$ were to biject with the voltages and the switch position $V_{0}V_{1}S_{1}$, *autonomous online* fault detection would have been straightforward. The Table above already showed that an open switch $S_{1}=0$ maps to several fault states. Therefore, the triplet $V_{0}V_{1}S_{1}$ is insufficient to identify the fault state. Still, knowledge of the triplet $V_{0}V_{1}S_{1}$ constrains the possible fault states $F_{1}F_{2}F_{3}F_{4}$ and contribute to *expert based online* fault detection. If the circuit were to be taken *offline*, it is often possible to control the triplet $V_{0}V_{1}S_{1}$ which often enables a further reduction of the possible fault states $F_{1}F_{2}F_{3}F_{4}$.\n",
     "\n",
     "\n",
-    "# [Click here to see the script](https://nbviewer.jupyter.org/github/chrisrijsdijk/RAMS/blob/master/notebook/Arduino_3Vars/Arduino_diagnostics_ensemble3VarsVal.ipynb) \n"
+    "# [Click here to see the script](https://nbviewer.jupyter.org/github/chrisrijsdijk/RAMS/blob/master/notebook/Arduino_3Vars/Arduino_diagnostics_ensemble3VarsVal.ipynb) "
    ]
   },
   {

diff --git a/sitetext/figures/.ipynb_checkpoints/CausalInference02b-checkpoint.png b/sitetext/figures/.ipynb_checkpoints/CausalInference02b-checkpoint.png
diff --git a/sitetext/figures/CausalInference02a.png b/sitetext/figures/CausalInference02a.png
diff --git a/sitetext/figures/CausalInference02b.png b/sitetext/figures/CausalInference02b.png
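
The site text in this diff reads each conditional ATE off a difference in conditional expectations, $E[V_{0}|S_{1}=1,B=b]-E[V_{0}|S_{1}=0,B=b]$, given the presumed DAG. A minimal sketch of that arithmetic is given below. The records are invented for illustration and are not the recorded time series, and `conditional_difference` is a hypothetical helper that does not appear in the repository.

```python
from statistics import mean

# Toy records (S1, B, V0 in volts); invented for illustration only.
records = [
    (0, 0, 0.0), (0, 0, 0.0),
    (1, 0, 3.4), (1, 0, 3.5),
    (0, 1, 0.0),
    (1, 1, 2.5), (1, 1, 2.5),
]


def conditional_difference(records, b):
    """Estimate E[V0 | S1=1, B=b] - E[V0 | S1=0, B=b] by sample means."""
    closed = [v0 for s1, bb, v0 in records if s1 == 1 and bb == b]
    opened = [v0 for s1, bb, v0 in records if s1 == 0 and bb == b]
    if not closed or not opened:
        # A stratum without both switch positions cannot be estimated from data.
        raise ValueError("stratum lacks observations for one switch position")
    return mean(closed) - mean(opened)


for b in (0, 1):
    print(f"B={b}: {conditional_difference(records, b):.2f} V")
```

The number computed here is only a statistical association. It may be read as a conditional ATE solely under the presumed DAG, and the `ValueError` branch reflects the point made in the text that observational time series rarely cover every state of the circuit.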