
Commit 97ba0f9

Add learning
1 parent 67420b1 commit 97ba0f9

8 files changed, +113 -2 lines

Pavel Lučivňák BP.tex

Lines changed: 77 additions & 2 deletions
@@ -11,7 +11,8 @@
 \usepackage[utf8]{inputenc} % LaTeX source encoded as UTF-8
 
 \usepackage{graphicx} %graphics files inclusion
-% \usepackage{amsmath} %advanced maths
+\usepackage{amsmath} %advanced maths
+\DeclareMathOperator*{\argmax}{argmax} % thin space, limits underneath in displays
 % \usepackage{amssymb} %additional math symbols
 
 \usepackage{dirtree} %directory tree visualisation
@@ -76,7 +77,7 @@ \subsection{Description}
 \centering
 \includegraphics[width=0.7\textwidth]{hmm}
 \caption{Visualization of a Hidden Markov Model. At each time point, there is a hidden state $s_i$ and an observed state $x_i$.}
-\label{fig:float}
+\label{fig:hmm}
 \end{figure}
 
 The aforementioned functions return a probability. This means the resulting number lies in the interval $[0, 1]$, and the returned values summed over all function parameters add up to 1.
@@ -114,6 +115,80 @@ \subsubsection{Inference}
 
 \subsubsection{Scoring?}
 
+\subsection{Learning}
+
+\subsubsection{Maximum likelihood estimation}
+
+This method computes the model parameters by maximum likelihood estimation (MLE). MLE is a technique for finding the parameters of a probabilistic model that best describe the behavior of a random variable. The method aims to maximize the likelihood of all the learning data.
+
+Formally, $\argmax_{\theta \in \Theta} \prod_{j=1}^{M} L(x_j, \theta)$, where:
+
+\begin{itemize}
+\item $M$ is the number of data points
+\item $x_j$ is the $j$-th data point
+\item $L$ is the likelihood function (defined below)
+\item $\theta$ denotes the model parameters
+\item $\Theta$ is the set of all possible model parameters
+\end{itemize}
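+
+Since the logarithm is strictly increasing, maximizing this product is equivalent to maximizing the sum of log-likelihoods, which is the form the derivations below work with:
+\begin{equation*}
+\argmax_{\theta \in \Theta} \prod_{j=1}^{M} L(x_j, \theta) = \argmax_{\theta \in \Theta} \sum_{j=1}^{M} \log L(x_j, \theta)
+\end{equation*}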
+
+Let's examine how one can view the parameters of an HMM as probabilistic functions of random variables.
+
+\paragraph{Initial transition function}
+
+The function $I\colon S \mapsto \textbf R$ is in fact a probability function corresponding to the random variable $S$. To represent it, let's use a discrete probabilistic model with parameter $\epsilon$; one can view $\epsilon$ as a vector of probabilities indexed by hidden states, with $I(s) = \epsilon_s$ and $\sum_{s \in S} \epsilon_s = 1$.
+
+\paragraph{Transition function}
+
+Since the function is $S \times S \mapsto \textbf R$, one can think about it in the following way: for each hidden state $s \in S$, there is a probability function $S \mapsto \textbf R$. We can represent such a function in the same way as the initial transition function. Therefore, for each hidden state $s \in S$, there is a discrete probabilistic model with parameter $\epsilon$.
+
+\paragraph{Emission function}
+
+Since $E\colon S \times X \mapsto \textbf R$, one can think about it in the same way: for each hidden state $s \in S$, there is a probability function $X \mapsto \textbf R$. This function can be represented by a continuous probabilistic model. For the purposes of this thesis, it is assumed that continuous random variables have a Gaussian or Gaussian mixture distribution.
+
+\paragraph{Discrete random variables}
+
+Here I present a straightforward way of defining the likelihood function $L$ in the case of discrete random variables: $L(x, \theta) = x_{cnt} / M$, where $x_{cnt}$ is the number of times $x$ occurs in the learning data and $M$ is the number of data points. Since this likelihood function does not depend on the parameter $\theta$, there is no expression to optimize.
+
+There is a reason I did not choose a standard discrete probability distribution to define $L$. The values of the random variable do not have to lie in $\textbf Z$ or $\textbf N$; in fact, they can be arbitrary abstract objects, such as animals, i.e., a type for which comparing two objects makes no sense. It would therefore make no sense to assign non-zero probability to values that never appear in the learning phase.
+
+As an example, consider the following data: \{dog, bird, dog, cat, bird, dog\}. Figure \ref{fig:discrete_mle_prob} shows the likelihood function $L$ associated with the data.
+
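+For this data set $M = 6$, so $L(\text{dog}) = 3/6 = 0.5$, $L(\text{bird}) = 2/6 \approx 0.33$, $L(\text{cat}) = 1/6 \approx 0.17$, and $L$ is zero for every other value.
+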
+% TODO: consider two figures side by side
+\begin{figure}
+\centering
+\includegraphics[width=0.7\textwidth]{discrete_mle_hist}
+\caption{Frequencies of the values in the discrete data set.}
+\label{fig:discrete_mle_hist}
+\end{figure}
+
+\begin{figure}
+\centering
+\includegraphics[width=0.7\textwidth]{discrete_mle_prob}
+\caption{Likelihood function $L$ of the discrete data set. Values that do not occur in the data have zero probability.}
+\label{fig:discrete_mle_prob}
+\end{figure}
+
+\paragraph{Continuous random variables}
+
+Let's consider the Gaussian (normal) distribution, parameterized by $\theta = \{\mu, \sigma^2\}$. The goal is to find the parameter $\theta \in \Theta$ that maximizes the product $\prod_{j=1}^{M} L(x_j, \theta)$. For the Gaussian distribution, the likelihood function $L$ is
+\begin{equation*}
+L(x, \theta) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2 \sigma^2}}.
+\end{equation*}
+
+Setting the derivative of the log-likelihood with respect to $\mu$ to zero yields $\sum_{j=1}^{M} x_j - M \mu = 0$. The maximum likelihood estimate of $\mu$ is therefore the sample mean $\overline X_M = \frac{1}{M} \sum_{j=1}^{M} x_j$.
+
+Setting the derivative with respect to $\sigma^2$ to zero shows that the MLE of $\sigma^2$ equals $\frac{1}{M} \sum_{j=1}^{M} (x_j - \overline X_M)^2$.
+
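+As a sketch of the derivation, the log-likelihood of the whole data set is
+\begin{equation*}
+\ell(\mu, \sigma^2) = \sum_{j=1}^{M} \log L(x_j, \theta) = -\frac{M}{2} \log (2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{j=1}^{M} (x_j - \mu)^2,
+\end{equation*}
+so $\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{j=1}^{M} (x_j - \mu)$, which vanishes exactly when $\sum_{j=1}^{M} x_j - M \mu = 0$.
+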
+Figure \ref{fig:normal_mle} shows an example of MLE on a normally distributed random variable.
+
+\begin{figure}
+\centering
+\includegraphics[width=0.7\textwidth]{normal_mle}
+\caption{Histogram of randomly generated data from the normal distribution $\mathcal{N}(40, 32^2)$. The green curve is the density of the normal distribution with the maximum likelihood estimates of the parameters $\theta$.}
+\label{fig:normal_mle}
+\end{figure}
+
+% TODO: Gaussian mixtures
+
 \chapter{Návrh}
 
 \chapter{Realizace}

discrete_mle.csv

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+Animal Count Probability
+Dog 3 0.5
+Bird 2 0.333333
+Cat 1 0.166667

discrete_mle.gp

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+set term pdf
+set boxwidth 0.3
+set style fill solid
+set key autotitle columnhead
+unset key
+
+set output 'discrete_mle_hist.pdf'
+set ylabel 'Count'
+set yrange [0:1]
+set autoscale ymax
+plot 'discrete_mle.csv' using ($2):xtic(1) with boxes
+
+
+set output 'discrete_mle_prob.pdf'
+set ylabel 'Probability'
+set yrange [0:1]
+plot 'discrete_mle.csv' using ($3):xtic(1) with boxes
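
Usage note: assuming gnuplot is installed and run from the directory containing discrete_mle.csv, both figures can be regenerated with

gnuplot discrete_mle.gp

which writes discrete_mle_hist.pdf and discrete_mle_prob.pdf, the two plots referenced in the thesis text above.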

discrete_mle_hist.pdf

6.9 KB
Binary file not shown.

discrete_mle_prob.pdf

7.53 KB
Binary file not shown.

discrete_mle_update_csv.bash

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+#!/bin/bash
+# This script updates the Probability column from the Count column.
+DATA_FILE=discrete_mle.csv
+TEMP_FILE=tmp.csv
+N=$(tail -n +2 "$DATA_FILE" | awk '{sum += $2} END {print sum}')
+head -n 1 "$DATA_FILE" > "$TEMP_FILE"
+tail -n +2 "$DATA_FILE" | awk -v n="$N" '{print $1, $2, $2/n}' >> "$TEMP_FILE"
+mv "$TEMP_FILE" "$DATA_FILE"
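
Usage note: after editing the Count column, the Probability column can be recomputed in place (assuming a shell with awk available) with

bash discrete_mle_update_csv.bash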

normal_mle.R

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+library(MASS) # provides fitdistr
+pdf("normal_mle.pdf")
+count <- 100
+data <- rnorm(count, mean = 40, sd = 32) # sample from N(40, 32^2)
+est <- fitdistr(data, densfun = "normal")$estimate # ML estimates: mean, sd
+hist(data, freq = F, breaks = 10) # density-scaled histogram
+curve(dnorm(x, est[1], est[2]), col = "green", lwd = 2, add = T) # fitted density
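
Usage note: assuming R with the bundled MASS package is installed, the figure can be regenerated with

Rscript normal_mle.R

Here fitdistr returns the maximum likelihood estimates, so est[1] is the fitted mean and est[2] the fitted standard deviation used by the overlaid curve.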

normal_mle.pdf

7.71 KB
Binary file not shown.
