\usepackage[utf8]{inputenc} % LaTeX source encoded as UTF-8

\usepackage{graphicx} % graphics file inclusion
\usepackage{amsmath} % advanced maths
\DeclareMathOperator*{\argmax}{argmax} % thin space, limits underneath in displays
% \usepackage{amssymb} % additional math symbols

\usepackage{dirtree} % directory tree visualisation
@@ -76,7 +77,7 @@ \subsection{Description}
    \centering
    \includegraphics[width=0.7\textwidth]{hmm}
    \caption{Visualization of a Hidden Markov Model. At each time point, there is a hidden state $s_i$ and an observed state $x_i$.}
    \label{fig:hmm}
\end{figure}

The aforementioned functions return probabilities. Each returned value lies in the interval $[0, 1]$, and the values returned across all possible arguments sum to 1.

@@ -114,6 +115,80 @@ \subsubsection{Inference}

\subsubsection{Scoring?}

\subsection{Learning}

\subsubsection{Maximum likelihood estimation}

This method uses maximum likelihood estimation (MLE) to compute the model parameters. MLE is a technique for finding the parameters of a probabilistic model that best describe the behavior of a random variable. It does so by maximizing the likelihood of all the learning data.

Formally, the estimate is $\argmax_{\theta \in \Theta} \prod_{j=1}^{M} L(x_j, \theta)$ (a short code sketch of this objective follows the list below), where:

\begin{itemize}

\item $M$ is the number of data points
\item $x_j$ is the $j$-th data point
\item $L$ is the likelihood function (defined below)
\item $\theta$ describes the model parameters
\item $\Theta$ is the set of all admissible model parameters

\end{itemize}
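
To make the objective concrete, the following short Python sketch (illustrative only, not part of the implementation described in this thesis; the helper names and the finite grid \texttt{candidate\_thetas} are my assumptions) evaluates it in log-space, which is numerically safer than the raw product:

\begin{verbatim}
import math

def log_likelihood(data, likelihood, theta):
    # Maximizing the sum of logarithms is equivalent to maximizing
    # the product of likelihoods, but avoids numerical underflow.
    # Assumes likelihood(x, theta) > 0 for every observed x.
    return sum(math.log(likelihood(x, theta)) for x in data)

def mle(data, likelihood, candidate_thetas):
    # argmax over a finite set of candidate parameter values
    return max(candidate_thetas,
               key=lambda theta: log_likelihood(data, likelihood, theta))
\end{verbatim}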

Let's examine how one can view the parameters of an HMM as probability functions of random variables.

\paragraph{Initial transition function}

The function $I\colon S \to \mathbf{R}$ is in fact a probability function of a random variable taking values in $S$. To represent it, let's use a discrete probabilistic model with parameter $\epsilon$.
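
For instance (the state names and numbers here are purely illustrative), with $S = \{\text{sunny}, \text{rainy}\}$ the whole function is described by a parameter vector $\epsilon = (\epsilon_1, \epsilon_2)$ with $\epsilon_1 + \epsilon_2 = 1$:
\begin{equation*}
I(\text{sunny}) = \epsilon_1 = 0.7, \qquad I(\text{rainy}) = \epsilon_2 = 0.3.
\end{equation*}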

\paragraph{Transition function}

Since this function is $S \times S \to \mathbf{R}$, one can think about it the following way: for each hidden state $s \in S$, there is a probability function $S \to \mathbf{R}$. We can represent such a function in the same way as the initial transition function. Therefore, for each hidden state $s \in S$, there is a discrete probabilistic model with its own parameter $\epsilon_s$.

\paragraph{Emission function}

Since $E\colon S \times X \to \mathbf{R}$, one can think about it the same way: for each hidden state $s \in S$, there is a probability function $X \to \mathbf{R}$. This function can be represented by a continuous probabilistic model. For the purposes of this thesis, it is assumed that continuous random variables have a Gaussian or Gaussian mixture distribution.
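
One possible in-memory representation of the three parameter functions is sketched below in Python; the state names, the observation model, and all numbers are illustrative assumptions, not values from this thesis:

\begin{verbatim}
# Illustrative containers only; states and numbers are made up.
initial = {"sunny": 0.7, "rainy": 0.3}       # I: S -> R

transition = {                               # S x S -> R
    "sunny": {"sunny": 0.8, "rainy": 0.2},   # one discrete model per state
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

emission = {                                 # E: S x X -> R, one Gaussian per state
    "sunny": {"mu": 25.0, "sigma2": 9.0},
    "rainy": {"mu": 12.0, "sigma2": 16.0},
}
\end{verbatim}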

\paragraph{Discrete random variables}

Here I present a straightforward way of defining the likelihood function $L$ in the case of discrete random variables: $L(x, \theta) = x_{cnt} / M$, where $x_{cnt}$ is the number of times $x$ occurs in the learning data and $M$ is the number of data points. Since this likelihood function does not depend on the parameter $\theta$, there is no expression to optimize.

There is a reason I did not choose any standard discrete probability distribution to define $L$. The random variable does not have to take values in $\mathbf{Z}$ or $\mathbf{N}$; in fact, it can be any abstract object, such as an animal, i.e.\ a type for which comparison between two objects does not make sense. Therefore, it would not make sense to assign non-zero probability to values that never appear in the learning data.

As an example, consider the following data: \{dog, bird, dog, cat, bird, dog\}. Figure \ref{fig:discrete_mle_prob} shows the likelihood function $L$ associated with the data.
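
The corresponding computation takes only a few lines of Python (a sketch under the definition above, not code from this thesis):

\begin{verbatim}
from collections import Counter

data = ["dog", "bird", "dog", "cat", "bird", "dog"]
counts = Counter(data)   # Counter({'dog': 3, 'bird': 2, 'cat': 1})
M = len(data)

def L(x):
    # Empirical likelihood: occurrences of x divided by M;
    # a Counter returns 0 for values never seen in the data.
    return counts[x] / M

print(L("dog"), L("cat"), L("fish"))   # 0.5 0.1666... 0.0
\end{verbatim}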

% TODO: consider two figures side by side
\begin{figure}
    \centering
    \includegraphics[width=0.7\textwidth]{discrete_mle_hist}
    \caption{Frequency of each value in the discrete data set.}
    \label{fig:discrete_mle_hist}
\end{figure}

\begin{figure}
    \centering
    \includegraphics[width=0.7\textwidth]{discrete_mle_prob}
    \caption{Likelihood function $L$ of the discrete data set. The likelihood is zero at values not present in the data.}
    \label{fig:discrete_mle_prob}
\end{figure}

\paragraph{Continuous random variables}

Let's consider the Gaussian (normal) distribution, defined by $\theta = \{\mu, \sigma^2\}$. The goal is to find the parameter $\theta \in \Theta$ that maximizes the product $\prod_{j=1}^{M} L(x_j, \theta)$. In the case of the Gaussian distribution, the likelihood function $L$ is
\begin{equation*}
L(x, \theta) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2 \sigma^2}}.
\end{equation*}

Because the logarithm is monotone, maximizing the product is equivalent to maximizing the log-likelihood $\sum_{j=1}^{M} \log L(x_j, \theta)$. Setting its derivative with respect to $\mu$ to zero yields
\begin{equation*}
\sum_{j=1}^{M} x_j - M \mu = 0,
\end{equation*}
so the maximum likelihood estimate of $\mu$ is the sample mean $\overline{X}_M = \frac{1}{M} \sum_{j=1}^{M} x_j$.

Setting the derivative with respect to $\sigma^2$ to zero yields the maximum likelihood estimate of $\sigma^2$:
\begin{equation*}
\widehat{\sigma^2} = \frac{1}{M} \sum_{j=1}^{M} (x_j - \overline{X}_M)^2.
\end{equation*}
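
Both estimates have closed forms, so no numerical optimization is needed. A minimal Python sketch (illustrative; the helper name \texttt{gaussian\_mle} is hypothetical):

\begin{verbatim}
def gaussian_mle(xs):
    # Closed-form MLE for a Gaussian: the sample mean and the
    # (biased) 1/M variance estimate derived above.
    M = len(xs)
    mu = sum(xs) / M
    sigma2 = sum((x - mu) ** 2 for x in xs) / M
    return mu, sigma2
\end{verbatim}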

Figure \ref{fig:normal_mle} shows an example of MLE on a normally distributed random variable.

\begin{figure}
    \centering
    \includegraphics[width=0.7\textwidth]{normal_mle}
    \caption{Histogram of randomly generated data from the normal distribution $\mathcal{N}(40, 32^2)$. The green curve is the density of the normal distribution with the maximum likelihood estimates of the $\theta$ parameters.}
    \label{fig:normal_mle}
\end{figure}

% TODO: Gaussian mixtures

\chapter{Design}

\chapter{Implementation}