Add papers/write-math-paper

parent 7740f0147f
commit fe78311901

25 changed files with 10624 additions and 0 deletions
documents/papers/write-math-paper/ch4-algorithms.tex (new file, 113 additions)
@@ -0,0 +1,113 @@
%!TEX root = write-math-ba-paper.tex

\section{Algorithms}
\subsection{Preprocessing}\label{sec:preprocessing}
Preprocessing in symbol recognition is done to improve the quality and
expressive power of the data. It makes follow-up tasks like feature extraction
and classification easier, more effective or faster. It does so by resolving
errors in the input data, reducing duplicate information and removing
irrelevant information.

Preprocessing algorithms fall into two groups: normalization and noise
reduction algorithms.

A very important normalization algorithm in single-symbol recognition is
\textit{scale-and-shift}~\cite{Thoma:2014}. It scales the recording so that
its bounding box fits into a unit square. As the aspect ratio of a recording is
almost never 1:1, only one dimension will fit exactly in the unit square. For
this paper, it was chosen to shift the recording in the direction of its bigger
dimension into the $[0,1] \times [0,1]$ unit square. After that, the recording
is shifted in the direction of its smaller dimension such that its bounding box
is centered around zero.

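The following sketch illustrates the idea. The representation of a recording
as a list of strokes, each a list of $(x, y)$ points, is an assumption made
for this illustration and not necessarily the data structure used in this
work:

\begin{verbatim}
def scale_and_shift(strokes):
    # Sketch only; a recording is assumed to be a list of strokes,
    # each a list of (x, y) points.
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    factor = 1.0 / max(width, height, 1e-9)  # avoid division by zero

    def transform(x, y):
        # Shift the bigger dimension into [0, 1]; center the smaller
        # dimension (around zero, following the description above).
        nx, ny = (x - min(xs)) * factor, (y - min(ys)) * factor
        if width >= height:
            ny -= height * factor / 2.0
        else:
            nx -= width * factor / 2.0
        return nx, ny

    return [[transform(x, y) for x, y in stroke] for stroke in strokes]
\end{verbatim}
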
Another normalization preprocessing algorithm is
resampling~\cite{Guyon91,Manke01}. As the data points on the pen trajectory are
generated asynchronously and with different time resolutions depending on the
hardware and software used, it is desirable to resample the recordings to have
points spread equally in time for every recording. This was done by linear
interpolation of the $(x,t)$ and $(y,t)$ sequences and getting a fixed number
of equally spaced points per stroke.

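As an illustration, resampling a single stroke to a fixed number of points
can be sketched with linear interpolation as follows; the separate $x$, $y$
and $t$ sequences and the default of 20~points are assumptions of this
example:

\begin{verbatim}
import numpy as np

def resample_stroke(xs, ys, ts, num_points=20):
    # Sketch: linearly interpolate the (x, t) and (y, t) sequences of
    # one stroke and return num_points points spread equally in time.
    t_new = np.linspace(ts[0], ts[-1], num_points)
    x_new = np.interp(t_new, ts, xs)
    y_new = np.interp(t_new, ts, ys)
    return x_new, y_new, t_new
\end{verbatim}
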
\textit{Stroke connection} is a noise reduction algorithm which is mentioned
in~\cite{Tappert90}. Sometimes the hardware detects that the user lifted the
pen although the user certainly did not do so. This can be detected by
measuring the Euclidean distance between the end of one stroke and the
beginning of the next stroke. If this distance is below a threshold, the
strokes are connected.

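A sketch of this check is given below; the threshold value and the stroke
format are assumptions of this example, not values prescribed
by~\cite{Tappert90}:

\begin{verbatim}
import math

def connect_strokes(strokes, threshold=0.05):
    # Sketch: join consecutive strokes whose gap is below a threshold.
    # Strokes are assumed to be lists of (x, y) points.
    connected = [strokes[0]]
    for stroke in strokes[1:]:
        last_x, last_y = connected[-1][-1]
        first_x, first_y = stroke[0]
        if math.hypot(first_x - last_x, first_y - last_y) < threshold:
            connected[-1] = connected[-1] + stroke  # merge strokes
        else:
            connected.append(stroke)
    return connected
\end{verbatim}
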
Due to the limited resolution of the recording device and due to erratic
handwriting, the pen trajectory might not be smooth. One way to smooth it is to
replace each point by a weighted average of its own coordinates and its
neighbors' coordinates. Another way to do smoothing is to reduce the number of
points with the Douglas-Peucker algorithm to the points that are more relevant
for the overall shape of a stroke and then interpolate the stroke between those
points. The Douglas-Peucker stroke simplification algorithm is usually used in
cartography to simplify the shape of roads. It works recursively to find a
subset of points of a stroke that is simpler and still similar to the original
shape. The algorithm adds the first and the last point $p_1$ and $p_n$ of a
stroke to the simplified set of points $S$. Then it searches the point $p_i$ in
between that has maximum distance from the line $p_1 p_n$. If this distance is
above a threshold $\varepsilon$, the point $p_i$ is added to $S$. The algorithm
is then applied recursively to $p_1 p_i$ and $p_i p_n$. It is described as
\enquote{Algorithm 1} in~\cite{Visvalingam1990}.

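A compact recursive sketch of the simplification step described above is
given below; the point-to-line distance helper is written out only for this
illustration:

\begin{verbatim}
import math

def point_line_distance(p, a, b):
    # Perpendicular distance of point p from the line through a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / norm

def douglas_peucker(points, epsilon):
    # Sketch of the recursive simplification of one stroke.
    if len(points) < 3:
        return list(points)
    # Find the interior point with maximum distance from line p_1 p_n.
    distances = [point_line_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    i = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[i - 1] > epsilon:
        left = douglas_peucker(points[:i + 1], epsilon)
        right = douglas_peucker(points[i:], epsilon)
        return left[:-1] + right  # do not duplicate the split point
    return [points[0], points[-1]]
\end{verbatim}
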
\subsection{Features}\label{sec:features}
Features can be \textit{global}, that is, calculated for the complete
recording or for complete strokes. Other features are calculated for single
points on the pen trajectory and are called \textit{local}.

Global features include the \textit{number of strokes} in a recording, the
\textit{aspect ratio} of a recording's bounding box and the
\textit{ink} used for a recording. The ink feature is calculated by
measuring the length of all strokes combined. The re-curvature, which was
introduced in~\cite{Huang06}, is defined as
\[\text{re-curvature}(\text{stroke}) := \frac{\text{height}(\text{stroke})}{\text{length}(\text{stroke})}\]
and is a stroke-global feature.

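For illustration, the ink feature and the re-curvature of a single stroke can
be computed as sketched below; the stroke is again assumed to be a list of
$(x, y)$ points:

\begin{verbatim}
import math

def stroke_length(stroke):
    # Summed Euclidean distance between consecutive points of a stroke.
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(stroke, stroke[1:]))

def ink(strokes):
    # Global ink feature: combined length of all strokes.
    return sum(stroke_length(stroke) for stroke in strokes)

def re_curvature(stroke):
    # Stroke-global feature: height of the stroke divided by its length.
    ys = [y for _, y in stroke]
    return (max(ys) - min(ys)) / stroke_length(stroke)
\end{verbatim}
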
The simplest local feature is the coordinate of the point itself. Speed,
curvature and a local small-resolution bitmap around the point, which was
introduced by Manke, Finke and Waibel in~\cite{Manke1995}, are other local
features.

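The speed at an interior point can, for instance, be approximated by finite
differences over the timestamps; the point format $(x, y, t)$ is an assumption
of this sketch:

\begin{verbatim}
import math

def speed(points):
    # Sketch: approximate the speed at every interior point of a stroke
    # by central differences over the timestamps.
    result = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[2:]):
        dt = t1 - t0
        result.append(math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0)
    return result
\end{verbatim}
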
\subsection{Multilayer Perceptrons}\label{sec:mlp-training}
\Glspl{MLP} are explained in detail in~\cite{Mitchell97}. They can differ in
the number of hidden layers, the number of neurons per layer and the
activation functions used. The learning algorithm is parameterized by the
learning rate $\eta \in (0, \infty)$, the momentum $\alpha \in [0, \infty)$
and the number of epochs.

The topology of \glspl{MLP} will be denoted in the following by separating the
number of neurons per layer with colons. For example, the notation
$160{:}500{:}500{:}500{:}369$ means that the input layer gets 160~features,
there are three hidden layers with 500~neurons per layer and one output layer
with 369~neurons.

Training of \glspl{MLP} can be executed in various ways, for example
with \acrfull{SLP}. In case of a \gls{MLP} with the topology
$160{:}500{:}500{:}500{:}369$, \gls{SLP} works as follows: At first a \gls{MLP}
with one hidden layer ($160{:}500{:}369$) is trained. Then the output layer is
discarded, a new hidden layer and a new output layer are added and it is
trained again, resulting in a $160{:}500{:}500{:}369$ \gls{MLP}. The output
layer is discarded again, a new hidden layer and a new output layer are added
and the training is executed again, resulting in the final
$160{:}500{:}500{:}500{:}369$ \gls{MLP}.

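The greedy scheme can be sketched, for example, with \texttt{tf.keras}; this
toolkit, the sigmoid activations and the training settings are assumptions of
this illustration and not necessarily those used in this work:

\begin{verbatim}
import tensorflow as tf

def supervised_layer_wise_pretraining(x_train, y_train,
                                      hidden=(500, 500, 500),
                                      n_in=160, n_out=369):
    # Sketch of SLP: train 160:500:369, then repeatedly replace the
    # output layer by a new hidden layer plus a new output layer and
    # train again.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_in,)),
        tf.keras.layers.Dense(hidden[0], activation="sigmoid"),
        tf.keras.layers.Dense(n_out, activation="softmax"),
    ])
    for width in hidden[1:]:
        model.compile(optimizer="sgd", loss="categorical_crossentropy")
        model.fit(x_train, y_train, epochs=10, verbose=0)
        model.pop()  # discard the output layer
        model.add(tf.keras.layers.Dense(width, activation="sigmoid"))
        model.add(tf.keras.layers.Dense(n_out, activation="softmax"))
    model.compile(optimizer="sgd", loss="categorical_crossentropy")
    model.fit(x_train, y_train, epochs=10, verbose=0)
    return model
\end{verbatim}
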
Denoising auto-encoders are another way of pretraining. An
\textit{auto-encoder} is a neural network that is trained to restore its input.
This means the number of input neurons is equal to the number of output
neurons. The weights define an \textit{encoding} of the input that allows
restoring the input. As the neural network finds the encoding by itself, it is
called an auto-encoder. If the hidden layer is smaller than the input layer, it
can be used for dimensionality reduction~\cite{Hinton1989}. If only one hidden
layer with linear activation functions is used, then the hidden layer contains
the principal components after training~\cite{Duda2001}.

Denoising auto-encoders are a variant introduced in~\cite{Vincent2008} that
is more robust to partial corruption of the input features. This robustness is
achieved by adding noise to the input features during training.

There are multiple ways in which noise can be added. Gaussian noise and
randomly masking elements with zero are two possibilities.
\cite{Deeplearning-Denoising-AE} describes how such a denoising auto-encoder
with masking noise can be implemented. The corruption $\varkappa \in [0, 1)$ is
the probability of a feature being masked.
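
A masking-noise corruption as described above can be sketched in a few lines
of NumPy; the array layout, one feature vector per row, is an assumption of
this example:

\begin{verbatim}
import numpy as np

def mask_noise(x, kappa, rng=None):
    # Sketch: set each feature to zero independently with probability
    # kappa (the corruption described above).
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= kappa  # True means keep the feature
    return x * mask
\end{verbatim}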