%!TEX root = write-math-ba-paper.tex
\section{Algorithms}
\subsection{Preprocessing}\label{sec:preprocessing}
Preprocessing in symbol recognition is done to improve the quality and
expressive power of the data. It makes follow-up tasks like feature extraction
and classification easier, more effective or faster. It does so by resolving
errors in the input data, reducing duplicate information and removing
irrelevant information.

Preprocessing algorithms fall into two groups: normalization and noise
reduction algorithms.

A very important normalization algorithm in single-symbol recognition is
\textit{scale-and-shift}~\cite{Thoma:2014}. It scales the recording so that
its bounding box fits into a unit square. As the aspect ratio of a recording
is almost never 1:1, only one dimension will fit the unit square exactly. For
this paper, it was chosen to shift the recording in the direction of its
bigger dimension into the $[0,1] \times [0,1]$ unit square. After that, the
recording is shifted in the direction of its smaller dimension such that its
bounding box is centered around zero.
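
The following is a minimal sketch of such a scale-and-shift step. It assumes
a recording is a list of strokes, each a list of points with keys \texttt{x},
\texttt{y} and \texttt{t} (a hypothetical data format); centering the smaller
dimension is omitted for brevity:
\begin{verbatim}
def scale_and_shift(recording):
    # Collect all coordinates of the recording.
    xs = [p['x'] for stroke in recording for p in stroke]
    ys = [p['y'] for stroke in recording for p in stroke]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    # Scale the bigger dimension to exactly fit [0, 1].
    factor = 1.0 / max(width, height, 1e-9)
    return [[{'x': (p['x'] - min_x) * factor,
              'y': (p['y'] - min_y) * factor,
              't': p['t']} for p in stroke]
            for stroke in recording]
\end{verbatim}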

Another normalization preprocessing algorithm is
resampling~\cite{Guyon91,Manke01}. As the data points on the pen trajectory
are generated asynchronously and with different time resolutions depending on
the hardware and software used, it is desirable to resample the recordings to
have points spread equally in time for every recording. This was done by
linear interpolation of the $(x,t)$ and $(y,t)$ sequences, yielding a fixed
number of equally spaced points per stroke.
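
A sketch of this resampling with NumPy, under the same assumed point format
(the number of points per stroke is an example value):
\begin{verbatim}
import numpy as np

def resample_stroke(stroke, num_points=20):
    # Interpolate the (x,t) and (y,t) sequences at points
    # spread equally in time.
    t = np.array([p['t'] for p in stroke], dtype=float)
    x = np.array([p['x'] for p in stroke], dtype=float)
    y = np.array([p['y'] for p in stroke], dtype=float)
    t_new = np.linspace(t[0], t[-1], num_points)
    return [{'x': xi, 'y': yi, 't': ti}
            for xi, yi, ti in zip(np.interp(t_new, t, x),
                                  np.interp(t_new, t, y),
                                  t_new)]
\end{verbatim}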

\textit{Stroke connection} is a noise reduction algorithm which is mentioned
in~\cite{Tappert90}. Sometimes the hardware detects that the user lifted the
pen although the user certainly did not do so. This can be detected by
measuring the Euclidean distance between the end of one stroke and the
beginning of the next stroke. If this distance is below a threshold, the
strokes are connected.
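
A sketch of stroke connection; the threshold is a hypothetical value that
depends on the coordinate scale:
\begin{verbatim}
def connect_strokes(strokes, threshold=0.05):
    connected = [list(strokes[0])]
    for stroke in strokes[1:]:
        last, first = connected[-1][-1], stroke[0]
        # Euclidean distance between the end of one stroke
        # and the beginning of the next one.
        gap = ((last['x'] - first['x'])**2
               + (last['y'] - first['y'])**2) ** 0.5
        if gap < threshold:
            connected[-1].extend(stroke)  # treat as one stroke
        else:
            connected.append(list(stroke))
    return connected
\end{verbatim}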

Due to the limited resolution of the recording device and due to erratic
handwriting, the pen trajectory might not be smooth. One way to smooth it is
to calculate a weighted average and replace each point by the weighted
average of its own coordinates and its neighbors' coordinates. Another way to
do smoothing is to reduce the number of points with the Douglas-Peucker
algorithm to those points that are more relevant for the overall shape of a
stroke and then interpolate the stroke between those points.
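
A sketch of the weighted-average smoothing; the weights $(1, 2, 1)$ are an
assumed example choice:
\begin{verbatim}
def smooth_stroke(stroke, weights=(1, 2, 1)):
    w_prev, w_self, w_next = weights
    total = float(sum(weights))
    smoothed = [stroke[0]]  # endpoints are kept as they are
    for prev, cur, nxt in zip(stroke, stroke[1:], stroke[2:]):
        smoothed.append(
            {'x': (w_prev * prev['x'] + w_self * cur['x']
                   + w_next * nxt['x']) / total,
             'y': (w_prev * prev['y'] + w_self * cur['y']
                   + w_next * nxt['y']) / total,
             't': cur['t']})
    if len(stroke) > 1:
        smoothed.append(stroke[-1])
    return smoothed
\end{verbatim}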

The Douglas-Peucker stroke simplification algorithm is usually used in
cartography to simplify the shape of roads. It works recursively to find a
subset of points of a stroke that is simpler but still similar to the
original shape. The algorithm adds the first and the last point $p_1$ and
$p_n$ of a stroke to the simplified set of points $S$. Then it searches the
point $p_i$ in between that has the maximum distance from the line $p_1 p_n$.
If this distance is above a threshold $\varepsilon$, the point $p_i$ is added
to $S$. Then the algorithm is applied recursively to $p_1 p_i$ and $p_i p_n$.
It is described as \enquote{Algorithm 1} in~\cite{Visvalingam1990}.
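
A sketch of the algorithm for a single stroke, with points given as
$(x, y)$ tuples:
\begin{verbatim}
def douglas_peucker(points, epsilon):
    def dist_to_line(p, a, b):
        # Distance of point p from the line through a and b.
        num = abs((b[1] - a[1]) * p[0] - (b[0] - a[0]) * p[1]
                  + b[0] * a[1] - b[1] * a[0])
        den = ((b[1] - a[1])**2 + (b[0] - a[0])**2) ** 0.5
        if den == 0:  # a == b: fall back to point distance
            return ((p[0] - a[0])**2 + (p[1] - a[1])**2) ** 0.5
        return num / den
    # Find the point p_i with maximum distance from p_1 p_n.
    d_max, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = dist_to_line(points[i], points[0], points[-1])
        if d > d_max:
            d_max, index = d, i
    if d_max > epsilon:
        # Recurse on p_1 .. p_i and p_i .. p_n.
        left = douglas_peucker(points[:index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
\end{verbatim}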
\subsection{Features}\label{sec:features}
Features can be \textit{global}, that is, calculated for the complete
recording or for complete strokes. Other features are calculated for single
points on the pen trajectory and are called \textit{local}.

Global features are the \textit{number of strokes} in a recording, the
\textit{aspect ratio} of a recording's bounding box or the \textit{ink} being
used for a recording. The ink feature is calculated by measuring the length
of all strokes combined. The re-curvature, which was introduced
in~\cite{Huang06}, is a stroke-global feature and is defined as
\[\text{re-curvature}(stroke) := \frac{\text{height}(stroke)}{\text{length}(stroke)}\]
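
A sketch of two of these global features, under the same assumed point
format:
\begin{verbatim}
def ink(recording):
    # Combined Euclidean length of all strokes.
    total = 0.0
    for stroke in recording:
        for p, q in zip(stroke, stroke[1:]):
            total += ((q['x'] - p['x'])**2
                      + (q['y'] - p['y'])**2) ** 0.5
    return total

def re_curvature(stroke):
    # height(stroke) / length(stroke), as defined above.
    ys = [p['y'] for p in stroke]
    length = ink([stroke])
    return (max(ys) - min(ys)) / length if length else 0.0
\end{verbatim}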

The simplest local feature is the coordinate of the point itself. Speed,
curvature and a local small-resolution bitmap around the point, which was
introduced by Manke, Finke and Waibel in~\cite{Manke1995}, are other local
features.
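
As an illustration, a sketch of a speed feature as a simple finite-difference
approximation (not necessarily the exact variant used in the cited work):
\begin{verbatim}
def speed(stroke):
    # Finite-difference speed at each point of a stroke.
    feats = [0.0]
    for p, q in zip(stroke, stroke[1:]):
        dt = q['t'] - p['t']
        dist = ((q['x'] - p['x'])**2
                + (q['y'] - p['y'])**2) ** 0.5
        feats.append(dist / dt if dt else 0.0)
    return feats
\end{verbatim}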
\subsection{Multilayer Perceptrons}\label{sec:mlp-training}
\Glspl{MLP} are explained in detail in~\cite{Mitchell97}. They can have
different numbers of hidden layers; the number of neurons per layer and the
activation functions can also be varied. The learning algorithm is
parameterized by the learning rate $\eta \in (0, \infty)$, the momentum
$\alpha \in [0, \infty)$ and the number of epochs.
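
For illustration, one weight update of gradient descent with momentum, which
uses $\eta$ and $\alpha$ (the concrete values are examples, not those of the
paper):
\begin{verbatim}
def sgd_momentum_step(w, grad, velocity, eta=0.1, alpha=0.5):
    # v <- alpha * v - eta * grad;  w <- w + v
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity
\end{verbatim}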

In the following, the topology of \glspl{MLP} will be denoted by separating
the number of neurons per layer with colons. For example, the notation
$160{:}500{:}500{:}500{:}369$ means that the input layer gets 160~features,
there are three hidden layers with 500~neurons per layer and one output layer
with 369~neurons.

\Glspl{MLP} can be trained in various ways, for example with \acrfull{SLP}.
In the case of an \gls{MLP} with the topology
$160{:}500{:}500{:}500{:}369$, \gls{SLP} works as follows: At first, an
\gls{MLP} with one hidden layer ($160{:}500{:}369$) is trained. Then the
output layer is discarded, a new hidden layer and a new output layer are
added, and the network is trained again, resulting in a
$160{:}500{:}500{:}369$ \gls{MLP}. The output layer is discarded again, a new
hidden layer and a new output layer are added, and the training is executed
once more.
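
A sketch of this procedure; \texttt{train} is a hypothetical callback that
trains a network of the given topology, initializes it with the passed
hidden-layer weights, and returns the list of per-layer weights:
\begin{verbatim}
def supervised_layerwise_pretraining(topology, train):
    # topology, e.g. [160, 500, 500, 500, 369]
    n_in, hidden, n_out = topology[0], topology[1:-1], topology[-1]
    pretrained = []  # weights of the hidden layers trained so far
    for i in range(len(hidden)):
        # Keep the trained hidden layers, add a fresh hidden
        # layer and a fresh output layer, then train everything.
        layers = [n_in] + hidden[:i + 1] + [n_out]
        weights = train(layers, init=pretrained)
        pretrained = weights[:-1]  # discard the output layer
    return pretrained
\end{verbatim}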

Denoising auto-encoders are another way of pretraining. An
\textit{auto-encoder} is a neural network that is trained to restore its
input. This means the number of input neurons is equal to the number of
output neurons. The weights define an \textit{encoding} of the input that
allows restoring the input. As the neural network finds the encoding by
itself, it is called an auto-encoder. If the hidden layer is smaller than the
input layer, it can be used for dimensionality reduction~\cite{Hinton1989}.
If only one hidden layer with linear activation functions is used, then the
hidden layer contains the principal components after
training~\cite{Duda2001}.

Denoising auto-encoders, a variant introduced in~\cite{Vincent2008}, are more
robust to partial corruption of the input features. They achieve this
robustness by being trained on input features to which noise has been added.

There are multiple ways in which noise can be added: Gaussian noise and
randomly masking elements with zero are two possibilities.
\cite{Deeplearning-Denoising-AE} describes how such a denoising auto-encoder
with masking noise can be implemented. The corruption $\varkappa \in [0, 1)$
is the probability of a feature being masked.
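
A sketch of masking noise with NumPy; the corruption value is an example. The
auto-encoder is then trained to reconstruct the uncorrupted input from the
masked one:
\begin{verbatim}
import numpy as np

def mask_input(x, corruption=0.2, rng=None):
    # Set each feature to zero with probability `corruption'.
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= corruption
    return x * mask
\end{verbatim}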