%!TEX root = write-math-ba-paper.tex
\section{Summary}
Four baseline recognition systems were adjusted in many experiments and their
recognition capabilities were compared, both to build a recognition system
that can recognize 396 mathematical symbols with low error rates and to
evaluate which preprocessing steps and features help to improve the
recognition rate.
All recognition systems were trained and evaluated with
$\num{\totalCollectedRecordings{}}$ recordings for \totalClassesAnalyzed{}
symbols. These recordings were collected by two crowdsourcing projects
(\href{http://detexify.kirelabs.org/classify.html}{Detexify} and
\href{http://write-math.com}{write-math.com}) and were created with various
devices. While some recordings were created with standard touch devices such
as tablets and smartphones, others were created with a mouse.
\Glspl{MLP} were used for the classification task. Four baseline systems with
different numbers of hidden layers were used, as the number of hidden layers
influences the capabilities and problems of \glspl{MLP}.
All baseline systems used the same preprocessing queue. The recordings were
scaled and shifted as described in \ref{sec:preprocessing} and resampled with
linear interpolation so that every stroke had exactly 20~points which were
spread equidistantly in time. The 80~($x,y$) coordinate pairs of the first
4~strokes were used to get exactly $160$ input features for every recording.
The baseline system $B_{hl=2}$ has a top-3 error of $\SI{5.7}{\percent}$.
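As a rough illustration (not the paper's code), the time-equidistant
resampling of a single stroke could be sketched in Python as follows; the
stroke format of $(x, y, t)$ tuples and the helper name are assumptions:

```python
import numpy as np

def resample_stroke(stroke, n_points=20):
    """Linearly resample a stroke to n_points spread equidistantly in time.

    `stroke` is a list of (x, y, t) tuples; this input format is an
    assumption for illustration.
    """
    xs, ys, ts = (np.asarray(v, dtype=float) for v in zip(*stroke))
    # Target timestamps: n_points values evenly spaced over the stroke's duration.
    t_new = np.linspace(ts[0], ts[-1], n_points)
    # Interpolate x and y independently at the new timestamps.
    x_new = np.interp(t_new, ts, xs)
    y_new = np.interp(t_new, ts, ys)
    return list(zip(x_new, y_new))
```

Applied to the first four strokes, this yields the $4 \cdot 20 \cdot 2 = 160$
coordinate features.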
Adding two slightly rotated variants of each recording, and hence tripling
the training set, made the systems $B_{hl=3}$ and $B_{hl=4}$ perform much
worse, but improved the performance of the smaller systems.
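The rotation-based augmentation could be sketched as follows; rotating around
the bounding-box center and the angle of $3$ degrees are assumptions for
illustration, not values taken from the paper:

```python
import math

def rotate_recording(recording, angle_deg):
    """Return a copy of the recording rotated around its bounding-box center.

    `recording` is a list of strokes, each a list of (x, y) points.
    """
    pts = [p for stroke in recording for p in stroke]
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Standard 2D rotation of every point around (cx, cy).
    return [[(cx + (x - cx) * cos_a - (y - cy) * sin_a,
              cy + (x - cx) * sin_a + (y - cy) * cos_a)
             for x, y in stroke]
            for stroke in recording]

def augment(recording, angle_deg=3):
    """Triple the data: the original plus two slightly rotated variants."""
    return [recording,
            rotate_recording(recording, -angle_deg),
            rotate_recording(recording, +angle_deg)]
```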
The global features re-curvature, ink, stroke count and aspect ratio improved
the systems $B_{hl=1}$--$B_{hl=3}$, whereas the stroke center point feature
made $B_{hl=2}$ perform worse.
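Three of these global features are straightforward to compute; the sketch
below is illustrative only, and in particular the re-curvature definition
(stroke height divided by stroke length) is a plausible reading, not
necessarily the exact definition used in the paper:

```python
import math

def ink(recording):
    """Total length of all drawn line segments ("ink" used)."""
    return sum(math.dist(p, q)
               for stroke in recording
               for p, q in zip(stroke, stroke[1:]))

def aspect_ratio(recording):
    """Width / height of the bounding box (height clamped to avoid /0)."""
    xs = [x for s in recording for x, _ in s]
    ys = [y for s in recording for _, y in s]
    return (max(xs) - min(xs)) / max(max(ys) - min(ys), 1e-6)

def re_curvature(stroke):
    """Height of the stroke divided by its length (assumed definition)."""
    ys = [y for _, y in stroke]
    height = max(ys) - min(ys)
    length = sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))
    return height / max(length, 1e-6)
```

The stroke count is simply the number of strokes in the recording.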
Denoising auto-encoders were evaluated as one way to use pretraining, but
they increased the error rate notably. However, \acrlong{SLP} improved the
performance considerably.
The stroke connection algorithm was added to the preprocessing steps of the
baseline system, as were the re-curvature feature, the ink feature, the
number of strokes and the aspect ratio. The training setup of the baseline
system was changed to \acrlong{SLP} and the resulting model was trained again
with a lower learning rate. This optimized recognizer $B_{hl=2,c}'$ had a
top-3 error of $\SI{4.0}{\percent}$. This means that the top-3 error dropped
by $\num{1.7}$ percentage points in comparison to the baseline system
$B_{hl=2}$.
A top-3 error of $\SI{4.0}{\percent}$ makes the system usable for symbol
lookup. It could also be used as a starting point for the development of a
multiple-symbol classifier.
The aim of this work was to develop a symbol recognition system which is easy
to use, fast, and has high recognition rates, as well as to evaluate ideas
for single-symbol classifiers. Some of those goals were reached. The
recognition system $B_{hl=2,c}'$ evaluates new recordings in a fraction of a
second and has acceptable recognition rates.
% Many algorithms were evaluated. However, there are still many other
% algorithms which could be evaluated and, at the time of this work, the best
% classifier $B_{hl=2,c}'$ is only available through the Python package
% \texttt{hwrt}. It is planned to add a web version of that classifier online.
\section{Optimized Recognizer}
|
|
All preprocessing steps and features that were useful were combined to create a
|
|
recognizer that performs best.
All resulting models were much better than everything that had been tried
before. The results of this experiment show that single-symbol recognition
with \totalClassesAnalyzed{} classes, using common touch devices and the
mouse, can be done with a top-1 error rate of $\SI{18.6}{\percent}$ and a
top-3 error of $\SI{4.1}{\percent}$. This was achieved by an \gls{MLP} with a
$167{:}500{:}500{:}\totalClassesAnalyzed{}$ topology.
It used the stroke connection algorithm to connect strokes whose ends were
less than $\SI{10}{\pixel}$ apart, scaled each recording to a unit square and
shifted it as described in \ref{sec:preprocessing}. After that, a linear
resampling step was applied to the first 4~strokes to resample them to
20~points each. All other strokes were discarded.
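A minimal sketch of such a stroke-connection step, assuming strokes arrive in
drawing order and only consecutive strokes are merged (the paper's algorithm
may pair strokes differently):

```python
import math

def connect_strokes(strokes, max_gap=10):
    """Merge consecutive strokes whose adjacent ends are closer than max_gap.

    `strokes` is a list of strokes in drawing order, each a list of (x, y)
    points in pixel coordinates.
    """
    if not strokes:
        return []
    connected = [list(strokes[0])]
    for stroke in strokes[1:]:
        # Distance between the end of the previous stroke and the start of this one.
        if math.dist(connected[-1][-1], stroke[0]) < max_gap:
            connected[-1].extend(stroke)  # treat them as one interrupted stroke
        else:
            connected.append(list(stroke))
    return connected
```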
\goodbreak
The 167 features were\mynobreakpar%
\begin{itemize}
    \item the first 4 strokes with 20 points per stroke, resulting in 160
          features,
    \item the re-curvature for the first 4 strokes,
    \item the ink,
    \item the number of strokes and
    \item the aspect ratio of the bounding box.
\end{itemize}
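The assembly of this 167-dimensional vector can be sketched as follows; it is
illustrative only and assumes each kept stroke was already preprocessed and
resampled to 20 points, that missing strokes are zero-padded, and that
re-curvature is stroke height over stroke length:

```python
import math

def features(recording, n_strokes=4, points_per_stroke=20):
    """Assemble the 167-dimensional feature vector (illustrative sketch)."""
    kept = recording[:n_strokes]  # all other strokes are discarded
    vec = []
    # 4 strokes * 20 points * (x, y) = 160 coordinate features;
    # zero-padding for missing strokes is an assumption for illustration.
    for i in range(n_strokes):
        stroke = kept[i] if i < len(kept) else [(0.0, 0.0)] * points_per_stroke
        for x, y in stroke:
            vec.extend([x, y])
    # 4 re-curvature values (here: stroke height / stroke length).
    for i in range(n_strokes):
        stroke = kept[i] if i < len(kept) else [(0.0, 0.0)]
        ys = [y for _, y in stroke]
        length = sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))
        vec.append((max(ys) - min(ys)) / max(length, 1e-6))
    # Ink, number of strokes, and aspect ratio of the bounding box.
    vec.append(sum(math.dist(p, q)
                   for s in kept for p, q in zip(s, s[1:])))
    vec.append(len(recording))
    xs = [x for s in kept for x, _ in s]
    ys = [y for s in kept for _, y in s]
    vec.append((max(xs) - min(xs)) / max(max(ys) - min(ys), 1e-6))
    return vec  # 160 + 4 + 1 + 1 + 1 = 167 features
```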
\Gls{SLP} was applied with $\num{1000}$ epochs per layer, a learning rate of
$\eta=0.1$ and a momentum of $\alpha=0.1$. After that, the complete model was
trained again for $\num{1000}$ epochs with standard mini-batch gradient
descent, resulting in the systems $B_{hl=1,c}$ -- $B_{hl=4,c}$.
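The layer-wise scheme can be illustrated with a minimal NumPy sketch (not the
paper's implementation): each new hidden layer is trained together with a
temporary softmax output on the features produced by the already-trained
layers, then the temporary output is discarded and the hidden weights are
kept. Momentum and mini-batching are omitted here for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pretrain_layers(X, Y, hidden_sizes, n_classes, epochs=100, eta=0.1):
    """Supervised layer-wise pretraining (sketch).

    X: inputs (n, d); Y: one-hot labels (n, n_classes).
    Returns the pretrained hidden layers and the final hidden representation.
    """
    layers, H = [], X
    for size in hidden_sizes:
        W1 = rng.normal(0, 0.1, (H.shape[1], size))
        b1 = np.zeros(size)
        W2 = rng.normal(0, 0.1, (size, n_classes))  # temporary output layer
        b2 = np.zeros(n_classes)
        for _ in range(epochs):
            A = np.tanh(H @ W1 + b1)
            P = softmax(A @ W2 + b2)
            G2 = (P - Y) / len(H)              # cross-entropy gradient
            GA = (G2 @ W2.T) * (1 - A ** 2)    # backprop through tanh
            W2 -= eta * A.T @ G2; b2 -= eta * G2.sum(axis=0)
            W1 -= eta * H.T @ GA; b1 -= eta * GA.sum(axis=0)
        layers.append((W1, b1))                # keep hidden layer, drop output
        H = np.tanh(H @ W1 + b1)
    return layers, H
```

After all layers are pretrained, a final output layer is attached and the
whole stack is fine-tuned end to end, mirroring the retraining step described
above.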
After the models $B_{hl=1,c}$ -- $B_{hl=4,c}$ had been trained for the first
$\num{1000}$ epochs, they were trained for another $\num{1000}$ epochs with a
learning rate of $\eta = 0.05$, resulting in the systems $B_{hl=1,c}'$ --
$B_{hl=4,c}'$. \Cref{table:complex-recognizer-systems-evaluation} shows that
this improved the classifiers again.
\begin{table}[htb]
    \centering
    \begin{tabular}{lrrrr}
    \toprule
    \multirow{2}{*}{System} & \multicolumn{4}{c}{Classification error}\\
    \cmidrule(l){2-5}
                  & Top-1 & Change & Top-3 & Change\\\midrule
    $B_{hl=1,c}$  & $\SI{21.0}{\percent}$ & $\SI{-2.2}{\percent}$ & $\SI{5.2}{\percent}$ & $\SI{-1.5}{\percent}$\\
    $B_{hl=2,c}$  & $\SI{18.3}{\percent}$ & $\SI{-3.3}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
    $B_{hl=3,c}$  & \underline{$\SI{18.2}{\percent}$} & $\SI{-3.7}{\percent}$ & \underline{$\SI{4.1}{\percent}$} & $\SI{-1.6}{\percent}$\\
    $B_{hl=4,c}$  & $\SI{18.6}{\percent}$ & $\SI{-5.3}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\\midrule
    $B_{hl=1,c}'$ & $\SI{19.3}{\percent}$ & $\SI{-3.9}{\percent}$ & $\SI{4.8}{\percent}$ & $\SI{-1.9}{\percent}$\\
    $B_{hl=2,c}'$ & \underline{$\SI{17.5}{\percent}$} & $\SI{-4.1}{\percent}$ & \underline{$\SI{4.0}{\percent}$} & $\SI{-1.7}{\percent}$\\
    $B_{hl=3,c}'$ & $\SI{17.7}{\percent}$ & $\SI{-4.2}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
    $B_{hl=4,c}'$ & $\SI{17.8}{\percent}$ & $\SI{-6.1}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\
    \bottomrule
    \end{tabular}
    \caption{Error rates of the optimized recognizer systems. The systems
             $B_{hl=i,c}'$ were trained another $\num{1000}$ epochs with a
             learning rate of $\eta=0.05$.}
    \label{table:complex-recognizer-systems-evaluation}
\end{table}