mirror of https://github.com/MartinThoma/LaTeX-examples.git
synced 2025-04-26 06:48:04 +02:00
Add papers/write-math-paper
This commit is contained in:
parent 7740f0147f
commit fe78311901
25 changed files with 10624 additions and 0 deletions
123 documents/papers/write-math-paper/ch6-summary.tex Normal file
@@ -0,0 +1,123 @@
%!TEX root = write-math-ba-paper.tex

\section{Summary}
Four baseline recognition systems were tuned in a series of experiments and
their recognition capabilities were compared, both to build a recognition
system that can recognize 396 mathematical symbols with low error rates and to
evaluate which preprocessing steps and features improve the recognition rate.

All recognition systems were trained and evaluated with
$\num{\totalCollectedRecordings{}}$ recordings for \totalClassesAnalyzed{}
symbols. These recordings were collected by two crowdsourcing projects
(\href{http://detexify.kirelabs.org/classify.html}{Detexify} and
\href{http://write-math.com}{write-math.com}) and created with various
devices. While some recordings were created with standard touch devices such
as tablets and smartphones, others were created with a mouse.

\Glspl{MLP} were used for the classification task. Four baseline systems with
different numbers of hidden layers were used, as the number of hidden layers
influences the capabilities and problems of \glspl{MLP}.

All baseline systems used the same preprocessing queue. The recordings were
scaled and shifted as described in \ref{sec:preprocessing}, then resampled
with linear interpolation so that every stroke had exactly 20~points spread
equidistantly in time. The 80~$(x, y)$ coordinates of the first 4~strokes were
used to get exactly $160$ input features for every recording. The baseline
system $B_{hl=2}$ achieved a top-3 error of $\SI{5.7}{\percent}$.

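The resampling step can be sketched as follows; the data layout (strokes as
lists of $(x, y, t)$ tuples) and the function name are illustrative
assumptions, not the actual \texttt{hwrt} interface:
\begin{verbatim}
import numpy as np

def resample_stroke(stroke, n=20):
    # stroke: list of (x, y, t) tuples; returns n points whose
    # timestamps are spread equidistantly over the stroke's duration
    x, y, t = (np.array(c, dtype=float) for c in zip(*stroke))
    t_new = np.linspace(t[0], t[-1], n)
    return list(zip(np.interp(t_new, t, x), np.interp(t_new, t, y)))
\end{verbatim}
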
Adding two slightly rotated variants of each recording, and hence tripling the
training set, made the systems $B_{hl=3}$ and $B_{hl=4}$ perform much worse,
but improved the performance of the smaller systems.

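Such variants can be generated, for example, by rotating every point of a
recording around the recording's mean; the rotation center and angle handling
below are assumptions, since the text only speaks of slightly rotated
variants:
\begin{verbatim}
import numpy as np

def rotated_variant(points, degrees):
    # rotate all (x, y) points of a recording around their mean
    # (the choice of center is an assumption)
    p = np.asarray(points, dtype=float)      # shape (n, 2)
    c = p.mean(axis=0)
    phi = np.radians(degrees)
    rot = np.array([[np.cos(phi), -np.sin(phi)],
                    [np.sin(phi),  np.cos(phi)]])
    return (p - c) @ rot.T + c
\end{verbatim}
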
The global features re-curvature, ink, stroke count and aspect ratio improved
the systems $B_{hl=1}$--$B_{hl=3}$, whereas the stroke center point feature
made $B_{hl=2}$ perform worse.

Denoising auto-encoders were evaluated as one way to use pretraining, but this
increased the error rate notably. However, \acrlong{SLP} improved the
performance considerably.

The stroke connection algorithm was added to the preprocessing steps of the
baseline system, as were the re-curvature feature, the ink feature, the number
of strokes and the aspect ratio. The training setup of the baseline system was
changed to \acrlong{SLP} and the resulting model was trained again with a
lower learning rate. This optimized recognizer $B_{hl=2,c}'$ had a top-3 error
of $\SI{4.0}{\percent}$, which means the top-3 error dropped by
$\num{1.7}$~percentage points in comparison to the baseline system $B_{hl=2}$.

A top-3 error of $\SI{4.0}{\percent}$ makes the system usable for symbol
lookup. It could also be used as a starting point for the development of a
multiple-symbol classifier.

The aim of this work was to develop a symbol recognition system which is easy
to use, fast, and has high recognition rates, as well as to evaluate ideas for
single-symbol classifiers. Some of those goals were reached: the recognition
system $B_{hl=2,c}'$ evaluates new recordings in a fraction of a second and
has acceptable recognition rates.

% Many algorithms were evaluated. However, there are still many other
% algorithms which could be evaluated and, at the time of this work, the best
% classifier $B_{hl=2,c}'$ is only available through the Python package
% \texttt{hwrt}. It is planned to make a web version of that classifier
% available online.

\section{Optimized Recognizer}
All preprocessing steps and features that proved useful were combined to
create an optimized recognizer.

All of these models performed much better than everything that was tried
before. The results of this experiment show that single-symbol recognition
with \totalClassesAnalyzed{} classes, with common touch devices and the mouse,
can be done with a top-1 error rate of $\SI{18.6}{\percent}$ and a top-3 error
of $\SI{4.1}{\percent}$. This was achieved by a \gls{MLP} with a
$167{:}500{:}500{:}\totalClassesAnalyzed{}$ topology.

It used the stroke connection algorithm to connect strokes whose ends were
less than $\SI{10}{\pixel}$ apart, scaled each recording to a unit square and
shifted it as described in \ref{sec:preprocessing}. After that, a linear
resampling step was applied to the first 4~strokes to resample them to
20~points each. All other strokes were discarded.

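A minimal sketch of such a stroke connection step, assuming strokes are lists
of $(x, y)$ points (the exact endpoint comparison may differ from the actual
implementation):
\begin{verbatim}
import math

def connect_strokes(strokes, threshold=10.0):
    # merge a stroke into its predecessor if the gap between the
    # predecessor's last point and its first point is below the
    # threshold (in pixels)
    connected = [list(strokes[0])]
    for stroke in strokes[1:]:
        x1, y1 = connected[-1][-1]
        x2, y2 = stroke[0]
        if math.hypot(x2 - x1, y2 - y1) < threshold:
            connected[-1].extend(stroke)
        else:
            connected.append(list(stroke))
    return connected
\end{verbatim}
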
\goodbreak
The 167 features were\mynobreakpar%
\begin{itemize}
    \item the first 4 strokes with 20 points per stroke, resulting in 160
          features,
    \item the re-curvature for the first 4 strokes,
    \item the ink,
    \item the number of strokes and
    \item the aspect ratio of the bounding box.
\end{itemize}

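A sketch of how these 167 features can be assembled, assuming preprocessed
strokes as lists of $(x, y)$ points; the re-curvature definition below (stroke
height divided by arc length) is an assumption rather than a quotation from
this paper:
\begin{verbatim}
import math

def ink(strokes):
    # total length of all line segments in the recording
    return sum(math.hypot(x2 - x1, y2 - y1)
               for s in strokes
               for (x1, y1), (x2, y2) in zip(s, s[1:]))

def recurvature(stroke):
    # stroke height divided by arc length (assumed definition)
    ys = [y for _, y in stroke]
    length = ink([stroke])
    return (max(ys) - min(ys)) / length if length else 0.0

def feature_vector(strokes):
    # recordings with fewer than 4 strokes need padding (omitted here)
    vec = [c for s in strokes[:4] for p in s for c in p]  # 160 coords
    vec += [recurvature(s) for s in strokes[:4]]          # 4 re-curvatures
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    height = max(ys) - min(ys)
    aspect = (max(xs) - min(xs)) / height if height else 0.0
    vec += [ink(strokes), len(strokes), aspect]
    return vec                                            # 167 features
\end{verbatim}
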
\Gls{SLP} was applied with $\num{1000}$ epochs per layer, a learning rate of
$\eta=0.1$ and a momentum of $\alpha=0.1$. After that, the complete model was
trained again for $\num{1000}$ epochs with standard mini-batch gradient
descent, resulting in the systems $B_{hl=1,c}$--$B_{hl=4,c}$.

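The following is a compact sketch of this training scheme, using PyTorch
purely for illustration; the paper's implementation is not tied to it, and the
full-batch updates below stand in for the actual training setup:
\begin{verbatim}
import torch
import torch.nn as nn

def pretrain_layerwise(sizes, data, labels,
                       epochs=1000, lr=0.1, momentum=0.1):
    # sizes, e.g. [167, 500, 500, 369]: grow the MLP one hidden layer
    # at a time; each stage is trained on the class labels through
    # the current output layer before the next layer is inserted
    hidden = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:-1]):
        hidden.append(nn.Sequential(nn.Linear(n_in, n_out),
                                    nn.Sigmoid()))
        model = nn.Sequential(*hidden, nn.Linear(n_out, sizes[-1]))
        opt = torch.optim.SGD(model.parameters(),
                              lr=lr, momentum=momentum)
        for _ in range(epochs):
            opt.zero_grad()
            nn.functional.cross_entropy(model(data), labels).backward()
            opt.step()
    return model  # afterwards trained again end to end (see below)
\end{verbatim}
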
After the models $B_{hl=1,c}$--$B_{hl=4,c}$ had been trained for the first
$\num{1000}$ epochs, they were trained for another $\num{1000}$ epochs with a
lower learning rate of $\eta = 0.05$, resulting in the systems
$B_{hl=1,c}'$--$B_{hl=4,c}'$.
\Cref{table:complex-recognizer-systems-evaluation} shows that this improved
the classifiers again.

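In terms of the sketch above, this second stage amounts to re-running the
training loop on the returned model with the smaller learning rate:
\begin{verbatim}
# assumed continuation of the sketch above
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for _ in range(1000):
    opt.zero_grad()
    nn.functional.cross_entropy(model(data), labels).backward()
    opt.step()
\end{verbatim}
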
\begin{table}[htb]
    \centering
    \begin{tabular}{lrrrr}
    \toprule
    \multirow{2}{*}{System} & \multicolumn{4}{c}{Classification error}\\
    \cmidrule(l){2-5}
     & Top-1 & Change & Top-3 & Change\\\midrule
    $B_{hl=1,c}$  & $\SI{21.0}{\percent}$ & $\SI{-2.2}{\percent}$ & $\SI{5.2}{\percent}$ & $\SI{-1.5}{\percent}$\\
    $B_{hl=2,c}$  & $\SI{18.3}{\percent}$ & $\SI{-3.3}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
    $B_{hl=3,c}$  & \underline{$\SI{18.2}{\percent}$} & $\SI{-3.7}{\percent}$ & \underline{$\SI{4.1}{\percent}$} & $\SI{-1.6}{\percent}$\\
    $B_{hl=4,c}$  & $\SI{18.6}{\percent}$ & $\SI{-5.3}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\\midrule
    $B_{hl=1,c}'$ & $\SI{19.3}{\percent}$ & $\SI{-3.9}{\percent}$ & $\SI{4.8}{\percent}$ & $\SI{-1.9}{\percent}$\\
    $B_{hl=2,c}'$ & \underline{$\SI{17.5}{\percent}$} & $\SI{-4.1}{\percent}$ & \underline{$\SI{4.0}{\percent}$} & $\SI{-1.7}{\percent}$\\
    $B_{hl=3,c}'$ & $\SI{17.7}{\percent}$ & $\SI{-4.2}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
    $B_{hl=4,c}'$ & $\SI{17.8}{\percent}$ & $\SI{-6.1}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\
    \bottomrule
    \end{tabular}
    \caption{Error rates of the optimized recognizer systems. The systems
             $B_{hl=i,c}'$ were trained another $\num{1000}$ epochs with a
             learning rate of $\eta=0.05$.}
    \label{table:complex-recognizer-systems-evaluation}
\end{table}