%!TEX root = write-math-ba-paper.tex
\section{Introduction}
On-line recognition makes use of the pen trajectory. One possible
representation of the data is given as groups of sequences of tuples $(x, y, t)
\in \mathbb{R}^3$, where each group represents a stroke, $(x, y)$ is the
position of the pen on a canvas and $t$ is the time.
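As a sketch of one possible reading of this representation (the symbols $R$,
$s_i$, $n$ and $m_i$ are notation assumed here for illustration only), a
recording with $n$ strokes can be written as
\[
    R = (s_1, \dots, s_n), \qquad
    s_i = \bigl((x_{i,1}, y_{i,1}, t_{i,1}), \dots,
                (x_{i,m_i}, y_{i,m_i}, t_{i,m_i})\bigr),
\]
where the stroke $s_i$ consists of $m_i$ sampled points.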
% On-line data was used to classify handwritten natural language text in many
% different variants. For example, the $\text{NPen}^{++}$ system classified
% cursive handwriting into English words by using hidden Markov models and neural
% networks~\cite{Manke1995}.
% Several systems for mathematical symbol recognition with on-line data have been
% described so far~\cite{Kosmala98,Mouchere2013}, but no standard test set
% existed to compare the results of different classifiers for single-symbol
% classification of mathematical symbols. The used symbols differed in most
% papers. This is unfortunate as the choice of symbols is crucial for the top-$n$
% error. For example, the symbols $o$, $O$, $\circ$ and $0$ are very similar and
% systems which know all those classes will certainly have a higher top-$n$ error
% than systems which only accept one of them. But not only the classes differed,
% also the used data to train and test had to be collected by each author again.
\cite{Kirsch}~describes a system called Detexify which uses
time warping to classify on-line handwritten symbols and reports a top-3 error
of less than $\SI{10}{\percent}$ for a set of $\num{100}$~symbols. He also
recently published his data at \url{https://github.com/kirel/detexify-data},
which was collected by a crowdsourcing approach via
\url{http://detexify.kirelabs.org}. Those recordings as well as some recordings
which were collected by a similar approach via \url{http://write-math.com} were
merged into a single data set. The labels were semi-automatically checked for
correctness, and the data set was used to train and evaluate different
classifiers. A more detailed description of the software, data and experiments
is given in~\cite{Thoma:2014}.
In this paper we present a baseline system for the classification of on-line
handwriting into $369$ classes, some of which are very similar. An optimized
classifier was developed which achieves a $\SI{29.7}{\percent}$ relative
improvement of the top-3 error over this baseline. This was achieved by using
better features and \gls{SLP}. The absolute improvement that each of those
changes yields over the baseline will also be shown.
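Assuming the relative improvement is meant in the usual sense, and writing
$e_{\mathrm{base}}$ and $e_{\mathrm{opt}}$ for the top-3 errors of the baseline
and the optimized classifier (symbols introduced here for illustration), the
stated figure would correspond to
\[
    \frac{e_{\mathrm{base}} - e_{\mathrm{opt}}}{e_{\mathrm{base}}} = 0.297 .
\]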
In the following, we give a general overview of the system design, describe
the data and implementation used, explain the algorithms we used to classify
the data, report the results of our experiments and present the optimized
recognizer we created.