mirror of https://github.com/MartinThoma/LaTeX-examples.git
synced 2025-04-19 11:38:05 +02:00

Rewrote introduction after feedback from Prof. Waibel

commit 4804ad91d5 (parent 657eae88c0)
2 changed files with 68 additions and 34 deletions
Binary file not shown.
@@ -8,6 +8,7 @@
\usepackage{booktabs}
\usepackage{multirow}
\usepackage{pgfplots}
\usepackage{wasysym}
\usepackage[noadjust]{cite}
\usepackage[nameinlink,noabbrev]{cleveref} % has to be after hyperref, ntheorem, amsthm
\usepackage[binary-units]{siunitx}
@@ -33,59 +34,77 @@
\begin{document}
\maketitle
\begin{abstract}
Writing mathematical formulas with \LaTeX{} is easy as soon as one is used to
commands like \verb+\alpha+ and \verb+\propto+. However, for people who have
never used \LaTeX{} or who do not know the English name of a command, it can
be difficult to find the right one. Hence the automatic recognition of
handwritten mathematical symbols is desirable.

The automatic recognition of single handwritten symbols has three main
applications. The first application is to support users who know what a symbol
looks like, but not what its name is, such as $\saturn$. The second
application is providing the necessary commands for professional publishing in
books or on websites, e.g. in the form of \LaTeX{} commands, as MathML, or as
code points. The third application of single-symbol classifiers is as a
building block for formula recognition.

This paper presents a system which uses the pen trajectory to classify
handwritten symbols. Five preprocessing steps, one data multiplication
algorithm, five features and five variants for multilayer perceptron training
were evaluated using $\num{166898}$ recordings which were collected with two
crowdsourcing projects. The evaluation results of these 21~experiments were
used to create an optimized recognizer which has a TOP-1 error of less than
$\SI{17.5}{\percent}$ and a TOP-3 error of $\SI{4.0}{\percent}$. This is a
relative improvement of $\SI{18.5}{\percent}$ for the TOP-1 error and
$\SI{29.7}{\percent}$ for the TOP-3 error compared to the baseline system.
\end{abstract}


\section{Introduction}
On-line recognition makes use of the pen trajectory. This means the data is
given as groups of sequences of tuples $(x, y, t) \in \mathbb{R}^3$, where each
group represents a stroke, $(x, y)$ is the position of the pen on a canvas and
$t$ is the time.

On-line data was used to classify handwritten natural language text in many
different variants. For example, the NPen++ system classified cursive
handwriting into English words by using hidden Markov models and neural
networks~\cite{Manke1995}.

% One handwritten symbol in the described format is called a
% \textit{recording}. One approach to classify recordings into symbol classes
% assigns a probability to each class given the data. The classifier can be
% evaluated by using recordings which were classified by humans and were not
% used to train the classifier. The set of those recordings is called
% \textit{test set}. The TOP-$n$ error is defined as the fraction of the
% symbols where the correct class was not within the top $n$ classes of the
% highest probability.
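The data format described above, and the TOP-$n$ error used to evaluate classifiers throughout this paper, can be sketched in Python. All concrete coordinates and class probabilities below are made up for illustration:

```python
# A recording: groups (strokes) of (x, y, t) tuples. Values are made up.
recording = [
    [(12.0, 30.5, 0), (13.1, 29.8, 21), (14.0, 29.0, 39)],   # first stroke
    [(20.0, 25.0, 410), (20.2, 31.0, 430)],                  # second stroke
]

def top_n_error(predictions, labels, n):
    """Fraction of recordings whose correct class is not among the n
    classes with the highest predicted probability."""
    errors = 0
    for scores, label in zip(predictions, labels):
        # sort classes by descending probability, keep the first n
        top = sorted(scores, key=scores.get, reverse=True)[:n]
        if label not in top:
            errors += 1
    return errors / len(labels)

# Two hypothetical classifier outputs over the similar classes o, O and 0;
# the correct class is "0" both times.
preds = [{"o": 0.5, "0": 0.3, "O": 0.2},
         {"o": 0.1, "O": 0.2, "0": 0.7}]
print(top_n_error(preds, ["0", "0"], 1))  # 0.5
print(top_n_error(preds, ["0", "0"], 2))  # 0.0
```

The first recording is misclassified at TOP-1 but recovered at TOP-2, which is exactly why TOP-$n$ errors shrink as $n$ grows.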
Several systems for mathematical symbol recognition with on-line data have been
described so far~\cite{Kosmala98,Mouchere2013}, but no standard test set
existed to compare the results of different classifiers. The symbols used
differed in all papers. This is unfortunate as the choice of symbols is crucial
for the TOP-$n$ error. For example, the symbols $o$, $O$, $\circ$ and $0$ are
very similar, and systems which know all those classes will certainly have a
higher TOP-$n$ error than systems which only accept one of them. Not only did
the classes differ, but the data used to train and test also had to be
collected anew by each author.

Daniel Kirsch describes in~\cite{Kirsch} a system called Detexify which uses
time warping to classify on-line handwritten symbols and reports a TOP-3 error
of less than $\SI{10}{\percent}$ for a set of $\num{100}$~symbols. He also
recently published his data on \url{https://github.com/kirel/detexify-data},
which was collected by a crowdsourcing approach via
\url{http://detexify.kirelabs.org}. Those recordings, as well as some
recordings which were collected by a similar approach via
\url{http://write-math.com}, were used to train and evaluate different
classifiers. A complete description of all involved software, data and
experiments is given in~\cite{Thoma:2014}.

In this paper we present a baseline system for the classification of on-line
handwriting into $369$ classes, of which some are very similar. We also
present an optimized classifier which achieves a $\SI{29.7}{\percent}$
relative improvement of the TOP-3 error, obtained by using better features and
layer-wise supervised pretraining. The absolute improvements of those changes
compared to the baseline will also be shown.


\section{Steps in Handwriting Recognition}

The following steps are used for symbol classification:\nobreak
\begin{enumerate}
    \item \textbf{Preprocessing}: Recorded data is never perfect. Devices have
          errors and people make mistakes while using the devices. To tackle
@@ -108,8 +127,9 @@ The following steps are used for symbol classification:
          recognition, this step will not be further discussed.
    \item \textbf{Feature computation}: A feature is high-level information
          derived from the raw data after preprocessing. Some systems like
          Detexify take the result of the preprocessing step, but many compute
          new features. Those features could be designed by a human engineer or
          learned. Non-raw-data features can have the advantage that less
          training data is needed since the developer can use knowledge about
          handwriting to compute highly discriminative features. Various
          features are explained in \cref{sec:features}.
@@ -121,8 +141,7 @@
After these steps, we are faced with a classification learning task which
consists of two parts:
\begin{enumerate}
    \item \textbf{Learning} parameters for a given classifier.
    \item \textbf{Classifying} new recordings, sometimes called
          \textit{evaluation}. This should not be confused with the evaluation
          of the classification performance which is done for multiple
@@ -135,6 +154,21 @@ of input features is the same for every recording. There are many ways how to
adjust \glspl{MLP} and how to adjust their training. Some of them are
described in~\cref{sec:mlp-training}.
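As a rough illustration of the model class, a multilayer perceptron maps a fixed-length feature vector through fully connected layers to one probability per symbol class. The layer sizes and random weights below are purely illustrative, not those of the trained system:

```python
import math
import random

random.seed(0)

def dense(inputs, n_out, activation):
    """One fully connected layer with random, purely illustrative weights."""
    outputs = []
    for _ in range(n_out):
        weights = [random.uniform(-1, 1) for _ in inputs]
        outputs.append(activation(sum(w * x for w, x in zip(weights, inputs))))
    return outputs

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # normalize scores into a probability distribution over classes
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

features = [0.2, -0.5, 0.9]           # fixed-length feature vector of one recording
hidden = dense(features, 4, sigmoid)  # hidden layer
scores = dense(hidden, 3, lambda z: z)
probs = softmax(scores)               # one probability per symbol class
print(len(probs), round(sum(probs), 6))
```

The fixed input length is the constraint mentioned above: every recording must be reduced to the same number of features before it can be fed to the network.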
\section{Data and Implementation}
The combined data of Detexify and \href{http://write-math.com}{write-math.com}
can be downloaded via \href{http://write-math.com/data}{write-math.com/data} as
a compressed tar archive. It contains a list of $369$ symbols which are used in
a mathematical context. Each symbol has at least $50$ labeled examples, but
most symbols have more than $200$ labeled examples and some have more than
$2000$. In total, more than $\num{160000}$ labeled recordings were collected.
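Such an archive can be inspected with Python's standard tarfile module. Since the archive's internal file names and layout are not specified in this paper, the sketch below builds a tiny stand-in archive in memory and lists its members; reading a downloaded file works the same way with `mode="r:*"`:

```python
import io
import tarfile

# Build a tiny stand-in archive in memory; the member names are an
# assumption for illustration, not taken from the real dataset.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as archive:
    for name in ("alpha.json", "propto.json"):
        payload = b"{}"
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

# List the members, as one would for the downloaded archive.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as archive:
    names = archive.getnames()
print(names)  # ['alpha.json', 'propto.json']
```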
Preprocessing and feature computation algorithms were implemented and are
publicly available as open-source software in the Python package
\texttt{hwrt}, and \gls{MLP} algorithms are available in the Python package
\texttt{nntoolkit}.


\section{Algorithms}
\subsection{Preprocessing}\label{sec:preprocessing}
Preprocessing in symbol recognition is done to improve the quality and
@@ -485,7 +519,7 @@ this improved the classifiers again.
\end{table}


\section{Discussion}
Four baseline recognition systems were adjusted in many experiments and their
recognition capabilities were compared in order to build a recognition system
that can recognize $369$ mathematical symbols with low error rates as well as to