documents/write-math-ba-paper: Fixed some spelling mistakes
Parent: 2f03172b3a
Commit: c9def13de2
3 changed files with 30 additions and 26 deletions
@@ -1,3 +1,6 @@
[Download compiled PDF](https://github.com/MartinThoma/LaTeX-examples/blob/master/documents/write-math-ba-paper/write-math-ba-paper.pdf)

## Spell checking

* Spell checking `aspell --lang=en --mode=tex check write-math-ba-paper.tex`
* Spell checking with `http://www.reverso.net/spell-checker`
* https://github.com/devd/Academic-Writing-Check
Binary file not shown.
@@ -75,22 +75,21 @@ Daniel Kirsch describes in~\cite{Kirsch} a system called Detexify which uses
time warping to classify on-line handwritten symbols and claims to achieve a
TOP-3 error of less than $\SI{10}{\percent}$ for a set of $\num{100}$~symbols.
He also published his data on \url{https://github.com/kirel/detexify-data},
-which was collected by a crowd-sourcing approach via
+which was collected by a crowdsourcing approach via
\url{http://detexify.kirelabs.org}. Those recordings as well as some recordings
which were collected by a similar approach via \url{http://write-math.com} were
used to train and evaluate different classifiers. A complete description of
all involved software, data and experiments is given in~\cite{Thoma:2014}.

\section{Steps in Handwriting Recognition}
-The following steps are used in all classifiers which are described in the
-following:
+The following steps are used in many classifiers:

\begin{enumerate}
\item \textbf{Preprocessing}: Recorded data is never perfect. Devices have
-errors and people make mistakes while using devices. To tackle these
-problems there are preprocessing algorithms to clean the data. The
-preprocessing algorithms can also remove unnecessary variations of
-the data that do not help in the classification process, but hide
+errors and people make mistakes while using the devices. To tackle
+these problems there are preprocessing algorithms to clean the data.
+The preprocessing algorithms can also remove unnecessary variations
+of the data that do not help in the classification process, but hide
what is important. Having slightly different sizes of the same symbol
is an example of such a variation. Four preprocessing algorithms that
clean or normalize recordings are explained in
@@ -117,15 +116,16 @@ following:
improve the performance of learning algorithms.
\end{enumerate}

-After these steps, we are faced with a classification learning task which consists of
-two parts:
+After these steps, we are faced with a classification learning task which
+consists of two parts:
\begin{enumerate}
\item \textbf{Learning} parameters for a given classifier. This process is
also called \textit{training}.
\item \textbf{Classifying} new recordings, sometimes called
\textit{evaluation}. This should not be confused with the evaluation
of the classification performance which is done for multiple
-topologies, preprocessing queues, and features in \Cref{ch:Evaluation}.
+topologies, preprocessing queues, and features in
+\Cref{ch:Evaluation}.
\end{enumerate}

The classification learning task can be solved with \glspl{MLP} if the number
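As a minimal illustration of the two parts named in this hunk, the sketch below first learns parameters ("training") and then classifies new recordings. The data is random and scikit-learn's MLPClassifier is only a stand-in, not the software used for the paper; the sizes (160 features, 369 classes) mirror the topology discussed later in the diff.

```python
# Illustration of the two parts of the classification learning task:
# 1. learning (training) parameters, 2. classifying new recordings.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 160))      # 1000 recordings, 160 features each
y_train = rng.integers(0, 369, size=1000)   # one of 369 symbol classes
X_new = rng.normal(size=(5, 160))           # new, unlabeled recordings

# Part 1: learning ("training") the classifier's parameters.
clf = MLPClassifier(hidden_layer_sizes=(500, 500, 500), max_iter=20)
clf.fit(X_train, y_train)

# Part 2: classifying new recordings.
print(clf.predict(X_new))
```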
@@ -141,7 +141,7 @@ and feature extraction easier, more effective or faster. It does so by resolving
errors in the input data, reducing duplicate information and removing irrelevant
information.

-Preprocessing algorithms fall in two groups: Normalization and noise
+Preprocessing algorithms fall into two groups: Normalization and noise
reduction algorithms.

A very important normalization algorithm in single-symbol recognition is
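The hunk breaks off before naming that normalization algorithm. As a generic illustration of size and position normalization for single symbols (not taken from the paper's code; the data layout, a list of strokes of (x, y, t) tuples, is an assumption), a recording can be scaled into a unit square and shifted to the origin:

```python
# Sketch of a "scale and shift" style normalization: fit the recording into
# a unit square and move it to the origin. Illustration only.
def scale_and_shift(recording):
    """recording: list of strokes; stroke: list of (x, y, t) points."""
    xs = [p[0] for stroke in recording for p in stroke]
    ys = [p[1] for stroke in recording for p in stroke]
    min_x, min_y = min(xs), min(ys)
    # One uniform factor so the longer side of the bounding box becomes 1
    # (this keeps the aspect ratio of the symbol intact).
    spread = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [[((x - min_x) / spread, (y - min_y) / spread, t)
             for (x, y, t) in stroke]
            for stroke in recording]

# Two differently sized symbols end up in the same unit square:
print(scale_and_shift([[(10, 10, 0), (12, 14, 1)]]))
print(scale_and_shift([[(0, 0, 0), (200, 400, 1)]]))
```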
@@ -157,12 +157,12 @@ Another normalization preprocessing algorithm is resampling. As the data points
on the pen trajectory are generated asynchronously and with different
time-resolutions depending on the used hardware and software, it is desirable
to resample the recordings to have points spread equally in time for every
-recording. This was done with linear interpolation of the $(x,t)$ and $(y,t)$
+recording. This was done by linear interpolation of the $(x,t)$ and $(y,t)$
sequences and getting a fixed number of equally spaced points per stroke.

\textit{Connect strokes} is a noise reduction algorithm. It happens sometimes
that the hardware detects that the user lifted the pen where the user certainly
-didn't do so. This can be detected by measuring the euclidean distance between
+didn't do so. This can be detected by measuring the Euclidean distance between
the end of one stroke and the beginning of the next stroke. If this distance is
below a threshold, then the strokes are connected.
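Both preprocessing algorithms from this hunk are small enough to sketch directly. The data layout (strokes as lists of (x, y, t) tuples) and the default parameter values are assumptions for illustration, not values from the paper:

```python
# Sketches of the two preprocessing steps described above.
import math
import numpy as np

def resample_stroke(stroke, num_points=20):
    """Resampling: linear interpolation of the (x,t) and (y,t) sequences,
    returning `num_points` points spread equally in time."""
    xs, ys, ts = zip(*stroke)
    new_ts = np.linspace(ts[0], ts[-1], num_points)
    return list(zip(np.interp(new_ts, ts, xs),
                    np.interp(new_ts, ts, ys),
                    new_ts))

def connect_strokes(recording, threshold=0.05):
    """Noise reduction: if the Euclidean distance between the end of one
    stroke and the beginning of the next is below `threshold`, connect them."""
    connected = [list(recording[0])]
    for stroke in recording[1:]:
        x0, y0, _ = connected[-1][-1]   # end of the previous stroke
        x1, y1, _ = stroke[0]           # beginning of the next stroke
        if math.hypot(x1 - x0, y1 - y0) < threshold:
            connected[-1].extend(stroke)
        else:
            connected.append(list(stroke))
    return connected

# Example: a stroke with unevenly spaced timestamps gets 5 equally spaced points.
print(resample_stroke([(0.0, 0.0, 0), (0.5, 0.2, 30), (1.0, 1.0, 100)], 5))
```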
@@ -207,19 +207,20 @@ activation functions can be varied. The learning algorithm is parameterized by
the learning rate $\eta \in (0, \infty)$, the momentum $\alpha \in [0, \infty)$
and the number of epochs.

-The topology of \glspl{MLP} will be denoted in the following by separating
-the number of neurons per layer with colons. For example, the notation $160{:}500{:}500{:}500{:}369$
-means that the input layer gets 160~features, there are three hidden layers
-with 500~neurons per layer and one output layer with 369~neurons.
+The topology of \glspl{MLP} will be denoted in the following by separating the
+number of neurons per layer with colons. For example, the notation
+$160{:}500{:}500{:}500{:}369$ means that the input layer gets 160~features,
+there are three hidden layers with 500~neurons per layer and one output layer
+with 369~neurons.

-\glspl{MLP} training can be executed in
-various different ways, for example with \gls{SLP}.
-In case of a \gls{MLP} with the topology $160{:}500{:}500{:}500{:}369$,
-\gls{SLP} works as follows: At first a \gls{MLP} with one hidden layer ($160{:}500{:}369$)
-is trained. Then the output layer is discarded, a new hidden layer and a new
-output layer is added and it is trained again, resulting in a $160{:}500{:}500{:}369$
-\gls{MLP}. The output layer is discarded again, a new hidden layer is added and
-a new output layer is added and the training is executed again.
+\glspl{MLP} training can be executed in various different ways, for example
+with \gls{SLP}. In case of a \gls{MLP} with the topology
+$160{:}500{:}500{:}500{:}369$, \gls{SLP} works as follows: At first a \gls{MLP}
+with one hidden layer ($160{:}500{:}369$) is trained. Then the output layer is
+discarded, a new hidden layer and a new output layer is added and it is trained
+again, resulting in a $160{:}500{:}500{:}369$ \gls{MLP}. The output layer is
+discarded again, a new hidden layer is added and a new output layer is added
+and the training is executed again.

Denoising auto-encoders are another way of pretraining. An
\textit{auto-encoder} is a neural network that is trained to restore its input.
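For reference, a generic form of the gradient descent update in which the learning rate and the momentum from the hunk above appear; this is the textbook formulation, not necessarily the exact variant used in the paper:

```latex
% Generic gradient descent with momentum: w are the weights, E the error
% function, \eta the learning rate, \alpha the momentum.
\Delta w^{(t)} = -\eta \nabla E\bigl(w^{(t)}\bigr) + \alpha \, \Delta w^{(t-1)},
\qquad
w^{(t+1)} = w^{(t)} + \Delta w^{(t)}
```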
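The colon notation and the layer-wise growing described for SLP can be made concrete with a short sketch. The helper names and the stubbed-out train() function are hypothetical; a real implementation would run backpropagation in each stage. The sketch only shows which layers are trained, discarded, and added:

```python
# Sketch of the 160:500:500:500:369 notation and of supervised layer-wise
# pretraining (SLP). Training is stubbed out; only the growth of the
# topology from 160:500:369 to the full network is shown.
import numpy as np

def parse_topology(notation):
    """'160:500:500:500:369' -> [160, 500, 500, 500, 369]"""
    return [int(n) for n in notation.split(":")]

def init_layer(n_in, n_out, rng):
    """Weights and bias of one fully connected layer."""
    return {"W": rng.normal(scale=0.01, size=(n_in, n_out)), "b": np.zeros(n_out)}

def train(layers, X, y):
    """Placeholder: a real implementation would run backpropagation here."""
    return layers

def slp_pretrain(notation, X, y, seed=0):
    rng = np.random.default_rng(seed)
    sizes = parse_topology(notation)
    n_in, hidden_sizes, n_out = sizes[0], sizes[1:-1], sizes[-1]

    kept_hidden = []                 # hidden layers carried over between stages
    prev_width = n_in
    stack = []
    for width in hidden_sizes:
        new_hidden = init_layer(prev_width, width, rng)
        new_output = init_layer(width, n_out, rng)   # fresh output layer per stage
        stack = train(kept_hidden + [new_hidden, new_output], X, y)
        kept_hidden = stack[:-1]     # discard the output layer before growing again
        prev_width = width
    return stack                     # final stage: the full 160:500:500:500:369 MLP

net = slp_pretrain("160:500:500:500:369", np.zeros((10, 160)), np.zeros(10))
print([layer["W"].shape for layer in net])  # [(160, 500), (500, 500), (500, 500), (500, 369)]
```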
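Denoising auto-encoders corrupt the input and train the network to reconstruct the clean input; the learned hidden-layer weights can then initialize a layer of an MLP. The sketch below is a generic single-hidden-layer version with made-up data and hyperparameters, not the pretraining code used for the paper:

```python
# Minimal denoising auto-encoder sketch (one hidden layer, untied weights):
# corrupt the input, then train the network to reconstruct the *clean* input.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden, lr = 160, 500, 0.1
W1 = rng.normal(scale=0.01, size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_in)); b2 = np.zeros(n_in)

X = (rng.random((256, n_in)) < 0.3).astype(float)   # stand-in training data in [0, 1]

for epoch in range(10):
    X_noisy = X * (rng.random(X.shape) > 0.3)       # masking noise: drop 30% of inputs
    h = sigmoid(X_noisy @ W1 + b1)                   # encode the corrupted input
    X_hat = sigmoid(h @ W2 + b2)                     # decode / reconstruct
    err = X_hat - X                                  # compare with the clean input
    # Backpropagation of the squared reconstruction error
    # (constant factors are folded into the learning rate).
    d_out = err * X_hat * (1 - X_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X);       b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X_noisy.T @ d_hid / len(X); b1 -= lr * d_hid.mean(axis=0)
    print(epoch, float((err ** 2).mean()))
# After training, W1 and b1 can be used to initialize a hidden layer of an MLP.
```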