conversation. One important goal is to make the conversation seem as natural as possible.
Ideally, an interaction with the bot should be indistinguishable from one with a human. This
makes communicating with a computer pleasant and easy for humans, as
they are simply using their natural language.\par
Conversational AI can be used in voice assistants, which communicate through spoken words, or
in chatbots, which imitate a human by sending text messages.
\subsection{Rasa Framework}
Rasa is a collection of tools for building conversational AI software. The \textit{Rasa Stack} consists
of two open-source libraries, \textit{Rasa NLU} and \textit{Rasa Core}, which can be used to create contextual
A Rasa bot needs training data to work properly.
\subsection{Research Question}
The objective of this project is to find out whether chatbots can be trained with natural
it is fragmented into somewhat arbitrary subcategories and thus not optimal to use as a
However, Wikipedia also has a \textit{List of physicists}\footnote{\url{}}, which contains 981 physicists and was
used to build the collection. \par
Data scraping was done using the R package \textit{WikipediR}, a wrapper around the Wikipedia
Articles were downloaded as HTML\footnote{HTML was chosen over wikitext to ease text cleaning.}
As all approaches rely on some form of NER or POS tagging, annotations were created for all
This was done using the R package \textit{cleanNLP} with a spaCy backend to create NER and POS
tags, as well as lemmatization. \par
Fact extraction for the physicists' spouses was done using predefined patterns on word
lemmata.\footnote{Functionality to use patterns on POS tags is also available but did not yield
a better outcome.}
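The lemma-based pattern matching can be illustrated with a minimal Python sketch. It is not the project's code, which was written in R: the token representation (word, lemma and a spaCy-style NER label per token), the two patterns and the helper names \texttt{find\_pattern} and \texttt{extract\_spouse} are hypothetical. After a pattern matches, the sketch checks that the physicist's surname occurs in the sentence and accepts the following tokens tagged \texttt{PERSON} as the spouse candidate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    word: str
    lemma: str
    entity: str  # spaCy-style NER label such as "PERSON"; empty string if none


# Hypothetical lemma patterns, tried in order; a match only counts
# if a PERSON entity directly follows it.
SPOUSE_PATTERNS: List[List[str]] = [
    ["be", "marry", "to"],  # "X was married to Y"
    ["marry"],              # "X married Y"
]


def find_pattern(lemmas: List[str], pattern: List[str]) -> Optional[int]:
    """Return the index just after the first occurrence of pattern, or None."""
    n = len(pattern)
    for i in range(len(lemmas) - n + 1):
        if lemmas[i:i + n] == pattern:
            return i + n
    return None


def extract_spouse(sentence: List[Token], physicist: str) -> Optional[str]:
    """Match lemma patterns in one sentence and verify the candidate."""
    # Check 1: the physicist we are looking for must be mentioned.
    surname = physicist.split()[-1]
    if surname not in (t.word for t in sentence):
        return None
    lemmas = [t.lemma.lower() for t in sentence]
    for pattern in SPOUSE_PATTERNS:
        end = find_pattern(lemmas, pattern)
        if end is None:
            continue
        # Check 2: the tokens after the match must be tagged as a person.
        name = []
        for tok in sentence[end:]:
            if tok.entity != "PERSON":
                break
            name.append(tok.word)
        if name:
            return " ".join(name)
    return None
```

For the tokenized sentence \textit{Einstein was married to Mileva Marić.}, \texttt{extract\_spouse(tokens, "Albert Einstein")} yields \texttt{"Mileva Marić"}, while the same call with a different physicist yields \texttt{None}.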
places to look for the name of the physicist and his/her spouse.
When a matching phrase is found, the result is verified by checking that the correct
physicist is mentioned and that the potential spouse is detected as a person by the NER
A different approach is used for the get\_awards() function. It is based on the assumption that the NER tagger tags awards as some kind of entity. A set of keywords is
then used to filter out the entities of interest, namely the awards.
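The keyword filter behind get\_awards() can be sketched in Python as well. The sketch assumes that entities arrive as (text, entity type) pairs from an NER tagger; the function name \texttt{filter\_award\_entities} and the keyword list are hypothetical, since the actual keywords are not listed here. The entity type is deliberately ignored, in line with the assumption that awards are tagged as \emph{some} kind of entity rather than a specific one.

```python
from typing import Iterable, List, Tuple

# Hypothetical keyword list; the keywords actually used are not given in the text.
AWARD_KEYWORDS = ("prize", "medal", "award")


def filter_award_entities(entities: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep entities whose text contains an award keyword.

    `entities` holds (text, entity_type) pairs; the type is ignored on purpose.
    Duplicates are dropped while preserving first-occurrence order.
    """
    awards: List[str] = []
    for text, _entity_type in entities:
        lowered = text.lower()
        if any(keyword in lowered for keyword in AWARD_KEYWORDS) and text not in awards:
            awards.append(text)
    return awards
```

Because the filter only looks at the entity text, it still finds an award that the tagger labels inconsistently, for instance once as an organization and once as a work of art.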
and \textit{Rasa NLU}. The \textit{Rasa NLU} component receives the user input and matches
it to the corresponding intent. The \textit{Rasa Core} component executes all actions
associated with the determined intent. The configuration was organized following the
examples from the Rasa GitHub repository\footnote{\url{}}. \par
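The division of work between the two components can be pictured with a deliberately simplified Python sketch. It is not Rasa's API: intent classification is replaced by keyword rules, and the intent names and responses are hypothetical. In Rasa itself, \textit{Rasa NLU} uses a trained model and \textit{Rasa Core} predicts the next action with dialogue policies trained on stories, rather than with the fixed lookup table used here.

```python
from typing import Callable, Dict, List

Action = Callable[[str], str]


def classify_intent(message: str) -> str:
    """Stand-in for Rasa NLU: keyword rules instead of a trained classifier."""
    text = message.lower()
    if "married" in text or "spouse" in text:
        return "ask_spouse"
    if "award" in text or "prize" in text:
        return "ask_awards"
    return "fallback"


# Stand-in for Rasa Core: a fixed intent -> actions table (hypothetical names).
ACTIONS: Dict[str, List[Action]] = {
    "ask_spouse": [lambda message: "Looking up the spouse..."],
    "ask_awards": [lambda message: "Looking up the awards..."],
    "fallback": [lambda message: "Sorry, I did not understand that."],
}


def handle(message: str) -> List[str]:
    """Determine the intent, then run every action associated with it."""
    intent = classify_intent(message)
    return [action(message) for action in ACTIONS[intent]]
```

Calling \texttt{handle("Who was Einstein married to?")} returns \texttt{["Looking up the spouse..."]}.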
Rasa NLU was trained with example questions in Markdown format in which the entities are
annotated. This enables the bot to recognize intents and to extract the entities contained
in the sentences. An example is shown in \ref{nlu_example}.
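To illustrate how such annotated questions turn into training examples, the following Python sketch parses Rasa's Markdown training data syntax, in which an entity is written as \texttt{[text](entity\_name)} inside an example listed under an \texttt{\#\# intent:} heading, into plain text plus character offsets. It is a hypothetical helper for illustration, not Rasa's own parser; the intent \texttt{ask\_spouse} and the entity \texttt{physicist} are assumed names and not necessarily those used in \ref{nlu_example}.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches "[Albert Einstein](physicist)"; groups hold the surface text and entity name.
ENTITY_RE = re.compile(r"\[(?P<value>[^\]]+)\]\((?P<entity>[^)]+)\)")


def parse_example(line: str) -> Tuple[str, List[Dict]]:
    """Strip the annotation markup and return the plain text plus entity spans."""
    annotated = line.strip()
    if annotated.startswith("- "):
        annotated = annotated[2:]
    parts: List[str] = []
    entities: List[Dict] = []
    cursor = 0  # position in the annotated string
    length = 0  # length of the plain text built so far
    for m in ENTITY_RE.finditer(annotated):
        before = annotated[cursor:m.start()]
        parts.append(before)
        length += len(before)
        value = m.group("value")
        entities.append({"start": length, "end": length + len(value),
                         "value": value, "entity": m.group("entity")})
        parts.append(value)
        length += len(value)
        cursor = m.end()
    parts.append(annotated[cursor:])
    return "".join(parts), entities


def parse_nlu_markdown(md: str) -> List[Tuple[str, str, List[Dict]]]:
    """Collect (intent, text, entities) for every example under an intent heading."""
    examples = []
    intent: Optional[str] = None
    for raw in md.splitlines():
        line = raw.strip()
        if line.startswith("## intent:"):
            intent = line[len("## intent:"):].strip()
        elif line.startswith("- ") and intent is not None:
            text, entities = parse_example(line)
            examples.append((intent, text, entities))
    return examples
```

For the example \texttt{- Who was [Albert Einstein](physicist) married to?}, the plain text is \textit{Who was Albert Einstein married to?} and the entity \texttt{physicist} spans characters 8 to 23.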