diff --git a/docs/final-report/report.pdf b/docs/final-report/report.pdf index d303b34f732b45aba1558e0acfffff208c7c2061..bbb9431a12d5b5da20bf2c05a19098dfe93491ab 100644 Binary files a/docs/final-report/report.pdf and b/docs/final-report/report.pdf differ diff --git a/docs/final-report/report.tex b/docs/final-report/report.tex index 229eac5ae9e6b23496667d3d9fd640a1c7acc6f0..ef6f8cc404c5d6299186e7bd319edb9955b503d4 100644 --- a/docs/final-report/report.tex +++ b/docs/final-report/report.tex @@ -13,12 +13,16 @@ \begin{document} +\pagenumbering{roman} + \maketitle \tableofcontents \pagebreak +\pagenumbering{arabic} + \section{Project Description} \subsection{Conversational AI and Training} @@ -26,15 +30,15 @@ conversation. One important goal is to make the conversation seem as natural as possible. Ideally, an interaction with the bot should be indistinguishable from one with a human. This makes communication with a computer pleasant and easy for humans, as - they are simply using their natural language. - \\ Conversational AI can be used in voice assistants that communicate through spoken words or + they are simply using their natural language. \par + Conversational AI can be used in voice assistants that communicate through spoken words or in chatbots that imitate a human by sending text messages. \subsection{Rasa Framework} Rasa is a collection of tools for conversational AI software. The \textit{Rasa Stack} consists of two open-source libraries called \textit{Rasa NLU} and \textit{Rasa Core} that can be used to create contextual - chatbots. - \\ A Rasa Bot needs training data to work properly. + chatbots. \par + A Rasa Bot needs training data to work properly. \subsection{Research Question} The objective of this project is to find out whether chatbots can be trained with natural @@ -95,7 +99,7 @@ it is fragmented into somewhat arbitrary subcategories and thus not optimal to use as a collection. 
However, Wikipedia also has a \textit{List of physicists}\footnote{\url{https://en.wikipedia.org/wiki/List_of_physicists}} which contains 981 physicists and was - used to build the collection. \\ + used to build the collection. \par Data scraping was done using the R package \textit{WikipediR}, a wrapper around the Wikipedia API. Articles were downloaded as HTML\footnote{HTML was chosen over wikitext to ease text cleaning} @@ -106,7 +110,7 @@ As all approaches rely on some form of NER or POS tagging, annotations were created for all texts. This was done using the R package \textit{cleanNLP} with a spaCy backend to create NER and POS - tags, as well as lemmatization. \\ + tags, as well as lemmatization. \par Fact extraction for physicists' spouses was done using pre-defined patterns on word lemmata.\footnote{Functionality to use patterns on POS tags is also available but did not yield a better outcome.} @@ -114,7 +118,7 @@ places to look for the name of the physicist and his/her spouse. When a matching phrase is found, the results are verified by checking that the correct physicist is mentioned and that the potential spouse is detected as a person by the NER - tagger. + tagger. \par A different approach is used for the get\_awards() function. The approach is based on the assumption that the NER tagger will tag the awards as some kind of entity. A set of keywords is then used to extract all entities of interest, the awards. @@ -124,7 +128,7 @@ and \textit{Rasa NLU}. The \textit{Rasa NLU} component takes care of getting user input and matching it with the respective intents. The \textit{Rasa Core} component executes all actions associated with the determined intent. Configuration has been organized in reference to - examples from the Rasa GitHub repository\footnote{\url{https://github.com/RasaHQ/rasa_core/tree/master/examples}}. \\ + examples from the Rasa GitHub repository\footnote{\url{https://github.com/RasaHQ/rasa_core/tree/master/examples}}. 
\par Rasa NLU has been trained with example questions in Markdown format that contain highlighted entities. This enables the bot to recognize intents and extract the entities inside the sentences. One example can be seen in \ref{nlu_example}.
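
To make the idea of highlighted entities concrete, the following is a minimal Python sketch of how one annotated example question could be turned into plain text plus entity spans. It assumes the inline annotation syntax of the legacy Rasa NLU Markdown training format, \texttt{[value](entity\_name)}. The function name \texttt{parse\_example}, the entity name \texttt{physicist} and the example question are hypothetical and not taken from the project. This is not Rasa's own parser, only an illustration of what the annotation encodes.

```python
import re

# Assumed inline syntax of the legacy Rasa NLU Markdown format:
# [entity value](entity_name)
ENTITY_PATTERN = re.compile(r"\[(?P<value>[^\]]+)\]\((?P<entity>[^)]+)\)")


def parse_example(line):
    """Return the plain text and the entity spans of one annotated example."""
    text = ""
    entities = []
    last_end = 0
    for match in ENTITY_PATTERN.finditer(line):
        # Copy the unannotated text that precedes this entity.
        text += line[last_end:match.start()]
        start = len(text)
        # Keep only the entity value, dropping the markup around it.
        text += match.group("value")
        entities.append({
            "start": start,
            "end": len(text),
            "value": match.group("value"),
            "entity": match.group("entity"),
        })
        last_end = match.end()
    # Copy whatever follows the last entity.
    text += line[last_end:]
    return text, entities


# Hypothetical training example; the entity name "physicist" is an assumption.
example = "Who was [Marie Curie](physicist) married to?"
text, entities = parse_example(example)
print(text)
print(entities)
```

The character offsets refer to the cleaned text rather than the annotated line, so an NLU component can learn both the intent of the whole sentence and the position of each entity inside it.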