David Fuhry authored

Wiki Rasa

Installation

There are two options:

  1. Option: Install Python 3.6.6, or downgrade from 3.7 (which TensorFlow does not support yet).
    Then install Rasa Core with pip install rasa_core and Rasa NLU with pip install rasa_nlu.
  2. Option: Install Anaconda, create a Python 3.6.6 environment, and then install Rasa in it.
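
Since both options depend on the interpreter version, a quick check before installing can save a failed setup. The following is a minimal sketch: the function name `is_supported_python` is hypothetical, and the rule it encodes (3.6.x accepted, everything else rejected) only reflects what this README states, not an official Rasa or TensorFlow compatibility list.

```python
import sys


def is_supported_python(version_info=None):
    """Return True for Python 3.6.x, the version line this README recommends.

    Assumption: any 3.6 patch release is acceptable; the README itself
    names 3.6.6 and rules out 3.7.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return (major, minor) == (3, 6)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print("Python " + found + " matches the recommended 3.6.x line")
    else:
        print("Python " + found + " found; this README recommends 3.6.6")
```

Running the script with the interpreter you plan to use for Rasa prints whether it falls into the 3.6.x line.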

Running the example project

Download stories.md, domain.yml, and nlu.md.
Create nlu_config.yml with the following content:

language: en
pipeline: tensorflow_embedding
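
The two-line configuration above can also be generated from a script, for example when setting the project up on a fresh machine. This is a minimal sketch: `render_nlu_config` and `write_nlu_config` are hypothetical helper names, and only the two keys shown in this README are written.

```python
import os


def render_nlu_config(language="en", pipeline="tensorflow_embedding"):
    """Return the content of nlu_config.yml as a string.

    The defaults are the two values given in this README.
    """
    return "language: {}\npipeline: {}\n".format(language, pipeline)


def write_nlu_config(directory="."):
    """Write nlu_config.yml into the given directory and return its path."""
    path = os.path.join(directory, "nlu_config.yml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_nlu_config())
    return path
```

Calling `write_nlu_config()` from the project directory creates the file next to stories.md, domain.yml, and nlu.md.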

The model can then be trained with:

# rasa core
python -m rasa_core.train -d domain.yml -s stories.md -o models/dialogue

# rasa nlu (natural language processing)
python -m rasa_nlu.train -c nlu_config.yml --data nlu.md -o models --fixed_model_name nlu --project current --verbose

Afterwards you can talk to the bot with:

python -m rasa_core.run -d models/dialogue -u models/current/nlu
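
The training and run commands above can also be chained from a small driver script. The sketch below only wraps the three commands shown in this README. The function names `build_commands` and `run_all` are hypothetical, and the script assumes it is started from the directory that contains domain.yml, stories.md, nlu.md, and nlu_config.yml.

```python
import subprocess
import sys


def build_commands(python=sys.executable):
    """Return the three commands from this README as argument lists."""
    train_core = [
        python, "-m", "rasa_core.train",
        "-d", "domain.yml", "-s", "stories.md", "-o", "models/dialogue",
    ]
    train_nlu = [
        python, "-m", "rasa_nlu.train",
        "-c", "nlu_config.yml", "--data", "nlu.md", "-o", "models",
        "--fixed_model_name", "nlu", "--project", "current", "--verbose",
    ]
    run_bot = [
        python, "-m", "rasa_core.run",
        "-d", "models/dialogue", "-u", "models/current/nlu",
    ]
    return [train_core, train_nlu, run_bot]


def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(command, check=True)
```

`run_all(build_commands())` trains both models and then starts the bot. Because `sys.executable` is used by default, the commands run in the same environment as the driver script itself.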

R Scripts

PhysicistsList.R

Crawls Wikipedia's List of physicists for all physicist names and saves them to the file Physicists.txt in the data directory. Use that file to generate an XML dump on Wikipedia's Export page (Special:Export).

ExtractFromXML.R

Reads the XML file from the data directory and extracts the title and text of each page in the dump, then writes them to texte.csv in the data directory; import that file with read.table. For convenience, the script also creates a texte.RDS file, which can be loaded with texte <- readRDS("../data/texte.RDS"). NOTE: For the script to work, the first line of the XML file must be replaced with <mediawiki xml:lang="en">.
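
To illustrate the extraction step and the first-line replacement, here is a minimal Python sketch of the same idea. It is not the R script from this repository. All function names, the sample dump, and the CSV column names `title` and `text` are hypothetical. The sketch also assumes that the replacement is needed because the original root tag declares a default XML namespace, which would otherwise have to be included in every element lookup.

```python
import csv
import io
import xml.etree.ElementTree as ET


def replace_first_line(xml_text):
    """Swap the first line of the dump for a namespace-free root tag."""
    _first, _sep, rest = xml_text.partition("\n")
    return '<mediawiki xml:lang="en">\n' + rest


def extract_pages(xml_text):
    """Return a list of (title, text) tuples, one per <page> element."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        title = page.findtext("title", default="")
        text = page.findtext("revision/text", default="")
        pages.append((title, text))
    return pages


def to_csv(pages):
    """Serialize the pages as CSV with a header row (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "text"])
    writer.writerows(pages)
    return buffer.getvalue()


# Hypothetical miniature dump, shaped like a MediaWiki export
SAMPLE = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" xml:lang="en">\n'
    "<page><title>Max Planck</title>"
    "<revision><text>German physicist.</text></revision></page>\n"
    "</mediawiki>"
)
```

With the original first line, `extract_pages(SAMPLE)` finds no pages, because every element name is then namespace-qualified. After `replace_first_line(SAMPLE)` the plain `page`, `title`, and `revision/text` lookups match.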