# Wiki Rasa

### Installation

Two options:

1. Have Python 3.6.6 installed, or downgrade from 3.7 (which TensorFlow does not support yet).  
Then install Rasa Core with `pip install rasa_core` and Rasa NLU with `pip install rasa_nlu`.
2. Install Anaconda, create a Python 3.6.6 environment, and install Rasa in it.
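The second option can be sketched as the following commands, assuming Anaconda (or Miniconda) is already installed; the environment name `rasa` is arbitrary:

```shell
# create and activate a dedicated Python 3.6.6 environment
conda create -y -n rasa python=3.6.6
conda activate rasa

# install both Rasa packages into that environment
pip install rasa_core rasa_nlu
```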

### Getting the example project running

Download [stories.md](https://github.com/RasaHQ/rasa_core/blob/master/examples/moodbot/data/stories.md), [domain.yml](https://github.com/RasaHQ/rasa_core/blob/master/examples/moodbot/domain.yml), and [nlu.md](https://github.com/RasaHQ/rasa_core/blob/master/examples/moodbot/data/nlu.md).  
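The three files can also be fetched from the command line, assuming the raw GitHub URLs that correspond to the links above:

```shell
# download the moodbot example data from the rasa_core repository
curl -LO https://raw.githubusercontent.com/RasaHQ/rasa_core/master/examples/moodbot/data/stories.md
curl -LO https://raw.githubusercontent.com/RasaHQ/rasa_core/master/examples/moodbot/domain.yml
curl -LO https://raw.githubusercontent.com/RasaHQ/rasa_core/master/examples/moodbot/data/nlu.md
```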
Create `nlu_config.yml` with the following content:
```yml
language: en
pipeline: tensorflow_embedding
```

The model can then be trained with:
```
# rasa core
python -m rasa_core.train -d domain.yml -s stories.md -o models/dialogue

# rasa nlu (natural language understanding)
python -m rasa_nlu.train -c nlu_config.yml --data nlu.md -o models --fixed_model_name nlu --project current --verbose
```
Afterwards, you can talk to the bot with:
```
python -m rasa_core.run -d models/dialogue -u models/current/nlu
```


### R Scripts

#### PhysicistsList.R

Crawls Wikipedia's [List of Physicists](https://en.wikipedia.org/wiki/List_of_physicists) for all physicist names and saves them to a file *Physicists.txt* in the data directory.
Use that file to generate an XML dump at Wikipedia's [Export page](https://en.wikipedia.org/wiki/Special:Export).

#### ExtractFromXML.R

Reads the XML file from the data directory and extracts the title and text of the pages in the dump, then writes them to *texte.csv* in the data directory. For convenience it also creates a *texte.RDS* file, which can be loaded with `texte <- readRDS("../data/texte.RDS")`.
**NOTE:** For the script to work, the first line of the XML needs to be replaced with `<mediawiki xml:lang="en">`.
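The first-line replacement mentioned in the note can be done with a one-liner; `../data/dump.xml` is an assumed filename for the exported dump:

```shell
# overwrite line 1 of the dump in place with a plain mediawiki tag
sed -i '1s|.*|<mediawiki xml:lang="en">|' ../data/dump.xml
```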