Commit f6cda3c4 authored by David Fuhry, committed by Lucas Schons

Resolve "Update documentation"

# Install instructions
This document provides instructions for setting up the software on a freshly installed Debian 9 system. It will most likely work on any recent Ubuntu system too, though there may be some hiccups with the Python versions.
## Installing Debian
This assumes a standard install of Debian was made using the [smallcd AMD64](https://www.debian.org/distrib/netinst#smallcd) Debian image. It was tested selecting only the base system with the standard system utilities (which include Python) and no GUI.
This guide also assumes that a user named `rasa` was created during setup, though it should not be hard to adapt the instructions to a different user name.
### Hypervisor specific steps
#### Hyper-V
Nothing to do, works out of the box.
#### KVM
Not tested.
#### VirtualBox
Works.
## Installing sudo
Though not strictly required, we'll make `rasa` a sudoer for convenience.
First, log in as root and install sudo:
```shell
apt-get install sudo
```
Next, add the `rasa` user to the sudo group:
```shell
usermod -aG sudo rasa
```
All done here. `exit` and log back in as `rasa`.
## Setting up Python for cleanNLP
First, update the system just to be sure. We'll also need gcc, git and a few build tools, so go ahead and install them:
```shell
sudo apt-get update && sudo apt-get dist-upgrade -y && sudo apt-get install gcc git build-essential python-dev -y
```
Next, install miniconda:
```shell
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
The defaults are fine here.
Log out and back in so that `conda` is available on your `PATH`.
Now we create an environment for spacy and install it:
```shell
conda create -n spcy python=3
conda activate spcy
pip install spacy
python -m spacy download en
conda deactivate
```
## Installing R
_There is a script that will do all of the following steps for you. If you want to use it, skip ahead to **Cloning the project** and execute the script as described there._
We need to add the CRAN repository to `sources.list`, as the R packages in the Debian repositories are somewhat out of date.
For that we'll need a few packages:
```shell
sudo apt install dirmngr --install-recommends
sudo apt install software-properties-common apt-transport-https -y
```
Now we'll add the signing key for the CRAN repository and then add the repository itself:
```shell
sudo apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/debian stretch-cran35/'
```
Finally, we can install R:
```shell
sudo apt-get update
sudo apt-get install r-base-dev
```
While we're at it, install a few system libraries needed by some R packages, as well as git:
```shell
sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev git -y
```
## Cloning the project
Run:
```shell
git clone https://git.informatik.uni-leipzig.de/text-mining-chatbot/wiki-rasa.git
cd wiki-rasa
```
_If you skipped the manual R installation steps above, run the install script now:_
```shell
./install.sh
```
## Installing R Packages
This needs to be done from an interactive R console, as R will ask whether to use a personal library the first time you install packages. To do this, open R and type the following:
```r
install.packages(readLines("packages.list"))
```
This will install all the required packages. When asked whether to use a personal library, say yes and accept the defaults.
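If you want a quick sanity check that the R side can see both the packages and the spacy environment created earlier, the following can be run from an interactive R session inside the `wiki-rasa` directory. This is only an optional sketch and assumes the cleanNLP 2.x API and the `spcy` conda environment from the Python setup above.
```r
# Optional sanity check: load all required packages and initialise the spacy backend
sapply(readLines("packages.list"), require, character.only = TRUE)

library(reticulate)
use_condaenv("spcy", required = TRUE)  # the conda environment created earlier

library(cleanNLP)
cnlp_init_spacy()  # should succeed if the English spacy model is installed
```
If `cnlp_init_spacy()` fails, double-check that the `spcy` environment exists and that `python -m spacy download en` was run inside it.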
# Wiki Rasa
## Installing Rasa
There are two options:
1. Have Python 3.6.6 installed, or downgrade from 3.7 (which is not yet supported by Tensorflow). Then install rasa core with `pip install rasa_core` and rasa nlu with `pip install rasa_nlu`.
2. Install Anaconda, create a Python 3.6.6 environment and install rasa inside it.
### Getting the example project running
Download [stories.md](https://github.com/RasaHQ/rasa_core/blob/master/examples/moodbot/data/stories.md), [domain.yml](https://github.com/RasaHQ/rasa_core/blob/master/examples/moodbot/domain.yml) and [nlu.md](https://github.com/RasaHQ/rasa_core/blob/master/examples/moodbot/data/nlu.md).
Create `nlu_config.yml` with the following content:
```yaml
language: en
pipeline: tensorflow_embedding
```
The model can then be trained with:
```
# rasa core
python -m rasa_core.train -d domain.yml -s stories.md -o models/dialogue
# natural language processing
python -m rasa_nlu.train -c nlu_config.yml --data nlu.md -o models --fixed_model_name nlu --project current --verbose
```
Afterwards you can talk to the bot with:
```
python -m rasa_core.run -d models/dialogue -u models/current/nlu
```
## Contributing
Before merging please make sure to check the following:
* If your script uses any libraries, check whether they are in `packages.list` and add them if they are not
* Does your contribution require any additional configuration? If so, please update `README.md` and `INSTALL.md`
* Some R packages require system level libraries on OS X and Linux; if that is the case, make sure they are added in `INSTALL.md` and also in `install.sh`
### Writing custom feature extraction functions
When writing a function to extract a feature, use the following guidelines (a minimal example follows the list):
* Place your file in the `r` folder with an appropriate name
* Add a function call to `Master.R` within the main apply function
  * The parameters you hand to your function there determine what you may work with:
    * `article[1]` is the name of the physicist
    * `article[2]` and `article[3]` contain the page and revision id respectively
    * `article[4]` contains the raw html text of the article
    * `cleaned.text` contains the cleaned text
    * `annotations` contains the cleanNLP annotation object; to access it, use the `cnlp_get_*` functions (see [here](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help)
  * You may use additional parameters to your liking
* Your function will always be given data for a single article, so you do not need to make it vectorized
* Bind the output of your function to the results data frame at the very end of the main apply function
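As an illustration, here is a minimal sketch of such a function. The file name, the function name and the regular expression are hypothetical; only the calling convention (one article at a time, result bound to the results data frame in `Master.R`) comes from the guidelines above.
```r
# r/GetBirthplace.R -- hypothetical example, not part of the repository
# Extracts a naive "birthplace" feature from the cleaned article text.
get.birthplace <- function(article, cleaned.text) {
  # article[1] is the physicist's name; cleaned.text is the cleaned article text
  hit <- regmatches(cleaned.text,
                    regexpr("born in [A-Z][A-Za-z]+", cleaned.text))
  if (length(hit) == 0) {
    return(NA_character_)
  }
  sub("^born in ", "", hit[1])
}
```
In `Master.R` this would be called once per article inside the main apply function, e.g. `birthplace <- get.birthplace(article, cleaned.text)`, and the returned value bound to the results data frame at the end.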
## Installation
### General prerequisites
The script assumes that all the packages in the `packages.list` file are installed within R. Furthermore you will need a spacy installation with the English language data installed. By default the script assumes to find this in a conda environment named `spcy`; if you need to change that, do so in the `ProcessNER.R` file.
For a detailed guide on installing on a Debian 9 machine, take a look at [Installation](INSTALL.md).
## R Scripts
### PhysicistsList.R
Crawls Wikipedia's [List of Physicists](https://en.wikipedia.org/wiki/List_of_physicists) for all physicist names and uses that list to download the corresponding articles from the Wikipedia API.
Generates a csv file containing the gathered articles in the data directory, as well as an RDS object containing the same data in binary form.
## Running
The data processing side is done by the `Master.R` script in the `r` folder. It may be called via `Rscript r/Master.R` from any command line or via `source("r/Master.R")` from within R. The script assumes the working directory to be the base directory `wiki-rasa`, so make sure to either call `Rscript` from within this directory or to set the working directory in R accordingly prior to sourcing.
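For example, to run it from within R (the clone location below is only an example; use wherever you cloned the repository):
```r
# Run the data processing pipeline from within R
setwd("~/wiki-rasa")   # make the repository root the working directory
source("r/Master.R")
```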
For reference, the `install.sh` script performs the R installation steps from [INSTALL.md](INSTALL.md):
```shell
#!/usr/bin/env bash

# Update the system and install prerequisites for the CRAN repository
sudo apt-get update && sudo apt-get dist-upgrade -y
sudo apt install dirmngr --install-recommends
sudo apt install software-properties-common apt-transport-https -y

# Add the CRAN signing key and repository
sudo apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/debian stretch-cran35/'

# Install R and the system libraries required by some R packages
sudo apt-get update
sudo apt-get install r-base-dev -y
sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev -y
```
The contents of `packages.list`, the R packages required by the scripts:
```
pbapply
rvest
stringi
textclean
stringr
data.table
xml2
WikipediR
reticulate
cleanNLP
```