Commit ae093df7 authored by Lukas Gehrke

Merge branch 'final-touch-README' into 'master'

Fix file names and spelling stuff in the README

See merge request !80
parents 54c5b28f a92a60c2
@@ -12,7 +12,7 @@ The R script assumes all the packages in the `packages.list` file are installed
install.packages(readLines('/processing/packages.list'))
```
-Furthermore you will need to have a spacy installation with the english language data installed. By default the script will assume to find this in a conda environment named `spcy`, if you need to change that do so in the `Master.R` file.
+Furthermore you will need to have a spacy installation with the english language data installed. By default the script will assume to find this in a conda environment named `spcy`, if you need to change that do so in the `master.R` file.
To build the **wikiproc** package navigate to the processing directory and run:
@@ -30,7 +30,7 @@ conda create -n rasa_env python=3.6.7
source activate rasa_env
```
-In this environment install rasa_nlu, rasa_core, sklean_crfsuite and spacy. Also download the spacy en_core_web_md language data.
+In this environment install rasa_nlu, rasa_core, sklearn_crfsuite and spacy. Also download the spacy en_core_web_md language data.
```{bash}
pip install rasa_nlu
@@ -43,10 +43,10 @@ python -m spacy link en_core_web_md en
### Running
-The data processing side is done by the `Master.R` script in the `processing/script` folder. The script assumes the working direcory to be somewhere within the base directory `wiki-rasa` so make sure to either call `Rscript` from within this directory or to set the working directory in R here prior to sourcing. Easiest way is to call the script from the base directory of the repository:
+The data processing side is done by the `master.R` script in the `processing/script` folder. The script assumes the working directory to be somewhere within the base directory `wiki-rasa` so make sure to either call `Rscript` from within this directory or to set the working directory in R here prior to sourcing. Easiest way is to call the script from the base directory of the repository:
```{bash}
-Rscript processing/script/Master.R
+Rscript processing/script/master.R
```
This will download the required data, process it and generate the data file required for the chat bot. After that train the bot (don't forget to activate the conda environment if you're using one).
@@ -97,14 +97,14 @@ When writing a function to extract a feature use the following as guidelines:
* Place your file in the `processing/wikiproc/R` folder with an appropriate name
* Add a function call to `master.R` within the main apply function
* The parameters you hand to your function here will determine what you may work with
-* `article[1]` is the name of the physicits
-* `article[2]` and `article[3]` contain the page and revision id respectivly
+* `article[1]` is the name of the physicists
+* `article[2]` and `article[3]` contain the page and revision id respectively
* `article[4]` contains the raw html text of the article
* `cleaned.text` for the cleaned text
-* `annotations` contains the cleanNLP annotation object, to access it use the clnp_get functions. See [here](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help.
+* `annotations` contains the cleanNLP annotation object, to access it use the cnlp_get functions. See [here](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help.
* You may use additional parameters to your liking
-* Your function will allways be given data for a single article you do not need to make your function vectorized
-* Bind the output of your function to the resutls data frame at the very end of the main apply function
+* Your function will always be given data for a single article you do not need to make your function vectorized
+* Bind the output of your function to the results data frame at the very end of the main apply function
* Please don't use library imports, if possible call the functions explicitly via `::`. If you need to load a library do so in `import_packages.R`.
### Steps to build
......
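
As an illustration of the feature-extraction guidelines touched by the last hunk, a new extractor might look roughly like the minimal sketch below. The function name `get_first_year`, the feature it computes and the shape of the returned data frame are assumptions made for this example, not part of the wikiproc package. Only the parameter meanings (`article[1]` as the name, `cleaned.text` as the cleaned article text) come from the README. It uses base R only, so no library import is needed.

```r
# Hypothetical feature extractor for a single article (not part of wikiproc).
# article:      character vector; article[1] is the physicist's name
# cleaned.text: the cleaned article text as a single string
get_first_year <- function(article, cleaned.text) {
  name <- article[1]
  # Extract the first four-digit number found in the cleaned text, if any
  match <- regmatches(cleaned.text, regexpr("[0-9]{4}", cleaned.text))
  year <- if (length(match) > 0) as.integer(match[1]) else NA_integer_
  # Return a one-row data frame so it can be bound to a results data frame
  data.frame(name = name, first_year = year, stringsAsFactors = FALSE)
}

# Example call with made-up data for one article
article <- c("Max Planck", "12345", "67890", "<p>raw html</p>")
res <- get_first_year(article, "Max Planck was born in 1858 in Kiel.")
print(res)
```

Because the function receives data for one article at a time, it does not need to be vectorized. How its output is bound to the results data frame depends on the main apply function in `master.R`, which this diff does not show.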