# Wiki Rasa

## Contributing

Before merging, please check the following:

* If your script uses any libraries, check whether they are listed in `packages.list` and add them if they are not
* Does your contribution require any additional configuration? If so, please update `README.md` and `docs/install_debian.md`
* If your changes need any system-level changes, make sure to also add these to `Dockerfile` and `install.sh`
* Please make sure the wikiproc package can be built by calling `devtools::document()` as well as `R CMD build wikiproc`, and possibly also `devtools::check()`

### Writing custom feature extraction functions

When writing a function to extract a feature, use the following guidelines:

* Place your file in the `processing/wikiproc/R` folder with an appropriate name
* Add a function call to `master.R` within the main apply function
* The parameters you hand to your function here determine what you may work with
    * `article[1]` is the name of the physicist
    * `article[2]` and `article[3]` contain the page and revision ID respectively
    * `article[4]` contains the raw HTML text of the article
    * `cleaned.text` contains the cleaned text
    * `annotations` contains the cleanNLP annotation object; to access it, use the `cnlp_get_*` functions. See [here](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help.
    * You may use additional parameters to your liking
* Your function will always be given data for a single article, so you do not need to vectorize it
* Bind the output of your function to the results data frame at the very end of the main apply function
* Please don't use library imports; if possible, call functions explicitly via `::`. If you need to load a library, do so in `import_packages.R`.
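
As an illustration of the guidelines above, a feature extraction function might look roughly like the sketch below. The function name `get_word_count`, its parameters, and the output columns are hypothetical and not part of wikiproc; the sketch only assumes that the physicist's name (`article[1]`) and `cleaned.text` are handed over from the main apply function.

```r
#' Count the words in an article
#'
#' Hypothetical example feature, shown for illustration only.
#'
#' @param name Name of the physicist, e.g. `article[1]`.
#' @param cleaned.text Cleaned article text as a single string.
#' @return A one-row data frame with the columns `name` and `word.count`.
#' @export
get_word_count <- function(name, cleaned.text) {
  # Only a single article is passed in, so no vectorization is needed.
  # Split the trimmed text on runs of whitespace.
  words <- strsplit(trimws(cleaned.text), "\\s+")[[1]]
  # Drop empty strings so that an empty text yields a count of zero.
  words <- words[nzchar(words)]
  data.frame(name = name, word.count = length(words), stringsAsFactors = FALSE)
}
```

Inside the main apply function, such a function would presumably be called as `wikiproc::get_word_count(article[1], cleaned.text)`, with its result bound to the results data frame at the end. Returning a one-row data frame keeps that binding step simple, and only base R functions are used, so no library import is needed.
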
### Steps to build

* Make sure your functions are properly commented for roxygen
* If your function should be visible from outside the package, add `@export` to its roxygen comment
* Set the working directory to `wikiproc` and call `devtools::document()`
* Change to the `processing` directory and use `devtools::install("wikiproc")` to install the package

## Installation

You may use this software by installing the **wikiproc** package and then running the `master.R` script. There are also directions for installing from scratch on a Debian VM and for building a Docker image.

### General prerequisites

The script assumes that all packages listed in `packages.list` are installed in R. Furthermore, you will need a spaCy installation with the English language data installed. By default the script expects to find this in a conda environment named `spcy`; if you need to change that, do so in the `ProcessNER.R` file.

To build the **wikiproc** package, navigate to the `processing` directory and run:

```bash
R CMD build wikiproc
R CMD INSTALL wikiproc_<version>.tar.gz
```

_Note: This requires the [R Tools](https://cran.r-project.org/bin/windows/Rtools/) on Windows and possibly additional packages on *nix platforms._

The data processing is done by the `Master.R` script in the `r` folder. It may be called via `Rscript r/Master.R` from any command line or via `source("r/Master.R")` from within R. The script assumes the working directory to be the base directory `wiki-rasa`, so make sure to either call `Rscript` from within this directory or set the working directory in R accordingly before sourcing.

### Installing on Debian

For a detailed guide on installing on a Debian 9 machine, take a look at [Installation](docs/install_debian.md).

### Building the Docker image

**_Work in progress_**

Run the build script for your system, e.g. `build_docker.bat` on Windows or `build_docker.sh` on Linux. After that you should be able to start the container with

```sh
docker run -it chatbot
```