# Wiki Rasa

## Contributing

Before merging, please check the following:

* If your script uses any libraries, check whether they are listed in `packages.list` and add them if they are not
* Does your contribution require any additional configuration? If so, please update `README.md` and `INSTALL.md`
* Some R packages require system-level libraries on OS X and Linux. If that is the case, make sure they are added to `INSTALL.md` and also to `install.sh`

### Writing custom feature extraction functions

When writing a function to extract a feature, use the following as guidelines:

* Place your file in the `r` folder with an appropriate name
* Add a function call to `Master.R` within the main apply function
  * The parameters you hand to your function here determine what you may work with
  * `article[1]` is the name of the physicist
  * `article[2]` and `article[3]` contain the page and revision id respectively
  * `article[4]` contains the raw HTML text of the article
  * `cleaned.text` contains the cleaned text
  * `annotations` contains the cleanNLP annotation object; to access it, use the `cnlp_get_*` functions. See [here](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help.
  * You may use additional parameters to your liking
* Your function will always be given data for a single article, so you do not need to make it vectorized
* Bind the output of your function to the results data frame at the very end of the main apply function

## Installation

### General prerequisites

The script assumes all the packages in the `packages.list` file are installed within R. Furthermore, you will need a spaCy installation with the English language data installed. By default the script expects to find this in a conda environment named `spcy`; if you need to change that, do so in the `ProcessNER.R` file.

For a detailed guide on installing on a Debian 9 machine, take a look at [Installation](INSTALL.md).
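To check the package prerequisite before running anything, something like the following sketch may help. It assumes that `packages.list` holds one package name per line, which is not confirmed here, and the helper name `missing_packages` is made up for illustration.

```r
# Hypothetical helper: list packages named in a file that are not installed.
# Assumption: the file contains one package name per line; blank lines are skipped.
missing_packages <- function(list_file = "packages.list") {
  wanted <- trimws(readLines(list_file, warn = FALSE))
  wanted <- wanted[nzchar(wanted)]
  # installed.packages() returns a matrix whose row names are package names
  setdiff(wanted, rownames(installed.packages()))
}

# Only run the check when the file is present in the working directory
if (file.exists("packages.list")) {
  not_installed <- missing_packages()
  if (length(not_installed) > 0) {
    message("Not installed: ", paste(not_installed, collapse = ", "))
  } else {
    message("All listed packages are installed.")
  }
}
```

If the file uses a different layout, only the `readLines` step would need to change.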

## Running

The data processing is done by the `Master.R` script in the `r` folder. It can be called via `Rscript r/Master.R` from any command line or via `source("r/Master.R")` from within R. The script assumes the working directory to be the base directory `wiki-rasa`, so either call `Rscript` from within this directory or set the working directory in R to it before sourcing.
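
When sourcing from within R, a small guard can catch a wrong working directory before the script fails partway through. This is only a sketch: the helper `in_project_root` is hypothetical, and it assumes the base directory can be recognised by the presence of `r/Master.R`.

```r
# Hypothetical helper: TRUE if `dir` looks like the wiki-rasa base directory.
# Assumption: the base directory is identified by containing r/Master.R.
in_project_root <- function(dir = getwd()) {
  file.exists(file.path(dir, "r", "Master.R"))
}

if (in_project_root()) {
  source("r/Master.R")
} else {
  # "path/to/wiki-rasa" is a placeholder for wherever the repository was cloned
  message("Set the working directory to the wiki-rasa base directory first, ",
          "e.g. setwd(\"path/to/wiki-rasa\")")
}
```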