Before merging, please check the following:
* If your script uses any libraries, check whether they are listed in `packages.list` and add them if they are not
* Does your contribution require any additional configuration? If so, please update `README.md` and `INSTALL.md`
* Some R packages require system-level libraries on OS X and Linux. If that is the case, make sure they are added to `INSTALL.md` and to `install.sh`
### Writing custom feature extraction functions
When writing a function to extract a feature, use the following guidelines:
* Place your file in the `processing/wikiproc/R` folder with an appropriate name
* Add a function call to `master.R` within the main apply function
* The parameters you hand to your function here will determine what you may work with
  * `article[1]` is the name of the physicist
  * `article[2]` and `article[3]` contain the page and revision id respectively
  * `article[4]` contains the raw HTML text of the article
  * `cleaned.text` contains the cleaned text
  * `annotations` contains the cleanNLP annotation object. To access it, use the `cnlp_get_*` functions. See [here](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help.
  * You may use additional parameters to your liking
* Your function will always be given the data for a single article, so you do not need to vectorize it
* Bind the output of your function to the results data frame at the very end of the main apply function
* Please don't use library imports. If possible, call functions explicitly via `::`. If you need to load a library, do so in `import_packages.R`.
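
As an illustration of these guidelines, here is a minimal sketch of a feature extraction function. The function name `get_word_count`, the sample article and the stand-in `results` data frame are all made up for this example and are not part of the package. How the real `results` data frame is built inside the main apply function may differ.

```r
#' Count the words in the cleaned text of a single article
#'
#' Hypothetical example, not part of wikiproc.
#'
#' @param cleaned.text The cleaned article text as a single string.
#' @return A one-row data frame with the column `word_count`.
#' @export
get_word_count <- function(cleaned.text) {
  # Split on runs of whitespace; the input is a single article, so no vectorization is needed.
  words <- strsplit(trimws(cleaned.text), "\\s+")[[1]]
  data.frame(word_count = sum(nzchar(words)))
}

# Made-up input for one article: name, page id, revision id, raw HTML.
article <- c("Max Planck", "1", "2", "<p>Max Planck was a physicist.</p>")
cleaned.text <- "Max Planck was a physicist."

# Stand-in for the results data frame built in the main apply function (assumption).
results <- data.frame(name = article[1], stringsAsFactors = FALSE)

# Bind the new feature at the very end.
results <- cbind(results, get_word_count(cleaned.text))
```

The function only receives `cleaned.text` because that is all it needs. A function that works on the raw HTML or the annotation object would take `article` or `annotations` instead.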
Steps to build:
* Make sure your functions are properly commented for roxygen
* If your function is to be visible from the outside, make sure to add `@export` to the roxygen comment
* Set the working directory to `wikiproc` and call `devtools::document()`
* Step into `processing` and use `devtools::install("wikiproc")` to install the package
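
The build steps above can be sketched as a short R session. It assumes that `devtools` is installed and that the session starts in the `processing` folder. Restoring the working directory with `setwd(old_wd)` is a convenience chosen for this example, not something the project requires.

```r
# Start in the `processing` folder of the repository (assumption).
old_wd <- setwd("wikiproc")    # step into the package directory
devtools::document()           # regenerate NAMESPACE and man/ from the roxygen comments
setwd(old_wd)                  # back to `processing`
devtools::install("wikiproc")  # install the package from its folder
```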
The script assumes that all packages listed in `packages.list` are installed in R. You will also need a spaCy installation with the English language data installed. By default the script expects to find it in a conda environment named `spcy`; if you need to change that, do so in the `ProcessNER.R` file.
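
To check the package requirement before running the script, something like the following sketch may help. It assumes that `packages.list` contains one package name per line, which this document does not state. The helper `missing_packages` is a hypothetical name and not part of the project.

```r
# Assumption: `packages.list` holds one package name per line.
missing_packages <- function(path) {
  wanted <- trimws(readLines(path, warn = FALSE))
  wanted <- wanted[nzchar(wanted)]  # ignore blank lines
  setdiff(wanted, rownames(utils::installed.packages()))
}

# Only run the check if the file is present in the current directory.
if (file.exists("packages.list")) {
  print(missing_packages("packages.list"))
}
```

An empty result means every listed package is already installed.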

David Fuhry
committed
For a detailed guide on installing on a Debian 9 machine take a look at [Installation](INSTALL.md).

The data processing is done by the `Master.R` script in the `r` folder. It can be called via `Rscript r/Master.R` from any command line or via `source("r/Master.R")` from within R. The script assumes the working directory to be the base directory `wiki-rasa`, so either call `Rscript` from within this directory or set the working directory in R to it before sourcing.
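
When sourcing from within R, a small wrapper can guard against running from the wrong directory. This is only a sketch: `run_master` is a hypothetical helper, and the path passed to it depends on where the repository was cloned.

```r
# Hypothetical helper: source r/Master.R with the base directory as working directory.
run_master <- function(base_dir) {
  script <- file.path(base_dir, "r", "Master.R")
  if (!file.exists(script)) {
    stop("Cannot find ", script, "; is this the wiki-rasa base directory?")
  }
  old_wd <- setwd(base_dir)
  on.exit(setwd(old_wd), add = TRUE)  # restore the previous working directory afterwards
  source(file.path("r", "Master.R"))
}
```

A call would then look like `run_master("path/to/wiki-rasa")`, with the path replaced by the actual location of the checkout.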