# Wiki Rasa

## Contributing
Before merging, please check the following:
* If your script uses any libraries, check that they are listed in `packages.list` and add any that are missing
* Does your contribution require any additional configuration? If so, please update `README.md` and `INSTALL.md`
    * Some R packages require system-level libraries on OS X and Linux. If that is the case, make sure they are added to `INSTALL.md` and also to `install.sh`
### Writing custom feature extraction functions
When writing a function to extract a feature, use the following guidelines:
* Place your file in the `r` folder with an appropriate name
* Add a function call to `Master.R` within the main apply function
    * The parameters you hand to your function here will determine what you may work with
        * `article[1]` is the name of the physicist
        * `article[2]` and `article[3]` contain the page ID and revision ID, respectively
        * `article[4]` contains the raw HTML text of the article
        * `cleaned.text` contains the cleaned text
        * `annotations` contains the cleanNLP annotation object; to access it, use the `cnlp_get_*` functions. See the [cleanNLP reference manual](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help.
        * You may use additional parameters to your liking
    * Your function will always be given data for a single article, so you do not need to make it vectorized
* Bind the output of your function to the results data frame at the very end of the main apply function
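
As a rough illustration of these guidelines, here is a minimal sketch of a feature extraction function. The function name `get_word_count`, its `word_count` column, and the example data are hypothetical and not part of the project. How the result is actually bound to the results data frame depends on the code in `Master.R`, which is not shown here.

```r
# Hypothetical feature extractor; the name and the returned columns are
# illustrative only and do not exist in the project.
get_word_count <- function(article, cleaned.text) {
  # Split the cleaned text on runs of whitespace and drop empty strings
  words <- strsplit(trimws(cleaned.text), "\\s+")[[1]]
  words <- words[nzchar(words)]
  # Return one row, since the function only ever sees a single article
  data.frame(
    name = article[1],
    word_count = length(words),
    stringsAsFactors = FALSE
  )
}

# Example call with made-up data in the shape described above
article <- c("Ada Example", "123", "456", "<p>Some raw html</p>")
feature <- get_word_count(article, cleaned.text = "Some raw html")
print(feature)
```

Because the function works on one article at a time, it can use plain scalar logic and leave the iteration to the main apply function.
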
## Installation
### General prerequisites
The script assumes that all packages listed in the `packages.list` file are installed in R. You also need a spaCy installation with the English language data installed. By default, the script expects to find spaCy in a conda environment named `spcy`; if you need to change that, do so in the `ProcessNER.R` file.
For a detailed guide to installing on a Debian 9 machine, see [Installation](INSTALL.md).
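
To check which of the required packages are still missing, something like the following sketch may help. It assumes that `packages.list` contains one package name per line, which is a guess about the file format. The helper name `missing_packages` is hypothetical.

```r
# Assumption: packages.list holds one package name per line.
# Hypothetical helper, not part of the project.
missing_packages <- function(list_file = "packages.list") {
  # Read the requested package names, ignoring blank lines
  wanted <- trimws(readLines(list_file, warn = FALSE))
  wanted <- wanted[nzchar(wanted)]
  # Compare against the packages installed in the current R library paths
  installed <- rownames(installed.packages())
  setdiff(wanted, installed)
}

# Usage, from the project base directory:
# missing <- missing_packages()
# if (length(missing) > 0) install.packages(missing)
```
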
## Running
Data processing is done by the `Master.R` script in the `r` folder. It can be called via `Rscript r/Master.R` from any command line or via `source("r/Master.R")` from within R. The script assumes the working directory is the base directory `wiki-rasa`, so either call `Rscript` from within this directory or set the working directory in R to it before sourcing.
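
When sourcing from within R, a small guard can catch a wrong working directory before the script starts. This is only a sketch: the helper `is_project_root` is hypothetical, and it judges the directory solely by the presence of `r/Master.R`.

```r
# Hypothetical helper: TRUE if `dir` looks like the project base directory,
# judged only by whether r/Master.R exists beneath it.
is_project_root <- function(dir = getwd()) {
  file.exists(file.path(dir, "r", "Master.R"))
}

# Usage from within R; the path is a placeholder for your own checkout:
# setwd("path/to/wiki-rasa")
# if (is_project_root()) {
#   source("r/Master.R")
# } else {
#   stop("Set the working directory to the wiki-rasa base directory first")
# }
```
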