# Wiki Rasa

## Contributing
Before merging please make sure to check the following:
* If your script uses any libraries, check whether they are listed in `packages.list` and add them if they are not
* Does your contribution require any additional configuration? If so please update `README.md` and `docs/install_debian.md`
  * If your changes need any system level changes, make sure to also add these in `Dockerfile` and `install.sh`
* Please make sure the wikiproc package can be built by calling `devtools::document()` as well as `R CMD build wikiproc` and possibly also `devtools::check()`
### Writing custom feature extraction functions
When writing a function to extract a feature use the following as guidelines:
* Place your file in the `processing/wikiproc/R` folder with an appropriate name
* Add a function call to `master.R` within the main apply function
  * The parameters you hand to your function here will determine what you may work with
    * `article[1]` is the name of the physicist
    * `article[2]` and `article[3]` contain the page and revision id respectively
    * `article[4]` contains the raw html text of the article
    * `cleaned.text` contains the cleaned text
    * `annotations` contains the cleanNLP annotation object; to access it, use the `cnlp_get_*` functions. See [here](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help.
    * You may use additional parameters to your liking
  * Your function will always be given data for a single article, so you do not need to make it vectorized
* Bind the output of your function to the results data frame at the very end of the main apply function
* Please don't use library imports; where possible, call functions explicitly via `::`. If you need to load a library, do so in `import_packages.R`.

### Steps to build

* Make sure your functions are properly commented for roxygen
  * If your function is to be visible from the outside, make sure to add `@export` to the roxygen comment
* Set the working directory to `wikiproc` and call `devtools::document()`
* Step into `processing` and use `devtools::install("wikiproc")` to install the package
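
The steps above can also be run non-interactively from a shell. The following is a minimal sketch, assuming `Rscript` is on the `PATH`, the **devtools** package is installed, and the function is called from the repository root. The `build_wikiproc` function name and the `RSCRIPT` override variable are hypothetical and not part of this repository.

```sh
# Sketch only: build_wikiproc and RSCRIPT are illustrative names, not part of the repository.
# Assumes it is called from the repository root, where processing/wikiproc lives.
build_wikiproc() {
    # Regenerate the roxygen documentation from inside the package directory.
    (cd processing/wikiproc && "${RSCRIPT:-Rscript}" -e 'devtools::document()') || return 1
    # Install the package from the processing directory.
    (cd processing && "${RSCRIPT:-Rscript}" -e 'devtools::install("wikiproc")')
}
```

Calling `build_wikiproc` then performs both steps in order and stops if the documentation step fails.
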
## Installation
You may use this software by installing the **wikiproc** package and then running the `master.R` script. There are also directions on how to install from scratch on a Debian VM and on how to build a Docker image.

### General prerequisites
The script assumes all the packages in the `packages.list` file are installed within R. Furthermore, you will need a spaCy installation with the English language data installed. By default the script expects to find this in a conda environment named `spcy`; if you need to change that, do so in the `ProcessNER.R` file.
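
One way to install everything listed in `packages.list` in a single step is sketched below. It assumes the file holds one package name per line, which is a guess since the format is not described here. The `list_packages` helper is hypothetical, and the CRAN mirror URL in the usage comment is only an example.

```sh
# Sketch only: list_packages is a hypothetical helper, and the one-name-per-line
# format of packages.list is an assumption.
list_packages() {
    # Trim surrounding whitespace, then drop blank lines and '#' comment lines.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" | grep -v -e '^$' -e '^#'
}

# Usage (requires R; the mirror URL is an example):
#   Rscript -e 'install.packages(commandArgs(trailingOnly = TRUE), repos = "https://cloud.r-project.org")' \
#       $(list_packages packages.list)
```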

To build the **wikiproc** package navigate to the processing directory and run:

```bash
R CMD build wikiproc
R CMD INSTALL wikiproc_<version>.tar.gz
```

_Note: This will require the [R Tools](https://cran.r-project.org/bin/windows/Rtools/) on Windows and possibly additional packages on *nix platforms._

The data processing is done by the `Master.R` script in the `r` folder. It can be called via `Rscript r/Master.R` from any command line or via `source("r/Master.R")` from within R. The script assumes the working directory to be the base directory `wiki-rasa`, so either call `Rscript` from within this directory or set the working directory in R accordingly before sourcing.
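
Because the script depends on the working directory, a small guard can catch the common mistake of starting it from the wrong place. This is a minimal sketch: the `run_master` function and the `RSCRIPT` override variable are hypothetical names, not part of this repository.

```sh
# Sketch only: run_master and RSCRIPT are illustrative names, not part of the repository.
run_master() {
    # The script expects the base directory as working directory, so check for it first.
    if [ ! -f r/Master.R ]; then
        echo "r/Master.R not found; change into the wiki-rasa base directory first" >&2
        return 1
    fi
    "${RSCRIPT:-Rscript}" r/Master.R
}
```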

### Installing on debian

For a detailed guide on installing on a Debian 9 machine take a look at [Installation](docs/install_debian.md).

### Building the docker

**_Work in progress_**

Run the build script for your system, e.g. `build_docker.bat` on Windows or `build_docker.sh` on Linux.
After that you should be able to start the container with
```sh
docker run -it chatbot
```