Wiki Rasa

Contributing

Before merging please make sure to check the following:

  • If your script uses any libraries, check that they are listed in packages.list and add them if they are not
  • Does your contribution require any additional configuration? If so, please update README.md and docs/install_debian.md
    • If your contribution needs any system-level changes, make sure to also add these to Dockerfile and install.sh
  • Please make sure the wikiproc package can be built by calling devtools::document() and R CMD build wikiproc, and possibly also devtools::check()

Writing custom feature extraction functions

When writing a function to extract a feature, follow these guidelines:

  • Place your file in the processing/wikiproc/R folder with an appropriate name
  • Add a function call to master.R within the main apply function
    • The parameters you hand to your function here will determine what you may work with
      • article[1] is the name of the physicist
      • article[2] and article[3] contain the page and revision ID respectively
      • article[4] contains the raw HTML text of the article
      • cleaned.text contains the cleaned text
      • annotations contains the cleanNLP annotation object; to access it, use the cnlp_get_* functions. See the cleanNLP documentation for help.
      • You may use additional parameters to your liking
    • Your function will always be given data for a single article, so you do not need to vectorize it
  • Bind the output of your function to the results data frame at the very end of the main apply function
  • Please avoid library() calls; if possible, call functions explicitly via ::. If you need to load a library, do so in import_packages.R.
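
As an illustration of the guidelines above, here is a minimal sketch of a feature extraction function. The function name get_word_count, the word.count column and the sample data are hypothetical and not part of wikiproc; the sketch only assumes that article is the character vector described above and that cleaned.text is a single string.

```r
# Hypothetical feature extractor: counts the words in the cleaned article text.
# `article` is the character vector for one article (name, page ID,
# revision ID, raw HTML); `cleaned.text` is the cleaned text of that article.
get_word_count <- function(article, cleaned.text) {
  # Split on runs of whitespace; only one article is handled at a time,
  # so no vectorization is needed
  words <- strsplit(trimws(cleaned.text), "\\s+")[[1]]
  # Drop empty strings so that an empty text yields a count of 0
  words <- words[nzchar(words)]
  data.frame(
    name = article[1],
    word.count = length(words),
    stringsAsFactors = FALSE
  )
}

# Usage with made-up data for a single article
article <- c("Albert Einstein", "736", "1", "<p>Albert Einstein was a physicist.</p>")
get_word_count(article, "Albert Einstein was a physicist.")
```

The sketch uses base R only, so it needs neither library() nor ::. How the returned data frame is bound to the results data frame depends on the main apply function in master.R and is not shown here.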

Steps to build

  • Make sure your functions are properly commented for roxygen
    • If your function is to be visible from the outside, make sure to add @export to the roxygen comment
  • Set the working directory to wikiproc and call devtools::document()
  • Change into the processing directory and call devtools::install("wikiproc") to install the package
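
A roxygen comment for an exported function might look like the following sketch. The function get_text_length and its parameter are hypothetical and serve only to show the comment layout, including the @export tag.

```r
#' Get the length of a cleaned article text
#'
#' Hypothetical example function, not part of wikiproc.
#'
#' @param cleaned.text Character string holding the cleaned text of one article.
#' @return The number of characters in the text as an integer.
#' @export
get_text_length <- function(cleaned.text) {
  nchar(cleaned.text)
}

# After adding such a comment, the steps above regenerate the documentation
# and install the package:
#   devtools::document()          # with the working directory set to wikiproc
#   devtools::install("wikiproc") # from within the processing directory
```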

Installation

You may use this software by installing the wikiproc package and then running the master.R script. There are also directions on how to install from scratch on a Debian VM and on how to build a Docker image.

General prerequisites

The script assumes that all packages in the packages.list file are installed within R. Furthermore, you will need a spaCy installation with the English language data installed. By default the script expects to find this in a conda environment named spcy; if you need to change that, do so in the ProcessNER.R file.
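
To check the first prerequisite, a small helper along the following lines may be useful. This is a sketch only: the function name missing_packages is hypothetical, and it assumes that packages.list contains one package name per line, which the text above does not specify.

```r
# Returns the packages named in `path` that are not installed.
# Assumption: the file lists one package name per line.
missing_packages <- function(path = "packages.list") {
  wanted <- trimws(readLines(path, warn = FALSE))
  # Ignore blank lines
  wanted <- wanted[nzchar(wanted)]
  setdiff(wanted, rownames(utils::installed.packages()))
}

# Usage: only run the check if the file exists in the working directory
if (file.exists("packages.list")) {
  print(missing_packages())
}
```

An empty result means every listed package is already installed.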

To build the wikiproc package navigate to the processing directory and run:

R CMD build wikiproc
R CMD INSTALL wikiproc_<version>.tar.gz

Note: This will require Rtools on Windows and possibly additional packages on *nix platforms.

The data processing is done by the Master.R script in the r folder. It may be called via Rscript r/Master.R from any command line or via source("r/Master.R") from within R. The script assumes the working directory to be the base directory wiki-rasa, so make sure either to call Rscript from within this directory or to set the working directory in R to it before sourcing.
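
Since the script depends on the working directory, a guard like the following sketch can catch a wrong directory before sourcing. The helper in_base_dir is hypothetical; it simply checks whether the script exists at the relative path used above.

```r
# TRUE if `dir` looks like the project base directory, i.e. it contains
# the script at the relative path r/Master.R.
in_base_dir <- function(dir = getwd(), script = file.path("r", "Master.R")) {
  file.exists(file.path(dir, script))
}

if (in_base_dir()) {
  source("r/Master.R")
} else {
  message("r/Master.R not found; set the working directory to wiki-rasa first")
}
```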

Installing on Debian

For a detailed guide on installing on a Debian 9 machine take a look at Installation.

Building the Docker image

Work in progress

Run the build script for your system: build_docker.bat on Windows or build_docker.sh on Linux.

After that you should be able to start the container with:

docker run -it chatbot