David Fuhry authored
fde1d815

Wiki Rasa

Contributing

Before merging, please check the following:

  • If your script uses any libraries, check whether they are listed in packages.list and add them if they are not
  • Does your contribution require any additional configuration? If so, please update README.md and INSTALL.md
    • Some R packages require system-level libraries on OS X and Linux. If that is the case, make sure they are added to INSTALL.md and also to install.sh

Writing custom feature extraction functions

When writing a function to extract a feature use the following as guidelines:

  • Place your file in the processing/wikiproc/R folder with an appropriate name
  • Add a function call to master.R within the main apply function
    • The parameters you hand to your function here will determine what you may work with
      • article[1] is the name of the physicist
      • article[2] and article[3] contain the page and revision id respectively
      • article[4] contains the raw html text of the article
      • cleaned.text for the cleaned text
      • annotations contains the cleanNLP annotation object; to access it, use the cnlp_get_* functions. See the cleanNLP documentation for help.
      • You may use additional parameters to your liking
    • Your function will always be given data for a single article, so you do not need to vectorize it
  • Bind the output of your function to the results data frame at the very end of the main apply function
  • Please avoid library() calls. Where possible, call functions explicitly via ::. If you do need to load a library, do so in import_packages.R.
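
The guidelines above can be sketched roughly as follows. This is a minimal illustration, not code from the project: the function name get_text_stats, the feature columns, the sample article values and the stand-in results data frame are all hypothetical, and the real main apply function may bind results differently.

```r
#' Compute simple text statistics for one article (hypothetical example).
#'
#' @param name Name of the physicist, i.e. article[1].
#' @param cleaned.text Cleaned text of the article.
#' @return A one-row data frame with the name and two features.
#' @export
get_text_stats <- function(name, cleaned.text) {
  # Split on runs of whitespace and drop empty strings
  words <- strsplit(trimws(cleaned.text), "\\s+")[[1]]
  words <- words[nzchar(words)]
  data.frame(
    name = name,
    word_count = length(words),
    # Non-base functions are called explicitly via :: instead of library()
    median_word_length = stats::median(nchar(words)),
    stringsAsFactors = FALSE
  )
}

# Made-up stand-ins for the data of a single article
article <- c("Albert Einstein", "1", "2", "<p>Albert Einstein was a physicist.</p>")
cleaned.text <- "Albert Einstein was a physicist."

# Inside the main apply function: call the extractor for this one article ...
feature <- get_text_stats(article[1], cleaned.text)

# ... and bind its output to the results data frame at the very end
results <- data.frame(name = article[1], stringsAsFactors = FALSE)
results <- cbind(results, feature[c("word_count", "median_word_length")])
```

Because the function only ever sees one article, it can return a one-row data frame, which makes the final bind straightforward.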

Steps to build:

  • Make sure your functions are properly commented for roxygen
    • If your function is to be visible from the outside, make sure to add @export to the roxygen comment
  • Set the working directory to wikiproc and call devtools::document()
  • Step into processing and use devtools::install("wikiproc") to install the package
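
As a rough sketch, the build steps can also be run from the repository root without changing the working directory, since devtools::document() and devtools::install() both accept a package path. The path below assumes the processing/wikiproc layout mentioned above; the guard is only there so the snippet does nothing when devtools or the folder is missing.

```r
# Assumption: the current working directory is the repository root
pkg_dir <- file.path("processing", "wikiproc")

if (requireNamespace("devtools", quietly = TRUE) && dir.exists(pkg_dir)) {
  # Regenerate NAMESPACE and the man/ pages from the roxygen comments
  devtools::document(pkg_dir)
  # Install the package from its source directory
  devtools::install(pkg_dir)
} else {
  message("devtools or ", pkg_dir, " not found; nothing was built")
}
```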

Installation

General prerequisites

The script assumes that all packages listed in the packages.list file are installed within R. Furthermore, you will need a spaCy installation with the English language data installed. By default the script expects to find this in a conda environment named spcy; if you need to change that, do so in the ProcessNER.R file.
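
One way to check the package prerequisite is sketched below. It assumes that packages.list contains one package name per line, which is not confirmed here; the helper missing_packages is hypothetical and not part of the project.

```r
# Hypothetical helper: report which packages from a list file are not installed.
# Assumption: the file holds one package name per line.
missing_packages <- function(list_file = "packages.list") {
  if (!file.exists(list_file)) {
    return(character(0))
  }
  wanted <- trimws(readLines(list_file, warn = FALSE))
  wanted <- wanted[nzchar(wanted)]  # ignore blank lines
  setdiff(wanted, rownames(utils::installed.packages()))
}

# Prints the names of missing packages, or character(0) if none are missing
print(missing_packages())
```

Anything the helper reports could then be installed with install.packages() before running the script.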

For a detailed guide on installing on a Debian 9 machine take a look at Installation.

Running

The data processing is done by the Master.R script in the r folder. It can be called via Rscript r/Master.R from any command line or via source("r/Master.R") from within R. The script assumes the working directory to be the base directory wiki-rasa, so make sure either to call Rscript from within this directory or to set the working directory in R to it before sourcing.
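
When sourcing from within R, the working directory requirement can be handled with a small wrapper such as the one below. This is only a sketch: run_master is a hypothetical helper, and the path passed to it is a placeholder for wherever wiki-rasa is checked out.

```r
# Hypothetical wrapper: source r/Master.R with the working directory set to
# the wiki-rasa base directory, then restore the previous working directory.
run_master <- function(base_dir) {
  script <- file.path(base_dir, "r", "Master.R")
  if (!file.exists(script)) {
    message("No Master.R found under ", base_dir)
    return(invisible(FALSE))
  }
  old_wd <- setwd(base_dir)
  on.exit(setwd(old_wd), add = TRUE)
  source(file.path("r", "Master.R"))
  invisible(TRUE)
}

# Placeholder path; replace it with the location of your checkout
run_master("path/to/wiki-rasa")
```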