Wiki Rasa
Contributing
Before merging, please check the following:
- If your script uses any libraries, check whether they are listed in packages.list and add them if they are not
- Does your contribution require any additional configuration? If so, please update README.md and docs/install_debian.md
- If your changes need any system-level changes, make sure to also add these in Dockerfile and install.sh
- Please make sure the wikiproc package can be built by calling devtools::document() as well as R CMD build wikiproc, and possibly also devtools::check()
Writing custom feature extraction functions
When writing a function to extract a feature, use the following as guidelines:
- Place your file in the processing/wikiproc/R folder with an appropriate name
- Add a function call to master.R within the main apply function
  - The parameters you hand to your function here will determine what you may work with
    - article[1] is the name of the physicist
    - article[2] and article[3] contain the page and revision id respectively
    - article[4] contains the raw HTML text of the article
    - cleaned.text contains the cleaned text
    - annotations contains the cleanNLP annotation object; to access it, use the cnlp_get functions. See the cleanNLP documentation for help.
    - You may use additional parameters to your liking
  - Your function will always be given data for a single article, so you do not need to make your function vectorized
- Bind the output of your function to the results data frame at the very end of the main apply function
- Please don't use library imports; if possible, call the functions explicitly via ::. If you need to load a library, do so in import_packages.R.
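As a rough illustration of these guidelines, the sketch below shows what a small feature function might look like. The function name get_word_count, the word.count column, and the sample article vector are made up for this example and are not part of wikiproc; only the layout of article and the cleaned.text parameter follow the description above.

```r
#' Count the words in an article (hypothetical example feature)
#'
#' @param article Character vector: name, page id, revision id, raw HTML text.
#' @param cleaned.text Character string holding the cleaned article text.
#' @return A one-row data frame with the article name and its word count.
#' @export
get_word_count <- function(article, cleaned.text) {
  # Split on runs of whitespace; base R only, so no library() call is needed.
  # A function from another package would be called as package::function().
  words <- strsplit(trimws(cleaned.text), "\\s+")[[1]]
  words <- words[nzchar(words)]
  data.frame(
    name = article[1],
    word.count = length(words),
    stringsAsFactors = FALSE
  )
}

# Made-up data for a single article, in the order described above
article <- c(
  "Max Planck",                                  # article[1]: name
  "12345",                                       # article[2]: page id
  "67890",                                       # article[3]: revision id
  "<p>Max Planck was a German physicist.</p>"    # article[4]: raw HTML
)
result <- get_word_count(article, "Max Planck was a German physicist.")
```

Because the function only ever sees one article, it returns a single row, which can then be bound to the results data frame in the main apply function.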
Steps to build
- Make sure your functions are properly commented for roxygen
  - If your function is to be visible from the outside, make sure to add @export to the roxygen comment
- Set the working directory to wikiproc and call devtools::document()
- Step into processing and use devtools::install("wikiproc") to install the package
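The build steps could be wrapped in a small helper like the one below. This is only a sketch: build_wikiproc is a hypothetical name, and it assumes the repository layout described above (processing/wikiproc under the repository root). Instead of changing the working directory, it passes the package path directly to devtools::document() and devtools::install(), which should have the same effect.

```r
# Hypothetical helper: document and install wikiproc from the repository root.
# Returns TRUE on success and FALSE if a prerequisite is missing.
build_wikiproc <- function(repo.root = ".") {
  pkg.dir <- file.path(repo.root, "processing", "wikiproc")
  if (!dir.exists(pkg.dir)) {
    message("Package directory not found: ", pkg.dir)
    return(FALSE)
  }
  if (!requireNamespace("devtools", quietly = TRUE)) {
    message("devtools is not installed")
    return(FALSE)
  }
  # Regenerate NAMESPACE and the man pages from the roxygen comments
  devtools::document(pkg.dir)
  # Install the package from its source directory
  devtools::install(pkg.dir)
  TRUE
}
```

Called as build_wikiproc() from the repository root, this runs both steps in order and stops early with a message if the package directory or devtools is missing.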
Installation
You may use this software by installing the wikiproc package and then running the master.R script. There are also directions on how to install from scratch on a Debian VM and on how to build a Docker image.
General prerequisites
The script assumes all the packages in the packages.list file are installed within R. Furthermore, you will need a spaCy installation with the English language data installed. By default the script expects to find this in a conda environment named spcy; if you need to change that, do so in the ProcessNER.R file.
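One way to check the package prerequisite is sketched below. It assumes that packages.list contains one package name per line, which is a guess about the file format, and missing_packages is a hypothetical helper name.

```r
# Hypothetical helper: return the packages named in a list file that are
# not installed. Assumes one package name per line; blank lines are ignored.
missing_packages <- function(list.file = "packages.list") {
  wanted <- trimws(readLines(list.file, warn = FALSE))
  wanted <- wanted[nzchar(wanted)]
  installed <- rownames(utils::installed.packages())
  setdiff(wanted, installed)
}
```

An empty result from missing_packages() means every listed package is available; anything it returns could then be passed to install.packages().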
To build the wikiproc package navigate to the processing directory and run:
R CMD build wikiproc
R CMD INSTALL wikiproc_<version>.tar.gz
Note: This will require Rtools on Windows and possibly additional packages on *nix platforms.
The data processing side is done by the Master.R script in the r folder. This may be called via Rscript r/Master.R from any command line or via source("r/Master.R") from within R. The script assumes the working directory to be the base directory wiki-rasa, so make sure to either call Rscript from within this directory or set the working directory in R to it prior to sourcing.
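When sourcing from within R, the working directory requirement can be handled with a small wrapper such as the following sketch. The name run_master and the base.dir argument are hypothetical; the wrapper only assumes the r/Master.R location described above.

```r
# Hypothetical wrapper: source r/Master.R with the working directory set to
# the wiki-rasa base directory, then restore the previous working directory.
run_master <- function(base.dir) {
  script <- file.path(base.dir, "r", "Master.R")
  if (!file.exists(script)) {
    stop("Master.R not found under ", base.dir,
         "; is this the wiki-rasa base directory?")
  }
  old.wd <- setwd(base.dir)
  on.exit(setwd(old.wd), add = TRUE)
  source(file.path("r", "Master.R"))
  invisible(TRUE)
}
```

Checking for the script first gives a clear error when the path is wrong, rather than a failure partway through processing.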
Installing on debian
For a detailed guide on installing on a Debian 9 machine take a look at Installation.
Building the docker
Work in progress
Run the build script for your system, e.g. build_docker.bat on Windows or build_docker.sh on Linux.
After that you should be able to start the Docker container with:
docker run -it chatbot