Commit 842de283 authored by David Fuhry, committed by Lucas Schons

Resolve "Documentation: Improvements"

parent 29f3bf38
@@ -277,5 +277,8 @@ models/
.Rproj.user
.vscode
# Archives
*.tar.gz
# Misc
*.txt
FROM rocker/r-ver:3.5.1
# Get package dependencies
RUN apt-get update && apt-get install -y --no-install-recommends libxml2-dev \
libssl-dev \
libcurl4-openssl-dev \
wget \
bzip2 \
curl \
git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Install miniconda
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-4.5.11-Linux-x86_64.sh -O ~/miniconda.sh && \
/bin/bash ~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda clean -tipsy && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc
# Copy conda env setup script
# This is required as conda activate commands will not work during docker build,
# unless called from within a bash script
COPY docker/create_spcy.sh /setup/create_spcy.sh
# Create miniconda environment
RUN bash /setup/create_spcy.sh
# Copy package list
COPY processing/packages.list /setup/packages.list
# Install R Packages
RUN R -e "install.packages(readLines('/setup/packages.list'))"
# Copy the rasa data over
COPY rasa/* /app/rasa/
COPY docker/create_rasa.sh /setup/create_rasa.sh
COPY docker/train_rasa.sh /setup/train_rasa.sh
# Install rasa and requirements
RUN bash /setup/create_rasa.sh
# Copy wikiproc package, needs to be created by the build script
COPY wikiproc_0.0.0.9000.tar.gz /setup/wikiproc_0.0.0.9000.tar.gz
# Install wikiproc package
RUN R CMD INSTALL /setup/wikiproc_0.0.0.9000.tar.gz
# Copy R script and bash wrapper. Also the README, as it's currently needed to find the root directory
COPY processing/script/master.R /app/script/master.R
COPY docker/master.sh /app/script/master.sh
COPY README.md /app/README.md
# Optionally: Copy cache to speed up data processing
COPY data/articles.RDS /app/data/articles.RDS
COPY data/annotations/* /app/data/annotations/
RUN bash /app/script/master.sh
# Train the rasa bot
RUN bash /setup/train_rasa.sh
# Clean up stuff we won't need in production
RUN rm -rf /setup/* && \
rmdir /setup/ && \
rm -rf /app/data/* && \
rmdir /app/data/
COPY docker/docker-entrypoint.sh /app/docker-entrypoint.sh
ENTRYPOINT ["/app/docker-entrypoint.sh"]
# CMD [ "/bin/bash" ]
\ No newline at end of file
@@ -12,6 +12,7 @@ This guide assumes during setup a user named rasa was created, though this shoul
#### Hyper-V
Nothing to do, works out of the box.
Tested using Hyper-V Quick Create accepting the defaults.
#### KVM
@@ -39,9 +40,51 @@ usermod -aG sudo rasa
All done here. `exit` and log in as rasa.
## Setting up Python for cleanNLP
## Script based installation
First we update the system. We'll also need gcc and git, so go ahead and install them.
_This section provides instructions for installing with the help of a bash script; if you want to install manually, skip ahead._
### Installing git
First, we'll install git and a few essentials we'll need along the way.
```bash
sudo apt-get update && \
sudo apt-get dist-upgrade -y && \
sudo apt-get install -y --no-install-recommends gcc git build-essential python-dev
```
To clone the project via git run:
```bash
git clone https://git.informatik.uni-leipzig.de/text-mining-chatbot/wiki-rasa.git
cd wiki-rasa
```
Now run the installer script; it will take care of installing Miniconda, spaCy, and R.
```bash
./install.sh
```
Finally, we'll need to install the R packages. This has to be done in an interactive R shell, as R will ask whether to use a personal library. From an R shell run the following:
```r
install.packages(readLines("processing/packages.list"))
```
To install the wikiproc package navigate to the processing directory and run:
```bash
R CMD build wikiproc
R CMD INSTALL wikiproc_<version>.tar.gz
```
That's it. You should be good to go and run the master script now.
## Manual installation
First we update the system and install some packages we'll need.
```shell
sudo apt-get update && sudo apt-get dist-upgrade -y && sudo apt-get install gcc git build-essential python-dev -y
@@ -70,8 +113,6 @@ conda deactivate
## Installing R
_There is a script that will do all these things for you. If you want to use it skip ahead to **Cloning the project** and be sure to execute the script as described there_
We need to add the cran repository to sources.list as the r packages in the debian repositories are somewhat out of date.
For that we'll need a few packages
@@ -110,12 +151,6 @@ git clone https://git.informatik.uni-leipzig.de/text-mining-chatbot/wiki-rasa.gi
cd wiki-rasa
```
_If skipping the steps above run the install script now._
```shell
./install.sh
```
## Installing R Packages
This needs to be done from an interactive R console, as R will ask whether to use a personal library the first time packages are installed. To do this, open R and type the following:
@@ -126,6 +161,15 @@ install.packages(readLines("packages.list"))
This will install all the packages required. When asked if you want to use a personal library say yes and accept the defaults.
To install the wikiproc package navigate to the processing directory and run:
```bash
R CMD build wikiproc
R CMD INSTALL wikiproc_<version>.tar.gz
```
That's it. You should be good to go and run the master script now.
## Bot Setup
In order to set up and run the [Rasa Bot](https://rasa.com/docs/) we recommend using a [conda](https://conda.io/docs/user-guide/getting-started.html#managing-environments) environment again, with Python 3.6.7.
......
# Wiki Rasa
## Contributing
Before merging please make sure to check the following:
* If your script uses any libraries check if they are in `packages.list` and if not add them
* Does your contribution require any additional configuration? If so please update `README.md` and `docs/install_debian.md`
* If your changes need any system level changes, make sure to also add these in `Dockerfile` and `install.sh`
* Please make sure the wikiproc package can be built by calling `devtools::document()` as well as `R CMD build wikiproc` and possibly also `devtools::check()`
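The package checks from the list above can be run from the `processing` directory, roughly like this (a sketch; it assumes the `devtools` package is installed):

```r
# Run from the processing/ directory.
devtools::document("wikiproc")  # regenerate NAMESPACE and man/ pages from roxygen comments
devtools::check("wikiproc")     # optionally run the full R CMD check suite
```

`R CMD build wikiproc` can then be run from a shell in the same directory to produce the tarball.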
### Writing custom feature extraction functions
When writing a function to extract a feature use the following as guidelines:
* Place your file in the `processing/wikiproc/R` folder with an appropriate name
* Add a function call to `master.R` within the main apply function
* The parameters you hand to your function here will determine what you may work with
* `article[1]` is the name of the physicist
* `article[2]` and `article[3]` contain the page and revision id respectively
* `article[4]` contains the raw html text of the article
* `cleaned.text` contains the cleaned text
* `annotations` contains the cleanNLP annotation object; to access it use the `cnlp_get_*` functions. See [here](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help.
* You may use additional parameters to your liking
* Your function will always be given data for a single article; you do not need to make your function vectorized
* Bind the output of your function to the results data frame at the very end of the main apply function
* Please don't use library imports, if possible call the functions explicitly via `::`. If you need to load a library do so in `import_packages.R`.
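Following these guidelines, a feature extraction function might look like the sketch below. The function name and the feature it computes are made up for illustration; only the `article`/`cleaned.text` calling convention is taken from this document.

```r
#' Extract the length of the cleaned article text
#'
#' Hypothetical example feature. Receives data for a single article,
#' so no vectorization is needed.
#'
#' @param article Character vector: name, page id, revision id, raw html
#' @param cleaned.text Character scalar: the cleaned article text
#' @return A single-row data.frame to bind to the results data frame
#' @export
get_text_length <- function(article, cleaned.text) {
  data.frame(
    name = article[1],
    text_length = nchar(cleaned.text),
    stringsAsFactors = FALSE
  )
}
```

The returned single-row data frame can then be bound to the results at the end of the main apply function.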
### Steps to build
* Make sure your functions are properly commented for roxygen
* If your function is to be visible from the outside, make sure to add `@export` to the roxygen comment
* Set the working directory to `wikiproc` and call `devtools::document()`
* Step into `processing` and use `devtools::install("wikiproc")` to install the package
## Installation
You may use this software by installing the **wikiproc** package and then running the `master.R` script. There are also directions on how to install from scratch on a Debian VM and on how to build a Docker image.
### General prerequisites
The script assumes all the packages in the `packages.list` file are installed within R. Furthermore you will need a spaCy installation with the English language data. By default the script expects to find this in a conda environment named `spcy`; if you need to change that, do so in the `ProcessNER.R` file.
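If the environment is named differently, the relevant change in `ProcessNER.R` is presumably along these lines (a sketch; the file's actual contents are not shown in this commit):

```r
# Sketch: point the R session at a specific conda environment before
# initializing the cleanNLP spaCy backend.
reticulate::use_condaenv("spcy", required = TRUE)  # replace "spcy" with your env name
cleanNLP::cnlp_init_spacy()                        # initialize the spaCy backend
```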
To build the **wikiproc** package navigate to the processing directory and run:
```bash
R CMD build wikiproc
R CMD INSTALL wikiproc_<version>.tar.gz
```
_Note: This will require the [R Tools](https://cran.r-project.org/bin/windows/Rtools/) on Windows and possibly additional packages on *nix platforms._
The data processing side is done by the `Master.R` script in the `r` folder. This may be called via `Rscript r/Master.R` from any command line or via `source("r/Master.R")` from within R. The script assumes the working directory to be the base directory `wiki-rasa`, so make sure to either call `Rscript` from within this directory or to set the working directory in R accordingly prior to sourcing.
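For example, from an R session started anywhere:

```r
# The script expects the repository root as the working directory.
setwd("wiki-rasa")
source("r/Master.R")
```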
### Installing on debian
For a detailed guide on installing on a Debian 9 machine take a look at [Installation](docs/install_debian.md).
### Building the docker
**_Work in progress_**
Run the build script for your system, e.g. on Windows `build_docker.bat` or `build_docker.sh` on Linux.
## Running
After that you should be good to start the Docker container with
```sh
docker run -it chatbot
```
R CMD build processing/wikiproc
docker build -t chatbot .
\ No newline at end of file
#!/usr/bin/env bash
R CMD build processing/wikiproc
docker build -t chatbot .
\ No newline at end of file
#!/usr/bin/env bash
source ~/.bashrc
conda create -y -n rasa_env python=3.6.7
conda activate rasa_env
pip install spacy
pip install rasa_nlu
pip install rasa_core
pip install sklearn_crfsuite
python -m spacy download en_core_web_md
python -m spacy link en_core_web_md en
\ No newline at end of file
#!/usr/bin/env bash
source ~/.bashrc
conda create -y -n spcy python=3
conda activate spcy
pip install spacy
python -m spacy download en
\ No newline at end of file
#!/usr/bin/env bash
source ~/.bashrc
cd /app/rasa
conda activate rasa_env
make run
\ No newline at end of file
#!/usr/bin/env bash
source ~/.bashrc && cd /app/ && Rscript /app/script/master.R
\ No newline at end of file
#!/usr/bin/env bash
source ~/.bashrc
conda activate rasa_env
cd /app/rasa
make train
\ No newline at end of file
#!/usr/bin/env bash
sudo apt-get update && sudo apt-get dist-upgrade -y
sudo apt install dirmngr --install-recommends
sudo apt install software-properties-common apt-transport-https -y
sudo apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/debian stretch-cran35/'
sudo apt-get update
sudo apt-get install r-base-dev -y
sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev -y
\ No newline at end of file
# Install conda
wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
/bin/bash ~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda clean -tipsy && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc
# Create conda env
source ~/.bashrc && \
conda create -y -n spcy python=3 && \
conda activate spcy && \
pip install spacy && \
python -m spacy download en && \
conda deactivate
sudo apt-get update && sudo apt-get dist-upgrade -y && \
sudo apt install dirmngr -y --install-recommends && \
sudo apt install software-properties-common apt-transport-https -y && \
sudo apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && \
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/debian stretch-cran35/' && \
sudo apt-get update && \
sudo apt-get install -y r-base-dev && \
sudo apt-get install --no-install-recommends libcurl4-openssl-dev libssl-dev libxml2-dev -y
\ No newline at end of file