diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 0000000000000000000000000000000000000000..1c196d79b70889b096647caf2abf01f07a87768d
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,127 @@
+# Install instructions
+
+This document provides instructions for setting up the software on a freshly installed Debian 9 system. It will most likely work on any recent Ubuntu system too, though there may be some hiccups with the Python versions.
+
+## Installing Debian
+
+This assumes a standard install of Debian was made using the [small CD AMD64](https://www.debian.org/distrib/netinst#smallcd) netinst image. It was tested selecting only the base system with the standard system utilities (which include Python) and no GUI.
+This guide also assumes a user named `rasa` was created during setup; adapting the commands to a different username should be straightforward.
+
+### Hypervisor specific steps
+
+#### Hyper-V
+
+Nothing to do; it works out of the box.
+
+#### KVM
+
+Not tested.
+
+#### VirtualBox
+
+Works.
+
+## Installing sudo
+
+Though not required, we'll make `rasa` a sudoer for convenience.
+
+First, log in as root and run:
+
+```shell
+apt-get install sudo
+```
+
+Next, we'll make the `rasa` user a sudoer:
+
+```shell
+usermod -aG sudo rasa
+```
+
+All done here. `exit` and log in as `rasa`.
+
+## Setting up Python for cleanNLP
+
+First, make sure the system is up to date. We'll also need gcc, git, and some build tools, so go ahead and install them as well:
+
+```shell
+sudo apt-get update && sudo apt-get dist-upgrade -y && sudo apt-get install gcc git build-essential python-dev -y
+```
+
+Next, install miniconda:
+
+```shell
+wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+bash Miniconda3-latest-Linux-x86_64.sh
+```
+
+Defaults are fine here.
+
+Log out and back in so that the changes conda made to your shell profile take effect.
+
+Now we create an environment for spacy and install it:
+
+```shell
+conda create -n spcy python=3
+conda activate spcy
+pip install spacy
+python -m spacy download en
+conda deactivate
+```
+
+## Installing R
+
+_There is a script that will do all of these steps for you. If you want to use it, skip ahead to **Cloning the project** and be sure to execute the script as described there._
+
+We need to add the CRAN repository to `sources.list`, as the R packages in the Debian repositories are somewhat out of date.
+
+For that we'll need a few packages:
+
+```shell
+sudo apt install dirmngr --install-recommends
+sudo apt install software-properties-common apt-transport-https -y
+```
+
+Now we'll add the signing key for the CRAN repository and then add the repository itself:
+
+```shell
+sudo apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'
+sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/debian stretch-cran35/'
+```
+
+Finally, we can install R:
+
+```shell
+sudo apt-get update
+sudo apt-get install r-base-dev
+```
+
+While we're at it, we install a few system libraries needed by some R packages, as well as git:
+
+```shell
+sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev git -y
+```
+
+## Cloning the project
+
+Run:
+
+```shell
+git clone https://git.informatik.uni-leipzig.de/text-mining-chatbot/wiki-rasa.git
+cd wiki-rasa
+```
+
+_If you skipped the steps above, run the install script now:_
+
+```shell
+./install.sh
+```
+
+## Installing R Packages
+
+This needs to be done from an interactive R console, as R will ask whether to use a personal library the first time packages are installed. Open R and run the following:
+
+```r
+install.packages(readLines("packages.list"))
+```
+
+This will install all the required packages. When asked whether to use a personal library, say yes and accept the defaults.
\ No newline at end of file
diff --git a/README.md b/README.md
index 4d448cbc7a6ed2727f6816f170bf414575622391..d0b0e4a8cbb46cb8f2d147e3f5b6d1170ea2be5e 100644
--- a/README.md
+++ b/README.md
@@ -1,41 +1,37 @@
 # Wiki Rasa
 
-### Installation
 
-2 Optionen:
+## Contributing
 
-1. Option: Python 3.6.6 installiert haben oder downgraden von 3.7 (wird von Tensorflow noch nicht unterstützt)  
-Dann rasa core mit ```pip install rasa_core``` und rasa nlu mit ```pip install rasa_nlu``` installieren.
-2. Option: Anaconda installieren, eine Python 3.6.6 Umgebung erstellen und dann rasa installieren.
+Before merging please make sure to check the following:
+* If your script uses any libraries, check whether they are listed in `packages.list` and add them if they are not
+* Does your contribution require any additional configuration? If so, please update `README.md` and `INSTALL.md`
+    * Some R packages require system-level libraries on OS X and Linux; if that is the case, make sure they are added in `INSTALL.md` and also in `install.sh`
 
-### Example Project zum laufen bringen
 
-[stories.md](https://github.com/RasaHQ/rasa_core/blob/master/examples/moodbot/data/stories.md), [domain.yml](https://github.com/RasaHQ/rasa_core/blob/master/examples/moodbot/domain.yml), [nlu.md](https://github.com/RasaHQ/rasa_core/blob/master/examples/moodbot/data/nlu.md) downloaden.  
-```nlu_config.yml```  mit folgendem Inhalt erstellen:
-```{md}
-language: en
-pipeline: tensorflow_embedding
-```
+### Writing custom feature extraction functions
 
-Dann kann das Modell trainiert werden mit:
-```
-# rasa core
-python -m rasa_core.train -d domain.yml -s stories.md -o models/dialogue
+When writing a function to extract a feature, use the following as guidelines (a minimal sketch follows the list):
+* Place your file in the `r` folder with an appropriate name
+* Add a function call to `Master.R` within the main apply function
+    * The parameters you hand to your function here will determine what you may work with
+        * `article[1]` is the name of the physicist
+        * `article[2]` and `article[3]` contain the page and revision id respectively
+        * `article[4]` contains the raw html text of the article
+        * `cleaned.text` contains the cleaned text
+        * `annotations` contains the cleanNLP annotation object; to access it, use the `cnlp_get_*` functions. See [here](https://cran.r-project.org/web/packages/cleanNLP/cleanNLP.pdf) for help.
+        * You may use additional parameters to your liking
+    * Your function will always be given data for a single article, so you do not need to make it vectorized
+* Bind the output of your function to the results data frame at the very end of the main apply function
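+
+A minimal sketch of what such an extraction function and its call from `Master.R` could look like is shown below. The function name, the word-count feature, and the exact way the result is bound to `results` are made up for illustration; the real data layout is whatever the main apply function passes in.
+
+```r
+## r/GetArticleLength.R -- hypothetical example of a feature extraction function.
+## It receives data for a single article and returns a single value.
+GetArticleLength <- function(article.name, cleaned.text) {
+  # Count whitespace-separated tokens in the cleaned text;
+  # stringr is already listed in packages.list.
+  stringr::str_count(cleaned.text, "\\S+")
+}
+
+## Inside the main apply function in Master.R (simplified):
+# article.length <- GetArticleLength(article[1], cleaned.text)
+# results <- cbind(results, article.length)
+```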
 
-# Natural Language processing 
-python -m rasa_nlu.train -c nlu_config.yml --data nlu.md -o models --fixed_model_name nlu --project current --verbose
-```
-Danach kann man mit dem Bot reden mit:
-```
-python -m rasa_core.run -d models/dialogue -u models/current/nlu
-```
+## Installation
 
+### General prerequisites
 
-# R Scripts
+The script assumes all the packages in the `packages.list` file are installed within R. Furthermore, you will need a spaCy installation with the English language data available. By default the script expects to find this in a conda environment named `spcy`; if you need to change that, do so in the `ProcessNER.R` file.
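+
+As a rough idea of what that configuration involves, the spaCy backend initialisation in `ProcessNER.R` might look roughly like the following. This is only a sketch, not the file's actual contents; the environment name `spcy` is the one mentioned above.
+
+```r
+# Sketch: point reticulate at the conda environment and initialise
+# cleanNLP's spaCy backend. Adjust the environment name if yours differs.
+library(reticulate)
+library(cleanNLP)
+
+use_condaenv("spcy", required = TRUE)
+cnlp_init_spacy(model_name = "en")
+```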
 
-### PhysicistsList.R
-
-Will crawl wikipedias [List of Physicists](https://en.wikipedia.org/wiki/List_of_physicists) for all physicist names and use that list to download the corresponding articles from the wikipedia api.
-Will generate a csv containing the gathered articles in the data directory as well as a RDS object containing the data as binary.
+For a detailed guide on installing on a Debian 9 machine, take a look at [Installation](INSTALL.md).
 
+## Running
 
+The data processing side is done by the `Master.R` script in the `r` folder. It may be called via `Rscript r/Master.R` from any command line or via `source("r/Master.R")` from within R. The script assumes the working directory to be the repository base directory `wiki-rasa`, so make sure either to call `Rscript` from within this directory or to set the working directory in R accordingly prior to sourcing.
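+
+For example, an interactive run might look like the following; the clone location `~/wiki-rasa` is just an assumption, so adjust it to wherever you cloned the repository.
+
+```r
+# Set the working directory to the repository root before sourcing,
+# since the script resolves relative paths from there.
+setwd("~/wiki-rasa")
+source("r/Master.R")
+```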
diff --git a/install.sh b/install.sh
new file mode 100755
index 0000000000000000000000000000000000000000..c429b7e77a6b5ff8314f9678d79830e739597555
--- /dev/null
+++ b/install.sh
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+sudo apt-get update && sudo apt-get dist-upgrade -y
+sudo apt install dirmngr --install-recommends
+sudo apt install software-properties-common apt-transport-https -y
+sudo apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'
+sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/debian stretch-cran35/'
+sudo apt-get update
+sudo apt-get install r-base-dev -y
+sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev -y
\ No newline at end of file
diff --git a/packages.list b/packages.list
new file mode 100644
index 0000000000000000000000000000000000000000..d61e2fb55d01e0b19aaa9b98f33b84328257a7d5
--- /dev/null
+++ b/packages.list
@@ -0,0 +1,10 @@
+pbapply
+rvest
+stringi
+textclean
+stringr
+data.table
+xml2
+WikipediR
+reticulate
+cleanNLP
\ No newline at end of file