Will crawl Wikipedia's [List of Physicists](https://en.wikipedia.org/wiki/List_of_physicists) for all physicist names and save them to *Physicists.txt* in the data directory.
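The scraping step might look roughly like the following, a minimal sketch assuming the `rvest` package; the CSS selector and the relative output path are assumptions, since the live page layout can change.

```r
# Sketch: scrape the physicist names from the list page.
# Assumes rvest is installed; the selector is a guess at the page structure.
library(rvest)

url  <- "https://en.wikipedia.org/wiki/List_of_physicists"
page <- read_html(url)

# Names are assumed to be the first link of each bullet in the main content.
physicists <- page |>
  html_elements("#mw-content-text ul li > a:first-child") |>
  html_text()

writeLines(physicists, "../data/Physicists.txt")
```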
Will crawl Wikipedia's [List of Physicists](https://en.wikipedia.org/wiki/List_of_physicists) for all physicist names and use that list to download the corresponding articles via the Wikipedia API.
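Fetching one article via the MediaWiki query API could be sketched as below, assuming the `httr` and `jsonlite` packages; the API parameters are standard, but batching, rate limiting, and error handling are omitted.

```r
# Sketch: fetch one article's wikitext via the MediaWiki query API.
# Assumes httr and jsonlite are installed.
library(httr)
library(jsonlite)

get_article <- function(title) {
  resp <- GET("https://en.wikipedia.org/w/api.php", query = list(
    action  = "query",
    prop    = "revisions",
    rvprop  = "content",
    rvslots = "main",
    format  = "json",
    titles  = title
  ))
  parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"),
                     simplifyVector = FALSE)
  # With format=json (formatversion 1), the wikitext sits under the "*" key.
  page <- parsed$query$pages[[1]]
  page$revisions[[1]]$slots$main[["*"]]
}

einstein <- get_article("Albert Einstein")
```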
Use that file to generate an XML dump at Wikipedia's [Export page](https://en.wikipedia.org/wiki/Special:Export).
Will generate a CSV containing the gathered articles in the data directory, as well as an RDS object containing the same data in binary form.
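The persistence step could be sketched as follows; the column names, file names, and the semicolon separator are assumptions (they only need to match whatever `read.table` call is used on import).

```r
# Sketch: write the gathered articles both as CSV and as binary RDS.
# Column names (`title`, `text`) and file names are assumptions.
articles <- data.frame(
  title = c("Albert Einstein", "Niels Bohr"),
  text  = c("placeholder wikitext", "placeholder wikitext"),
  stringsAsFactors = FALSE
)

write.table(articles, "../data/articles.csv", sep = ";", row.names = FALSE)
saveRDS(articles, "../data/articles.RDS")
```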
### ExtractFromXML.R
Will read the XML file from the data directory and extract the title and text of each page in the dump, then write them to *texte.csv* in the data directory; use `read.table` to import it. For convenience it will also create a *texte.RDS* file, which can be loaded with `texte <- readRDS("../data/texte.RDS")`.
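The extraction itself can be sketched with the `xml2` package; the node names follow the MediaWiki export schema, and a tiny inline dump stands in for the real file so the example is self-contained.

```r
# Sketch: extract <title>/<text> pairs from a MediaWiki export.
# Assumes the xml2 package; an inline dump replaces the real file here.
library(xml2)

dump <- '<mediawiki xml:lang="en">
  <page>
    <title>Albert Einstein</title>
    <revision><text>Einstein was a theoretical physicist.</text></revision>
  </page>
</mediawiki>'

doc    <- read_xml(dump)
titles <- xml_text(xml_find_all(doc, "//page/title"))
texts  <- xml_text(xml_find_all(doc, "//page/revision/text"))

texte <- data.frame(title = titles, text = texts, stringsAsFactors = FALSE)
```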
**NOTE:** For the script to work, the first line of the XML dump needs to be replaced with `<mediawiki xml:lang="en">`.
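The reason for this edit is that the export's root tag carries an `xmlns` namespace, which makes plain XPath queries like `//page` return nothing. The same replacement can be done from R; the dump's file name here is an assumption.

```r
# Sketch: swap the namespaced root line for a plain one before parsing.
# "../data/dump.xml" is an assumed file name.
lines    <- readLines("../data/dump.xml")
lines[1] <- '<mediawiki xml:lang="en">'
writeLines(lines, "../data/dump.xml")
```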