Minor fixes

Commit 783a1ab8 authored 6 years ago by David Fuhry
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ Use that file to generate xml dump at wikipedias [Export page](https://en.wikipe

 # ExtractFromXML.R

-Will read in the xml file from the data directory and extract the title and text of the pages in the dump. Will then write them to *texte.csv* in the data directory. For convenience will also create a texte.RDS file, load with `texte <- read.RDS("../data/texte.RDS")`.
+Will read in the xml file from the data directory and extract the title and text of the pages in the dump. Will then write them to *texte.csv* in the data directory; use `read.table` to import it. For convenience will also create a texte.RDS file, load with `texte <- readRDS("../data/texte.RDS")`.
 **NOTE:** For the script to work, the first line of the xml needs to be replaced with `<mediawiki xml:lang="en">`.


--- a/r/ExtractFromXML.R
+++ b/r/ExtractFromXML.R
@@ -13,4 +13,6 @@ texts <- sapply(text.nodes, xml_text)
 df.out <- data.frame(Title = titles,
                     Text = texts)

-write.csv2(df.out, "../data/texte.csv")
+saveRDS(df.out, "../data/texte.RDS")
+
+write.table(df.out, "../data/texte.csv")
\ No newline at end of file
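
Note on reading the output back: the change above swaps `write.csv2` for `write.table` with default arguments, so *texte.csv* is presumably space-separated with quoted strings and a row-name column, despite the `.csv` extension. The following is a minimal sketch of the round trip, not the project's own code. The toy `df.out` contents and the temporary paths `csv.path` and `rds.path` are assumptions made so the snippet runs outside the repository.

```r
# Toy stand-in for df.out; the real script builds it from the XML dump.
df.out <- data.frame(Title = c("Page one", "Page two"),
                     Text  = c("First text", "Second text"),
                     stringsAsFactors = FALSE)

# Hypothetical temp paths used instead of ../data/ so the sketch runs anywhere.
csv.path <- file.path(tempdir(), "texte.csv")
rds.path <- file.path(tempdir(), "texte.RDS")

# Same two calls as in the commit, with default arguments.
saveRDS(df.out, rds.path)
write.table(df.out, csv.path)

# The header line has one field fewer than the data lines, so read.table
# treats the first column as row names and returns Title and Text.
from.csv <- read.table(csv.path, header = TRUE, stringsAsFactors = FALSE)
from.rds <- readRDS(rds.path)
```

The RDS file returns the data frame exactly as saved, which is why the README offers it as the convenient option. The `read.table` call may need extra arguments for real page text, which this toy data does not exercise.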