
Add XML extraction script

Merged. David Fuhry requested to merge xml-extraction into master.
Files changed: 2 (+18, -0)
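library(xml2)

# Read the exported Wikipedia dump
data <- read_xml("../data/Wikipedia-20181120103842.xml")
# MediaWiki exports usually declare a default namespace; strip it so that
# unprefixed XPath such as ".//page" and "./title" actually matches
xml_ns_strip(data)

# Query per <page> so each title stays paired with its own text
pages <- xml_find_all(data, ".//page")
titles <- xml_text(xml_find_first(pages, "./title"))
# First <revision> only; a "current revision only" export has exactly one
texts <- xml_text(xml_find_first(pages, "./revision/text"))

df.out <- data.frame(Title = titles,
                     Text = texts,
                     stringsAsFactors = FALSE)
saveRDS(df.out, "../data/texte.RDS")
# Comma-separated, UTF-8, without the row-name column write.table adds by default
write.csv(df.out, "../data/texte.csv", row.names = FALSE, fileEncoding = "UTF-8")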
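MediaWiki export files typically declare a default XML namespace, and xml2 does not match unprefixed XPath names such as `.//title` against namespaced elements. The sketch below runs on a hypothetical two-page export (the namespace URI, titles and texts are illustrative assumptions, not taken from the real dump) to show that effect, plus one way to keep each title aligned with its text by querying per `<page>` after `xml_ns_strip()`.

```r
library(xml2)

# Hypothetical two-page export shaped like a MediaWiki Special:Export file;
# the namespace URI and page contents are illustrative assumptions
toy <- read_xml('<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Alpha</title><revision><text>First article</text></revision></page>
  <page><title>Beta</title><revision><text>Second article</text></revision></page>
</mediawiki>')

# Unprefixed XPath does not match elements in the default namespace
before <- length(xml_find_all(toy, ".//title"))
print(before)  # 0

# Strip the default namespace in place, then select one node per page
xml_ns_strip(toy)
pages <- xml_find_all(toy, ".//page")
toy.df <- data.frame(
  Title = xml_text(xml_find_first(pages, "./title")),
  Text = xml_text(xml_find_first(pages, "./revision/text")),
  stringsAsFactors = FALSE
)
print(toy.df)
```

Because `xml_find_first()` returns exactly one result per page (a missing node becomes `NA` text), the Title and Text columns cannot drift out of step the way two independent `.//title` and `.//text` queries can when a page carries more than one revision.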