Resolve "Enhancing R: Optimize GetBirthdate.R"
Done
- Adds unit test
- Adds use of annotations to the birthdate extraction script: uses the infobox to extract the date; if there is no infobox, takes the first date entity from the text (assumption: the first date entity is the birthdate).
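The fallback order can be sketched like this. It is a minimal Python illustration, not the R code in GetBirthdate.R; `get_birthdate`, `infobox_date` and `text_dates` are hypothetical names, and the date entities are assumed to come from the annotation step already ordered by their position in the text.

```python
from typing import Optional, Sequence


def get_birthdate(infobox_date: Optional[str], text_dates: Sequence[str]) -> Optional[str]:
    """Prefer the infobox date; otherwise fall back to the first date entity.

    Hypothetical helper. Assumes text_dates is ordered by position in the
    article and that the first date entity is the birthdate.
    """
    if infobox_date:  # a non-empty infobox value wins
        return infobox_date
    if text_dates:  # no infobox: take the first date entity from the text
        return text_dates[0]
    return None  # neither source produced a date
```

Returning `None` when both sources are empty keeps missing dates distinguishable from extracted ones.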
There are still results like this from the infobox search:
[[972]]
[1] "Frits Zernike16 July 1888"
- If I try to remove the name with a regex, in some cases the whole string is removed, which leaves a lot of empty strings in the results.
What do you think? Should we leave errors like this in?
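One way around the empty strings might be to extract the date instead of deleting the name. This is a Python sketch for illustration only, not the R script; `extract_date` and `DATE_RE` are hypothetical names, and it assumes English month names in either the "16 July 1888" or the "July 16, 1888" format.

```python
import re

# English month names (assumption about how the infobox values are written).
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Matches "16 July 1888" or "July 16, 1888". The lookbehind stops the day
# from being taken out of a longer number.
DATE_RE = re.compile(
    rf"(?<!\d)\d{{1,2}} (?:{MONTHS}) \d{{4}}"
    rf"|(?:{MONTHS}) \d{{1,2}}, \d{{4}}"
)


def extract_date(value: str) -> str:
    """Keep only the date part of an infobox value such as "Name16 July 1888".

    Searching for the date avoids deleting the whole string. If no date is
    found, the value is returned unchanged instead of becoming "".
    """
    match = DATE_RE.search(value)
    return match.group(0) if match else value
```

For example, `extract_date("Frits Zernike16 July 1888")` gives `"16 July 1888"`. The pattern only uses PCRE-compatible syntax, so it should carry over to R's regex functions with `perl = TRUE`.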
ToDo
- Use annotations on the cleaned text
- Deal with the bug that leaves names included in the extracted dates
- Remove the infobox as a data source because it is structured data
Closes #39
Edited by Lukas Gehrke