
Resolve "Enhancing R: Optimize GetBirthdate.R"

Lukas Gehrke requested to merge 39-enhancing-r-optimize-getbirthdate-r into master

Done

Adds unit test

  • adds unit test

Adds use of annotations to birthdate extraction script

  • Use the infobox to extract the date. If there is no infobox, take the first date entity from the text (assumption: the first date entity is the birthdate).

  • There are still results like this from infobox search:

[[972]]
[1] "Frits Zernike16 July 1888"
  • If I try to remove the name with a regex, the whole string gets removed in some cases, which leaves a lot of empty strings in the results.
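
One possible way around this, as a sketch and not what GetBirthdate.R currently does: instead of removing the name, pull the date itself out of the string. `extract_date` is a hypothetical helper, and the pattern assumes English "day month year" dates like the one in the example above; other date formats would come back as `NA`.

```r
# Hypothetical helper (not part of GetBirthdate.R): return the first
# "day month year" date found in each string, or NA if there is none.
extract_date <- function(x) {
  # month.name is the base R constant holding the English month names
  months  <- paste(month.name, collapse = "|")
  pattern <- paste0("[0-9]{1,2} (", months, ") [0-9]{3,4}")

  hits  <- regexpr(pattern, x)
  found <- !is.na(hits) & hits != -1

  out <- rep(NA_character_, length(x))
  out[found] <- regmatches(x, hits)  # regmatches() returns only the matched elements
  out
}

extract_date("Frits Zernike16 July 1888")
# [1] "16 July 1888"
```

Because the name is never touched, a string without a recognisable date yields `NA` instead of an empty string, which should make the failures easier to count.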

What do you think? Should we leave errors like this in?

ToDo

  • Use Annotations on cleaned text

  • Deal with stupid bug that leaves names included

  • Remove infobox as data source because it is structured data

Closes #39 (closed)

Edited by Lukas Gehrke
