improve evaluation: (!2) · Merge requests · ocr-d / cor-asv-ann

Robert Sachunsky requested to merge improved-evaluation into master Jan 24, 2020

recombine combining characters with the preceding base letter, apply this in all metrics, and remove the metric combining-e-umlauts
introduce parameter gt_level for historic_latin:
  add multi-character normalizations to historic_latin (historic ligatures and MUFI) when gt_level < 3
  use single-character equivalences beyond NFKC when gt_level==1
encapsulate edit-distance counting in a class, and use a parallel aggregation algorithm for accurate mean and variance estimates
expose metric, gtlevel and confusion params to standalone CLI eval
expose confusion size param to OCR-D CLI evaluate
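
Two of the changes above, recombining combining characters with the preceding letter and aggregating mean and variance in parallel, can be illustrated with a minimal sketch. This is not the code from this merge request: the names `cluster_combining` and `RunningStats` are hypothetical, the merge step is the standard pairwise update for count, mean and sum of squared deviations, and reporting the population variance is an assumption.

```python
import math
import unicodedata


def cluster_combining(text):
    """Split text into units, attaching each combining mark to the
    character before it (hypothetical helper, for illustration only)."""
    units = []
    for char in text:
        # unicodedata.combining() returns a non-zero class for combining marks
        if units and unicodedata.combining(char):
            units[-1] += char
        else:
            units.append(char)
    return units


class RunningStats:
    """Count, mean and sum of squared deviations (M2) of a stream of values.

    Partial results from independent workers can be merged without
    revisiting the raw values (hypothetical class, for illustration only).
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def add(self, x):
        # single-value update (Welford)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # pairwise combination of two partial aggregates
        n = self.n + other.n
        if n == 0:
            return
        delta = other.mean - self.mean
        self.mean += delta * other.n / n
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.n = n

    @property
    def variance(self):
        # population variance (assumption; divide by n - 1 for the sample variance)
        return self.m2 / self.n if self.n else 0.0
```

With such a helper, `cluster_combining("fu\u0364r")` yields three units, so a letter carrying a combining small e counts as one symbol when distances are computed. Per-line error rates can be fed to separate `RunningStats` instances and merged afterwards, which avoids the cancellation problems of accumulating a plain sum and sum of squares.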