improve evaluation:
- recombine combining characters with the preceding letter character, incorporate that into all metrics and remove metric `combining-e-umlauts`
- introduce parameter `gt_level` for `historic_latin`:
  - add multi-character normalizations to `historic_latin` (historic ligatures and MUFI) when `gt_level < 3`
  - use single-character equivalences beyond NFKC when `gt_level == 1`
- encapsulate edit distance counting in a class, use a parallel aggregation algorithm for accurate mean and variance estimates
- expose metric, gtlevel and confusion params to standalone CLI `eval`
- expose confusion size param to OCR-D CLI `evaluate`
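
For illustration, the parallel aggregation of mean and variance mentioned above could look roughly like the following. This is a minimal sketch, not the project's code: the class name `RunningStats` and its methods are hypothetical, and it assumes the pairwise update of Chan et al. (with Welford's update for single values), which the entry does not name explicitly.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical accumulator: count, mean and sum of squared deviations (M2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's update for a single new value (e.g. one line's error rate)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Pairwise combination of two partial aggregates (assumed: Chan et al.)
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 for an empty aggregate
        return self.m2 / self.n if self.n else 0.0


# Two partial aggregates, e.g. from two pages or two worker processes
part_a, part_b = RunningStats(), RunningStats()
for rate in (0.0, 0.1, 0.2):
    part_a.add(rate)
for rate in (0.3, 0.4):
    part_b.add(rate)
total = part_a.merge(part_b)
```

Merging partial aggregates this way avoids summing squares of raw values, which tends to lose precision when the variance is small relative to the mean.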