Files · 95e92e6cabe18aeb525e36464d63e4916bd31ba8 · ocr-d / cor-asv-fst

- when extending lexicon transducer according to composition_depth,
  do not ignore upper/lower case completely, but ensure that
  non-first words are downcased (with infix/zero connection) or
  only upper case (with hyphen connection), and that first words
  are upcased or already upper case

- when extending lexicon transducer with morphology,
  compose *after* compounds were added

- when using lexicon transducer, make sure to allow both precomposed umlauts
  and decomposed ones (base vowel plus combining diacritical e);
  also, ensure the final lexicon is just an acceptor

- when repeating lexicon transducer according to words_per_window,
  use 1 to N repetitions instead of 0 to N (optionalized lexicon), but make sure
  the last (1) token has no space

- repair the previously defunct inter-word/lm lexicon model:
  - by stripping initial space from loaded punctuation_right_transducer
  - by correctly synchronizing on flags
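
The casing rule for compounds (first item) and the umlaut variants (third item) can be illustrated on plain strings. This is only a sketch of the intended behaviour, not the transducer code in this repository: the function names `join_compound` and `umlaut_variants` are hypothetical, the reading of "only upper case" as "reject lower-case words after a hyphen" is an assumption, and so is the choice of U+0364 (COMBINING LATIN SMALL LETTER E) as the "combining e".

```python
# String-level sketch only; the repository implements these rules on
# finite-state transducers. All names here are hypothetical.

COMBINING_E = "\u0364"  # assumed code point for the "combining e"

# Precomposed umlauts mapped to their base vowels.
UMLAUTS = {
    "\u00e4": "a", "\u00f6": "o", "\u00fc": "u",
    "\u00c4": "A", "\u00d6": "O", "\u00dc": "U",
}


def join_compound(first, second, infix="", hyphen=False):
    """Join two lexicon words into a compound, applying the casing rule.

    The first word is upcased (or kept if already upper case).
    After a hyphen, the second word must already be upper case;
    after an infix or zero connection, it is downcased.
    Returns None if the combination is not allowed.
    """
    head = first[:1].upper() + first[1:]
    if hyphen:
        if not second[:1].isupper():
            return None  # lower-case words are rejected after a hyphen
        return head + "-" + second
    return head + infix + second[:1].lower() + second[1:]


def umlaut_variants(word):
    """Return the precomposed spelling and the base-vowel + combining-e spelling."""
    decomposed = "".join(
        UMLAUTS[ch] + COMBINING_E if ch in UMLAUTS else ch for ch in word
    )
    return {word, decomposed}
```

For example, `join_compound("arbeit", "Zeit", infix="s")` yields `"Arbeitszeit"`, while `join_compound("OCR", "text", hyphen=True)` is rejected under the assumed reading of the hyphen rule.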

95e92e6c 6 years ago

Name	Last commit	Last update
cython
hfst
open-fst