Robert Sachunsky authored
- when extending lexicon transducer according to composition_depth,
  do not ignore upper/lower case completely, but ensure that
  non-first words are downcased (with infix/zero connection) or
  only upper case (with hyphen connection), and that first words
  are upcased or already upper case

- when extending lexicon transducer with morphology,
  compose *after* compounds were added

- when using lexicon transducer, make sure to allow both precomposed umlauts
  and decomposed ones (base vowel with combining diacritical e);
  also, ensure the final lexicon is reduced to an acceptor

- when repeating lexicon transducer according to words_per_window,
  use 1 to N repetitions instead of 0 to N (optionalized lexicon), but make
  sure the last (mandatory) token has no trailing space

- repair the previously defunct inter-word/lm lexicon model:
  - by stripping the initial space from the loaded punctuation_right_transducer
  - by correctly synchronizing on flags
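A string-level sketch of the case rules from the first item (extension according to composition_depth). It is hypothetical and not the repository's code, which works on transducers: `build_compound`, `upcase_first` and `downcase_first` are illustrative names, and "only upper case" is read here as rejecting a hyphen-connected word that does not already start in upper case.

```python
# Hypothetical sketch: string-level model of the compound case rules.
# The real implementation builds these constraints into an FST.


def upcase_first(word: str) -> str:
    """Upper-case the first letter (no-op if it is already upper case)."""
    return word[:1].upper() + word[1:]


def downcase_first(word: str) -> str:
    """Lower-case the first letter."""
    return word[:1].lower() + word[1:]


def build_compound(parts, connectors):
    """Join compound parts, applying the case rule for each connection.

    connectors holds one string per join: "-" is a hyphen connection,
    any other string (including "") is an infix/zero connection.
    Returns None if the parts violate the (assumed) hyphen rule.
    """
    if len(connectors) != len(parts) - 1:
        raise ValueError("need exactly one connector between adjacent parts")
    # First word: upcased, or left alone if already upper case.
    result = upcase_first(parts[0])
    for conn, word in zip(connectors, parts[1:]):
        if conn == "-":
            # Hyphen connection: accept only words already in upper case.
            if not word[:1].isupper():
                return None
            result += conn + word
        else:
            # Infix/zero connection: non-first words are downcased.
            result += conn + downcase_first(word)
    return result
```

For example, `build_compound(["haus", "Tür"], [""])` gives `"Haustür"` and `build_compound(["Arbeit", "Markt"], ["s"])` gives `"Arbeitsmarkt"`.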
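The order change in the second item can be illustrated with plain sets of strings standing in for FST languages. This is a toy model under stated assumptions (no case rules, inflection modelled as appending a suffix; `compounds` and `add_morphology` are hypothetical helpers). It only shows that the two orders yield different languages, not the author's reason for the change.

```python
from itertools import product


def compounds(lexicon, depth):
    """All concatenations of 1..depth lexicon words (case rules omitted)."""
    result = set()
    for n in range(1, depth + 1):
        for parts in product(sorted(lexicon), repeat=n):
            result.add("".join(parts))
    return result


def add_morphology(words, suffixes):
    """Toy stand-in for composing with morphology: append every suffix."""
    return {w + s for w in words for s in suffixes}


lexicon = {"haus", "tür"}
suffixes = {"", "en"}

# Morphology composed *after* compounds were added: only the whole
# compound is inflected.
after = add_morphology(compounds(lexicon, 2), suffixes)

# The other order: inflected forms can also appear inside compounds.
before = compounds(add_morphology(lexicon, suffixes), 2)
```

In this model `"haustüren"` is in both sets, while `"hausentür"` only appears when morphology is applied first.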
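A sketch of the umlaut handling in the third item, assuming the "combining e" is U+0364 COMBINING LATIN SMALL LETTER E (the superscript e of historical German print). `umlaut_variants` and `as_acceptor` are hypothetical names; the lexicon relation is modelled as a set of (input, output) pairs instead of a real transducer, and projecting it onto one side plays the role of reducing the lexicon to an acceptor.

```python
from itertools import product

# Assumption: decomposed umlauts use U+0364 COMBINING LATIN SMALL LETTER E.
COMBINING_E = "\u0364"
DECOMPOSED = {
    "ä": "a" + COMBINING_E, "ö": "o" + COMBINING_E, "ü": "u" + COMBINING_E,
    "Ä": "A" + COMBINING_E, "Ö": "O" + COMBINING_E, "Ü": "U" + COMBINING_E,
}


def umlaut_variants(word):
    """All spellings of word, each umlaut either precomposed or decomposed."""
    options = [(ch, DECOMPOSED[ch]) if ch in DECOMPOSED else (ch,) for ch in word]
    return {"".join(choice) for choice in product(*options)}


def as_acceptor(pairs):
    """Project an (input, output) relation onto its output side."""
    return {out for _, out in pairs}


# Relation: each lexicon word paired with each of its spelling variants.
relation = {(w, v) for w in {"für"} for v in umlaut_variants(w)}
lexicon_acceptor = as_acceptor(relation)
```

A word with k umlauts yields 2**k variants, so `"Übergrößen"` has four spellings.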
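The window construction from the fourth item amounts to 1 to N words where every word except the last is followed by a space, roughly `(W " "){0,N-1} W`. A minimal set-based sketch, with `window_strings` as a hypothetical helper (explicit enumeration like this is only feasible for toy lexicons; the commit does it on transducers):

```python
from itertools import product


def window_strings(lexicon, words_per_window):
    """Strings of 1..N lexicon words joined by single spaces.

    Equivalent to (W " "){0,N-1} W: at least one word (no empty string,
    unlike an optionalized 0..N repetition) and no trailing space.
    """
    result = set()
    for n in range(1, words_per_window + 1):
        for words in product(sorted(lexicon), repeat=n):
            result.add(" ".join(words))
    return result
```

For example, `window_strings({"a", "b"}, 2)` contains `"a"`, `"a b"` and `"b a"`, but neither `""` nor `"a "`.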
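The first repair in the last item, sketched on plain strings: `strip_initial_space` and `attach_right_punctuation` are hypothetical helpers, and the punctuation set is a made-up example. In this toy model, a leading space in right punctuation would produce tokens such as `Haus .` instead of `Haus.`; flag synchronization is not modelled.

```python
def strip_initial_space(punctuation):
    """Remove leading spaces from each right-punctuation string."""
    return {p.lstrip(" ") for p in punctuation}


def attach_right_punctuation(words, punctuation):
    """Each word, optionally followed directly by one right-punctuation string."""
    return {w + p for w in words for p in punctuation | {""}}


# Made-up example, as if the right punctuation had been loaded with a leading space.
right_punct = {" .", " ,", ")"}
cleaned = strip_initial_space(right_punct)
tokens = attach_right_punctuation({"Haus"}, cleaned)
```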
95e92e6c
Repository contents:
- cython
- hfst
- open-fst