Robert Sachunsky authored
- when extending the lexicon transducer according to composition_depth, do not ignore upper/lower case completely; instead, ensure that non-first words are downcased (with infix/zero connection) or upper case only (with hyphen connection), and that first words are upcased or already upper case
- when extending the lexicon transducer with morphology, compose *after* the compounds have been added
- when using the lexicon transducer, allow both precomposed umlauts and decomposed ones (with the combining diacritical e); also, ensure that the final lexicon becomes a pure acceptor
- when repeating the lexicon transducer according to words_per_window, use 1 to N instead of 0 to N (optionalized lexicon), but make sure the last (1) token has no space
- repair the previously defunct inter-word/lm lexicon model:
  - by stripping the initial space from the loaded punctuation_right_transducer
  - by correctly synchronizing on flags
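The case rule for compound parts can be illustrated on plain strings rather than on transducers. The sketch below is a hypothetical illustration, not the code of this commit: the names `join_compound`, `ZERO`, `INFIX` and `HYPHEN` are invented for this example, the infix is assumed to be the German linking "s", and "upcased"/"downcased" is assumed to apply to the initial letter only. It also reads "upper case only (with hyphen connection)" as giving the part an upper-case initial. A filtering reading (rejecting lower-case parts after a hyphen) would be equally plausible.

```python
from typing import List

# Hypothetical connector labels; the commit only speaks of
# "infix/zero" and "hyphen" connections.
ZERO, INFIX, HYPHEN = "", "s", "-"


def _upcase_initial(word: str) -> str:
    """Upper-case the first letter; words already starting upper-case are unchanged."""
    return word[:1].upper() + word[1:]


def _downcase_initial(word: str) -> str:
    """Lower-case the first letter (assumption: only the initial is downcased)."""
    return word[:1].lower() + word[1:]


def join_compound(parts: List[str], connectors: List[str]) -> str:
    """Join lexicon words into one compound, applying the case rules:
    the first part gets an upper-case initial; later parts are downcased
    after a zero/infix connection and upcased after a hyphen."""
    if len(connectors) != len(parts) - 1:
        raise ValueError("need exactly one connector between each pair of parts")
    result = _upcase_initial(parts[0])
    for connector, part in zip(connectors, parts[1:]):
        if connector == HYPHEN:
            part = _upcase_initial(part)
        else:
            part = _downcase_initial(part)
        result += connector + part
    return result


if __name__ == "__main__":
    print(join_compound(["Haus", "Tür"], [ZERO]))         # Haustür
    print(join_compound(["arbeit", "Zimmer"], [INFIX]))   # Arbeitszimmer
    print(join_compound(["Goethe", "haus"], [HYPHEN]))    # Goethe-Haus
```

In a transducer setting, the same effect would come from composing each non-initial lexicon copy with a case-mapping transducer chosen by the preceding connector, instead of making the whole lexicon case-insensitive.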