1. 02 Apr, 2020 1 commit
  2. 14 Feb, 2020 6 commits
  3. 10 Feb, 2020 12 commits
  4. 05 Feb, 2020 2 commits
  5. 29 Jan, 2020 3 commits
  6. 23 Jan, 2020 1 commit
  7. 21 Jan, 2020 1 commit
    • improve evaluation: · 89cb09ba
      Robert Sachunsky authored
      - recombine combining characters with the preceding letter character,
        incorporate that into all metrics, and remove the metric
      - introduce parameter `gt_level` for `historic_latin`:
        * add multi-character normalizations to `historic_latin`
          (historic ligatures and MUFI) when `gt_level < 3`
        * use single-character equivalences beyond NFKC when `gt_level == 1`
      - encapsulate edit-distance counting in a class,
        using a parallel aggregation algorithm for accurate mean and variance
      - expose the metric, `gt_level` and confusion parameters to the standalone CLI `eval`
      - expose the confusion size parameter to the OCR-D CLI `evaluate`
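The "parallel aggregation algorithm for accurate mean and variance" mentioned above can be sketched as follows. This is a minimal illustration using Welford's online update plus the standard Chan/Golub/LeVeque combine rule for merging partial aggregates; the class name and interface are hypothetical, not the project's actual code.

```python
class RunningStats:
    """Aggregate count, mean and M2 (sum of squared deviations)
    so that partial results can be merged without precision loss."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def add(self, x):
        # Welford's online update for a single observation
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # parallel combine of two partial aggregates (Chan et al.)
        if other.n == 0:
            return
        n = self.n + other.n
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.mean = (self.n * self.mean + other.n * other.mean) / n
        self.n = n

    @property
    def variance(self):
        # population variance
        return self.m2 / self.n if self.n else 0.0
```

Compared with naively summing x and x², this scheme avoids catastrophic cancellation when per-line edit-distance statistics are aggregated over a large corpus.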
  8. 09 Jan, 2020 2 commits
  9. 18 Nov, 2019 1 commit
  10. 15 Nov, 2019 6 commits
    • v0.1.2 · b4040b7e
      Robert Sachunsky authored
    • lib.seq2seq: improve/fix beam decoder · 28fc4b79
      Robert Sachunsky authored
      - fix A* prospective cost calculation:
        * the sequence length was always 1
        * estimate the cost per remaining character
          from the mean cost per existing character
          (instead of assuming a fixed probability
           of 0.61)
      - fix the break condition for beam search:
        it does not suffice to fill the fan-out
        according to beam_output_width; additionally,
        there must not be any active hypothesis
        better than the worst inactive hypothesis
      - increase the upper limit on the number of iterations
      - improve alignment:
        instead of allowing arbitrary input-output
        alignments, constrain solutions to strict
        linear monotonicity by charging hypothesis
        steps an extra cost that grows linearly with
        the distance of the (average) alignment
        position from 1 plus the predecessor's
      - improve rejection:
        instead of keeping one single hypothesis with
        the verbatim input globally, identify the
        input character locally (via alignments) and
        always add it to the beam as a candidate,
        unconditionally and with a minimum probability
        (configurable as the rejection threshold), thus
        allowing branching from input candidates into
        other hypotheses and vice versa
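The two decoder fixes above can be sketched in isolation. This is an illustrative reduction, not the actual lib.seq2seq code: function names and the cost convention (lower is better) are assumptions.

```python
def prospective_cost(cost_so_far, length_so_far, remaining_chars):
    """A*-style heuristic: estimate the cost of the remaining characters
    from the mean cost per character decoded so far (instead of a fixed
    per-character probability)."""
    if length_so_far == 0:
        return 0.0
    return cost_so_far / length_so_far * remaining_chars

def can_stop(active_scores, finished_scores, beam_output_width):
    """Break condition: filling the fan-out is necessary but not
    sufficient; no active (extensible) hypothesis may still score
    better than the worst finished one."""
    if len(finished_scores) < beam_output_width:
        return False  # fan-out not yet filled
    worst_finished = max(finished_scores)
    # stop only if every active hypothesis is already worse or equal
    return all(a >= worst_finished for a in active_scores)
```

Without the second check, the search could terminate while a cheaper active hypothesis was still one expansion away from displacing a finished one.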
    • lib.seq2seq: make Tensorflow grow GPU memory on demand · 7dcc39ea
      Robert Sachunsky authored
      (instead of allocating all)
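For TF1-era code (which this 2019 commit predates TF2), on-demand GPU memory growth is typically enabled via a session config fragment like the following; the exact integration point inside lib.seq2seq is an assumption.

```python
import tensorflow as tf

# allocate GPU memory incrementally as needed,
# instead of reserving all of it at session creation
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
tf.keras.backend.set_session(session)
```

This lets several processes share one GPU, at the cost of possible fragmentation over time.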
    • lib.evaluate: show most-frequent confusion · 4dedaae4
      Robert Sachunsky authored
      - alignment: store a confusion table of all
        observed pairs along with their counts;
        add a constructor kwarg to activate it;
        add a function to show the most common pairs,
        sorted, up to a given limit
      - evaluate: use separate aligners with
        confusion counting for greedy, beamed and OCR results;
        show the 10 most common confusions
      - evaluate: catch the exception when beam search
        yields no result and create a sensible dummy result
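The confusion table described above maps naturally onto `collections.Counter`. A minimal sketch, assuming the aligner yields (GT char, OCR char) pairs; the real class and kwarg names in lib.evaluate may differ.

```python
from collections import Counter

class ConfusionCounter:
    """Count aligned character pairs; activated via constructor kwarg."""

    def __init__(self, confusion=True):
        self.counts = Counter() if confusion else None

    def observe(self, pairs):
        if self.counts is not None:
            # count only actual confusions, not identity pairs
            self.counts.update((gt, ocr) for gt, ocr in pairs if gt != ocr)

    def most_common(self, limit=10):
        # pairs sorted by frequency, up to the given limit
        return self.counts.most_common(limit) if self.counts else []
```

Keeping one counter per decoder (greedy, beamed, plain OCR) then makes the three confusion profiles directly comparable.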
    • scripts.repl: improve visualisation · d7af1846
      Robert Sachunsky authored
      - show both greedy and beamed results
      - show the input and output strings on the axes
        (in a font with better-than-default Unicode support)
      - show the input and output character in the axes formatter
        (for quick navigation with mouse hover)
      - show the probabilities of the best predicted output
      - show colorbar scales for alignment and probability
      - mark rejection steps with an extra color
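The axes-formatter trick above can be sketched with matplotlib's `FuncFormatter`: map each tick position back to the character at that index, so the hover readout shows characters instead of raw coordinates. The strings and helper name here are hypothetical, not the script's actual code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, for illustration only
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

inp, out = "Vnd", "Und"  # hypothetical input/output strings

def char_formatter(s):
    # format tick position x as the character at that index, if any
    return FuncFormatter(
        lambda x, pos: s[int(round(x))] if 0 <= int(round(x)) < len(s) else "")

fig, ax = plt.subplots()
ax.xaxis.set_major_formatter(char_formatter(inp))
ax.yaxis.set_major_formatter(char_formatter(out))
```

With this in place, the status bar under mouse hover reads as a character pair rather than as fractional indices.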
  11. 25 Oct, 2019 1 commit
  12. 22 Oct, 2019 1 commit
  13. 19 Jul, 2019 3 commits