Commit ba4cf576 authored by Robert Sachunsky

rename conreform→smarthec_backend

parent 0b6a876b
-TAGNAME = bertsky/conreform
+TAGNAME = bertsky/smarthec_backend
build:
docker build -t $(TAGNAME) .
......
-# conreform
-AI backend for SmartHEC project: OCR extraction of relevant information from scanned forms via context recognition
+# smarthec_backend
+> AI backend for SmartHEC project: OCR extraction of relevant information from scanned forms via context recognition
Defines a Docker service that runs an [OCR-D](https://ocr-d.de) [workflow](https://ocr-d.de/en/spec/glossary#ocr-d-workflow) for text extraction of predefined form fields (visual object classes) from scanned/photographed forms on given [OCR-D workspaces](https://ocr-d.de/en/spec/glossary#workspace). The workspace is assumed to contain nothing but a fileGrp `OCR-D-IMG` with the raw images, and will be annotated up to a final fileGrp `OCR-D-OCR-TESS-deu-SEG-tesseract-sparse-FORM-OCR` with [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) representing the final result.
......@@ -77,3 +78,19 @@ To query resulting PAGE-XML (`pc="http://schema.primaresearch.org/PAGE/gts/pagec
```xpath
//pc:TextLine[contains(@custom,"subtype:target=gebaeude_heizkosten_raumwaerme")]/pc:Coords/@points
```
(For targets, `TextLine/Coords` and `TextRegion/Coords` are identical.)
- respective confidence:
```xpath
//pc:TextLine[contains(@custom,"subtype:target=gebaeude_heizkosten_raumwaerme")]/pc:Coords/@conf
```
- path name of last derived image on page level (with image preprocessing):
```xpath
/pc:PcGts/pc:Page/pc:AlternativeImage[last()]/@filename
```
- respective coordinate transform for that (a 3x3 matrix after `coords=` prefix):
```xpath
/pc:PcGts/pc:Page/@custom
```
(Apply to any segment polygon like [this](https://github.com/OCR-D/core/blob/1df3f456e1284444725a420ba5392c08a86d95aa/ocrd_utils/ocrd_utils/image.py#L131-L134), with the actual transformation [here](https://github.com/OCR-D/core/blob/1df3f456e1284444725a420ba5392c08a86d95aa/ocrd_utils/ocrd_utils/image.py#L304-L319).)
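
The queries above can be combined into a short script. The following is only a sketch using the Python standard library, not the code this service or OCR-D itself uses: the helper names (`page_namespace`, `target_polygons`, `transform_points`), the sample document, its `urn:example:page` namespace and the shift matrix are all made up for illustration. How the matrix is serialized after `coords=` and in which direction it maps are not shown here, so the matrix is passed in already parsed; check the linked OCR-D code before relying on either.

```python
import xml.etree.ElementTree as ET


def page_namespace(root):
    """Read the namespace URI off the root tag, e.g. '{uri}PcGts' -> 'uri'."""
    return root.tag[1:root.tag.index("}")]


def target_polygons(root, target):
    """Yield (points, conf) for every TextLine annotated with the given target."""
    ns = {"pc": page_namespace(root)}
    marker = "subtype:target=" + target
    for line in root.iterfind(".//pc:TextLine", ns):
        # ElementTree has no XPath contains(), so filter @custom in Python
        if marker not in line.get("custom", ""):
            continue
        coords = line.find("pc:Coords", ns)
        if coords is None:
            continue
        # PAGE-XML stores polygons as "x1,y1 x2,y2 ..."
        points = [tuple(int(v) for v in xy.split(","))
                  for xy in coords.get("points").split()]
        conf = coords.get("conf")
        yield points, (float(conf) if conf is not None else None)


def transform_points(points, matrix):
    """Apply a 3x3 homogeneous transform (nested row lists) to (x, y) points."""
    result = []
    for x, y in points:
        xh, yh, w = (row[0] * x + row[1] * y + row[2] for row in matrix)
        result.append((xh / w, yh / w))
    return result


# Hypothetical sample document; real files use the PAGE namespace instead.
SAMPLE = """<PcGts xmlns="urn:example:page"><Page>
  <TextRegion><TextLine custom="subtype:target=gebaeude_heizkosten_raumwaerme">
    <Coords points="10,20 30,20 30,40 10,40" conf="0.9"/>
  </TextLine></TextRegion>
</Page></PcGts>"""

root = ET.fromstring(SAMPLE)
(points, conf), = target_polygons(root, "gebaeude_heizkosten_raumwaerme")
# Hypothetical matrix: a pure shift by (-5, -10), as a crop might produce.
shift = [[1, 0, -5], [0, 1, -10], [0, 0, 1]]
print(conf, transform_points(points, shift))
```

Reading the namespace from the root element avoids hard-coding a PAGE schema version, so the same helpers should work whichever PAGE namespace the workspace files declare.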