Coptic NLP Service

Enter Coptic text in UTF-8 (XML markup is also allowed).
Bound groups should be separated by spaces or underscores.
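
For preparing input, the bound-group convention above can be sketched in a few lines of Python. This is only an illustration of the input format, not the service's own code: the function name `split_bound_groups` is hypothetical, and the sketch assumes plain text without XML markup (tags containing spaces would be split incorrectly).

```python
import re
from typing import List


def split_bound_groups(text: str) -> List[str]:
    """Split plain text into bound groups.

    Assumption: bound groups are delimited by runs of whitespace
    and/or underscores, and the text contains no XML tags.
    """
    # Split on any run of whitespace or underscores, dropping empty pieces
    # that arise from leading or trailing separators.
    return [group for group in re.split(r"[\s_]+", text) if group]


if __name__ == "__main__":
    # Spaces and underscores are treated as equivalent separators.
    print(split_bound_groups("ⲁϥⲥⲱⲧⲙ_ⲉⲡϣⲁϫⲉ ⲙⲡϫⲟⲉⲓⲥ"))
```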

Dialect:

Auto-detect
Sahidic
Bohairic

Input:

My data contains meaningful linebreaks: this inserts <line>..</line> tags around each line of text.
If you already have <lb/> tags or your data is already tokenized, you probably want to ignore line breaks.
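
As a rough illustration of what wrapping lines in <line>..</line> tags looks like, here is a minimal Python sketch. The function name `wrap_lines` is hypothetical, and skipping blank lines is an assumption; the service may treat empty lines differently.

```python
def wrap_lines(text: str) -> str:
    """Wrap each non-empty line of text in <line>..</line> tags.

    Assumption: blank lines are dropped rather than wrapped.
    Existing markup inside a line is passed through unchanged.
    """
    wrapped = [
        f"<line>{line}</line>"
        for line in text.splitlines()
        if line.strip()  # skip lines that are empty or whitespace-only
    ]
    return "\n".join(wrapped)


if __name__ == "__main__":
    print(wrap_lines("first line of text\nsecond line of text"))
```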

Ignore linebreaks in my data

Output:

Use old finite-state tokenizer: less accurate; provided for reproducing older results. Not compatible with detokenization.

Re-merge bound groups: regularizes bound group spacing if the input does not follow Layton's guidelines
(a.k.a. 'Laytonization'; increases accuracy on Till-segmented text and OCR output)

SGML pipeline
Just piped and dashed morphemes

Result: