Coptic NLP Service

Enter Coptic text in UTF-8 (XML markup is also allowed; 10,000 characters max).
Bound groups should be separated by spaces or underscores.
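
As a rough illustration of the input conventions above, the following Python sketch checks the stated 10,000-character limit and splits plain text into bound groups on spaces or underscores. The function name `split_bound_groups` and the constant `MAX_CHARS` are hypothetical and not part of the service. The sketch assumes input without XML markup (tags containing spaces would be split incorrectly) and assumes the limit counts Unicode code points, which the form does not specify.

```python
import re

MAX_CHARS = 10_000  # limit stated on the form; counting code points is an assumption


def split_bound_groups(text: str) -> list[str]:
    """Split plain Coptic text into bound groups on whitespace or underscores.

    Hypothetical helper for illustration only; it does not handle XML markup.
    """
    if len(text) > MAX_CHARS:
        raise ValueError(f"input has {len(text)} characters; the limit is {MAX_CHARS}")
    # Runs of spaces, newlines, or underscores all count as one separator.
    return [group for group in re.split(r"[\s_]+", text) if group]


if __name__ == "__main__":
    print(split_bound_groups("ⲁϥⲥⲱⲧⲙ_ⲉⲣⲟϥ ⲛϭⲓ"))
```

Running a check like this locally before submitting can help catch over-long input early; longer texts would need the secure area or the API.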

If you need to analyze longer texts or multiple texts automatically, you can log in to the secure area or use the API. To request a login, please contact Amir Zeldes.

Dialect:

Auto-detect
Sahidic
Bohairic

Input:

My data contains meaningful linebreaks: This inserts <line>...</line> tags around each line of text.
If you already have <lb/> tags or your data is already tokenized, you probably want to ignore line breaks.

Ignore linebreaks in my data
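
The two input options can be pictured with a small Python sketch. This is not the service's own code: `wrap_lines` and `ignore_linebreaks` are hypothetical names, and skipping blank lines and joining lines with a single space are assumptions about reasonable behavior rather than documented facts.

```python
def wrap_lines(text: str) -> str:
    """Wrap each non-empty input line in <line>...</line> tags.

    Illustrates the "meaningful linebreaks" option; skipping blank
    lines is an assumption.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    return "\n".join(f"<line>{line}</line>" for line in lines)


def ignore_linebreaks(text: str) -> str:
    """Collapse all whitespace, including linebreaks, into single spaces.

    Illustrates the "ignore linebreaks" option; treating a linebreak
    as an ordinary space is an assumption.
    """
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "ⲁϥⲥⲱⲧⲙ ⲉⲣⲟϥ\nⲛϭⲓ ⲡⲣⲱⲙⲉ\n"
    print(wrap_lines(sample))
    print(ignore_linebreaks(sample))
```

If the data already carries <lb/> tags, wrapping lines again would record the same break twice, which is why ignoring linebreaks is usually the better choice in that case.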

Output:

Use old finite state tokenizer: Less accurate, provided for reproducing older results. Not compatible with detokenization.

Re-merge bound groups: Regularizes bound group spacing if the input does not follow Layton's guidelines (a.k.a. 'Laytonization'; increases accuracy on Till-segmented text and OCR).

SGML pipeline
Just piped and dashed morphemes

Result: