Validate CodeMirror input as valid SGML/NLP compatible #126

Open
amir-zeldes opened this issue Feb 15, 2019 · 0 comments

Invalid SGML can break the spreadsheet conversion, for example:

<lb>ⲧⲱⲛ_·>_ϫⲉ|ⲟⲩ|ⲣⲱ</lb>

Additionally, some NLP components place specific constraints on the data. For Coptic Scriptorium, element content (as opposed to attribute values) may not contain - or |, since the NLP chain expects input that has not been pre-tokenized. The latter problem could potentially be solved in the NLP tools, but then users would remain unaware of what is probably an error in their input.

Possible solutions:

  1. Validate the SGML to rule out unescaped > and < in element content (probably in JS on the client, perhaps with highlighting in CodeMirror)
  2. Write an easily configurable script that also checks for specific characters in element content, configurable per instance or via the validations table
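
As a rough illustration of both checks, here is a minimal Python sketch. It is only one possible approach, not existing project code: the function name `find_content_errors`, the `forbidden` parameter, and the tag regex are all hypothetical, and it assumes each tag is written on a single line with quoted attribute values. Anything that matches a well-formed tag is skipped, so attribute values are never checked; whatever remains is treated as element content.

```python
import re

# Hypothetical pattern for a well-formed tag: <lb>, </lb>, <pb n="1"/>.
# Assumption: tags do not span lines and attribute values are quoted.
TAG_RE = re.compile(r"""
    </?                                          # opening or closing tag
    [A-Za-z_][\w:.-]*                            # element name
    (?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*   # quoted attributes
    \s*/?>                                       # optional self-closing slash
""", re.VERBOSE)


def find_content_errors(text, forbidden="-|"):
    """Return (line_no, column, char, reason) tuples for element content.

    `forbidden` stands in for a per-instance setting, e.g. "-|" for
    Coptic Scriptorium; pass "" to check only for stray < and >.
    """
    errors = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Collect the spans of text that lie between well-formed tags.
        spans, pos = [], 0
        for match in TAG_RE.finditer(line):
            spans.append((pos, match.start()))
            pos = match.end()
        spans.append((pos, len(line)))

        for start, end in spans:
            for col in range(start, end):
                ch = line[col]
                if ch in "<>":
                    errors.append((line_no, col + 1, ch, "unescaped markup character"))
                elif ch in forbidden:
                    errors.append((line_no, col + 1, ch, "forbidden character"))
    return errors


if __name__ == "__main__":
    sample = "<lb>ⲧⲱⲛ_·>_ϫⲉ|ⲟⲩ|ⲣⲱ</lb>"
    for error in find_content_errors(sample):
        print(error)
```

Run on the example above, this would report the stray > and both | characters, while a | inside an attribute value would pass. The reported line and column numbers could in principle be used to drive highlighting in the editor, though a client-side check would need to be ported to JS.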