Make configurable list of XML attributes coming back from NLP to collapse to one column #133

Open
amir-zeldes opened this issue Mar 13, 2019 · 0 comments

In Coptic Scriptorium, we have attributes that can apply to different XML elements. For example, xml:lang can appear on either <norm> or <morph>, as in this made-up example:

<norm_group orig_group="ⲡⲁⲅⲅⲉⲗⲟⲥ" norm_group="ⲡⲁⲅⲅⲉⲗⲟⲥ">
<norm xml:id="u1" pos="ART" lemma="ⲡ" func="det" head="#u2" orig="ⲡ" norm="ⲡ">
ⲡ
</norm>
<norm xml:id="u2" pos="N" lemma="ⲁⲅⲅⲉⲗⲟⲥ" xml:lang="Greek" func="root" orig="ⲁⲅⲅⲉⲗⲟⲥ" norm="ⲁⲅⲅⲉⲗⲟⲥ">
ⲁⲅⲅⲉⲗⲟⲥ
</norm>
</norm_group>
<norm_group orig_group="ⲛⲧⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ" norm_group="ⲛⲧⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ">
<norm xml:id="u3" pos="CREL" lemma="ⲉⲧⲉⲣⲉ" func="mark" head="#u4" orig="ⲛⲧ" norm="ⲛⲧ">
ⲛⲧ
</norm>
<norm xml:id="u4" pos="N" lemma="ⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ" func="acl" head="#u2" orig="ⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ" norm="ⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ">
<morph morph="ⲙⲛⲧ">
ⲙⲛⲧ
</morph>
<morph xml:lang="Greek" morph="ⲁⲅⲅⲉⲗⲟⲥ">
ⲁⲅⲅⲉⲗⲟⲥ
</morph>
</norm>
</norm_group>

The desired EtherCalc behavior with this NLP output is to collapse both xml:lang annotations into one 'lang' column, but this behavior is currently hard-wired in ether.py. Conceivably, another project would want the normal, element-specific output columns (such as morph_xml_id and norm_xml_id) to stay distinct.

The names of annotations being collapsed this way should be configurable.
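
One possible shape for this, as a rough sketch only: the names `COLLAPSE_ATTRS` and `column_name` are hypothetical and do not come from ether.py, and the default `element_attribute` naming is an assumption inferred from the column names mentioned above. The idea is that a per-project mapping decides which attributes share one column, and everything else keeps its element-specific column.

```python
import re

# Hypothetical per-project setting: attribute name -> shared column name.
# An empty dict would disable collapsing entirely.
COLLAPSE_ATTRS = {"xml:lang": "lang"}


def column_name(element, attribute, collapse=COLLAPSE_ATTRS):
    """Return the spreadsheet column for `attribute` found on `element`.

    Attributes listed in `collapse` share one column regardless of element;
    all others get an element-specific column (assumed naming scheme).
    """
    if attribute in collapse:
        return collapse[attribute]
    # Assumed default: norm + xml:id -> norm_xml_id
    return element + "_" + re.sub(r"[^A-Za-z0-9]+", "_", attribute)


if __name__ == "__main__":
    for elem, attr in [("norm", "xml:lang"), ("morph", "xml:lang"), ("norm", "xml:id")]:
        print(elem, attr, "->", column_name(elem, attr))
```

With the mapping above, xml:lang on both norm and morph lands in a single 'lang' column, while a project that passes an empty mapping would keep separate columns per element.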
