textclass properties on entities not honoured when interpreting wref/@t #21

Open
proycon opened this issue Apr 30, 2018 · 4 comments

proycon (Member) commented Apr 30, 2018

folialint fails on the following document with this error (foliavalidator does not complain):

XML error: WordRefence id=TEI.1.text.1.body.1.div1.1.head.1.s.1.w.3 has another value for  the t attribute them it's reference. (Zuidhollanschen versus Zuydthollanschen)

It should compare against the word's text in the correct textclass, which is explicitly specified at the entity level (textclass="contemporary").

'Minimal' FoLiA example (http://lst.science.ru.nl/~proycon/issue52.folia.xml):

    <s xml:id="TEI.1.par">
      <w xml:id="TEI.1.text.1.body.1.div1.1.head.1.s.1.w.3" class="WORD" set="tokconfig-nld">
        <t>Zuydthollanschen</t>
        <t class="contemporary">Zuidhollanschen</t>
        <pos class="SPEC(deeleigen)" confidence="1" head="SPEC" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" textclass="contemporary">
          <feat class="deeleigen" subset="spectype"/>
        </pos>
        <lemma class="Zuidhollanschen" set="http://ilk.uvt.nl/folia/sets/frog-mblem-nl" textclass="contemporary"/>
      </w>
      <w xml:id="TEI.1.text.1.body.1.div1.1.head.1.s.1.w.4" class="WORD" set="tokconfig-nld" space="no">
        <t>Synodi</t>
        <t class="contemporary">Sijnodi</t>
        <pos class="SPEC(deeleigen)" confidence="1" head="SPEC" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" textclass="contemporary">
          <feat class="deeleigen" subset="spectype"/>
        </pos>
        <lemma class="Sijnodi" set="http://ilk.uvt.nl/folia/sets/frog-mblem-nl" textclass="contemporary"/>
      </w>
      <entities xml:id="TEI.1.text.1.body.1.div1.1.head.1.s.1.entities.1">
        <entity xml:id="TEI.1.text.1.body.1.div1.1.head.1.s.1.entities.1.entity.1" class="pro" confidence="0.68202" set="http://ilk.uvt.nl/folia/sets/frog-ner-nl" textclass="contemporary">
          <wref id="TEI.1.text.1.body.1.div1.1.head.1.s.1.w.3" t="Zuidhollanschen"/>
          <wref id="TEI.1.text.1.body.1.div1.1.head.1.s.1.w.4" t="Sijnodi"/>
        </entity>
      </entities>
    </s>
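To make the expected behaviour concrete, here is a minimal Python sketch of the check as it should work. It is not folialint's or libfolia's code. Each wref/@t is compared with the referenced word's text in the textclass declared on the enclosing entity, falling back to FoLiA's default `current` class when no textclass is given. It uses the standard-library `xml.etree.ElementTree` on a simplified, namespace-free copy of the example above (IDs shortened, pos/lemma dropped); `word_text` and `check_wrefs` are hypothetical helpers. Passing `use_entity_textclass=False` approximates the faulty behaviour described in this thread, where the word's default text is used instead.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified copy of the example: no FoLiA namespace, shortened IDs,
# pos/lemma annotations dropped (assumption for brevity).
SNIPPET = """\
<s xml:id="s.1">
  <w xml:id="w.3"><t>Zuydthollanschen</t><t class="contemporary">Zuidhollanschen</t></w>
  <w xml:id="w.4"><t>Synodi</t><t class="contemporary">Sijnodi</t></w>
  <entities>
    <entity class="pro" textclass="contemporary">
      <wref id="w.3" t="Zuidhollanschen"/>
      <wref id="w.4" t="Sijnodi"/>
    </entity>
  </entities>
</s>"""


def word_text(word, textclass):
    """Return the word's text in `textclass`; a <t> without class counts as 'current'."""
    for t in word.findall("t"):
        if t.get("class", "current") == textclass:
            return t.text or ""
    return None


def check_wrefs(root, use_entity_textclass=True):
    """Compare every wref/@t with the referenced word's text; return the mismatches."""
    words = {w.get(XML_ID): w for w in root.iter("w")}
    errors = []
    for entity in root.iter("entity"):
        if use_entity_textclass:
            textclass = entity.get("textclass", "current")
        else:
            textclass = "current"  # ignore the entity's textclass (the reported bug)
        for wref in entity.findall("wref"):
            expected = word_text(words[wref.get("id")], textclass)
            if wref.get("t") is not None and wref.get("t") != expected:
                errors.append((wref.get("id"), wref.get("t"), expected))
    return errors


root = ET.fromstring(SNIPPET)
print(check_wrefs(root, use_entity_textclass=False))
print(check_wrefs(root))
```

The first call reports both wrefs as mismatches, because the `current` texts are Zuydthollanschen and Synodi; the second call, which honours textclass="contemporary" on the entity, reports none.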
proycon added the bug label Apr 30, 2018
proycon changed the title from "textclass properties on entities on honoured when interpreting wref/@t" to "textclass properties on entities not honoured when interpreting wref/@t" Apr 30, 2018
proycon (Member, Author) commented Apr 30, 2018

(Resolution needed for completion of INL/nederlab-linguistic-enrichment#12)

kosloot (Contributor) commented Apr 30, 2018

OK, the error is detected while parsing the wref node, before it is appended to the layer, so the textclass of the layer is not yet known. (It uses the textclass of the referenced Word instead, which is indeed wrong.)
Probably the check has to be postponed to the post_append() method?
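The postponement suggested in this comment can be sketched as follows. This is a hypothetical Python model, not libfolia's C++ classes: `Word`, `WordRef`, `Entity`, `FoliaError` and this `post_append` hook are illustrative names only. The wref merely records @t at parse time; the comparison runs once the wref has been attached to its entity, when the entity's textclass is available.

```python
class FoliaError(Exception):
    """Raised when a wref/@t does not match the referenced word's text (hypothetical)."""


class Word:
    def __init__(self, word_id, texts):
        self.id = word_id
        self.texts = texts  # maps textclass -> text

    def text(self, textclass="current"):
        return self.texts.get(textclass)


class WordRef:
    """A parsed <wref>: keeps @t but does not validate it yet, as its parent is unknown."""

    def __init__(self, word, t=None):
        self.word = word
        self.t = t
        self.parent = None

    def post_append(self):
        # Only now is the enclosing span, and therefore its textclass, known.
        textclass = self.parent.textclass
        actual = self.word.text(textclass)
        if self.t is not None and self.t != actual:
            raise FoliaError(
                f"wref to {self.word.id}: t={self.t!r} but text in "
                f"textclass {textclass!r} is {actual!r}"
            )


class Entity:
    def __init__(self, textclass="current"):
        self.textclass = textclass
        self.wrefs = []

    def append(self, wref):
        wref.parent = self        # attach first so the wref can see our textclass
        wref.post_append()        # deferred check; raises FoliaError on mismatch
        self.wrefs.append(wref)   # only stored if the check passed


w3 = Word("w.3", {"current": "Zuydthollanschen", "contemporary": "Zuidhollanschen"})

entity = Entity(textclass="contemporary")
entity.append(WordRef(w3, t="Zuidhollanschen"))  # accepted

try:
    Entity().append(WordRef(w3, t="Zuidhollanschen"))  # default textclass "current"
except FoliaError as err:
    print(err)
```

Setting `parent` before running the check is what lets the wref read the entity's textclass; a reference that fails the check is not added to the entity's list.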

kosloot (Contributor) commented May 1, 2018

A good solution is not easy. For the moment, this check is disabled.

kosloot added the enhancement label and removed the bug label Nov 18, 2019

kosloot (Contributor) commented Nov 18, 2019

The check is disabled, but it should ideally be performed at some stage.
So I'll keep the issue open as an enhancement.

Projects: None yet
Development: No branches or pull requests
2 participants