FoLia-2text: extracting original text from a correction #60

Open
kosloot opened this issue May 10, 2021 · 3 comments

@kosloot
Contributor

kosloot commented May 10, 2021

Given the FoLiA document below (also attached as correctie.xml.txt), FoLiA-2text can extract the corrected text ("één."), but there is no way to extract the original text.

The naive FoLiA-2text --class original doesn't work.
So how can this be done?
@proycon same question for folia2txt from FoliaPY

<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="bug" generator="libfolia-v2.4" version="2.5">
  <metadata type="native">
    <annotations>
      <text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
      <token-annotation/>
      <sentence-annotation/>
      <paragraph-annotation/>
      <correction-annotation set="folia-correct">
        <annotator processor="FoLiA-correct.1"/>
      </correction-annotation>
    </annotations>
    <provenance>
      <processor xml:id="FoLiA-correct.1" begindatetime="2020-01-06T12:08:30" command="FoLiA-correct --punct=punct.punct --unk=unk.unk --rank=rank.ranked --clear --inputclass=Test --ngram=3 -v  -v " folia_version="2.2.1" host="bonus" name="FoLiA-correct" user="sloot" version="0.14">
        <processor xml:id="FoLiA-correct.1.generator" folia_version="2.2.1" name="libfolia" type="generator" version="2.4"/>
      </processor>
    </provenance>
  </metadata>
  <text xml:id="text">
    <p xml:id="p1">
      <s xml:id="s1">
        <correction xml:id="cor.1" set="folia-correct">
          <new>
            <w xml:id="w3.cor">
              <t>één.</t>
            </w>
          </new>
          <original auth="no">
            <w xml:id="w3">
              <t>een.</t>
            </w>
          </original>
        </correction>
      </s>
    </p>
  </text>
</FoLiA>
@proycon
Member

proycon commented May 10, 2021

Yes, class original doesn't work (that used to be something in an early FoLiA version). The example is valid and correct: both the new and the original text have the current class. There is a solution for retrieving this, but I'm not sure to what extent it's also implemented in libfolia. For text() and textcontent() I use the correctionhandling parameter to choose which path to follow; the value can be CURRENT, ORIGINAL or EITHER. A value of "CURRENT" follows the normal authoritative path (i.e. <new> or <current>; note that despite the name this no longer relates to text classes at all) and "ORIGINAL" follows the original path.

folia2txt doesn't currently provide an interface to this; foliacorrect does.
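
To make the path-following idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the FoLiApy or libfolia implementation: the function name extract_text, its handling argument and the trimmed-down SAMPLE document are assumptions made for illustration, and the EITHER mode is left out. It walks the XML and, at each correction, follows either the new/current branch or the original branch.

```python
import xml.etree.ElementTree as ET

# FoLiA namespace, as declared in the sample document from this issue.
NS = "{http://ilk.uvt.nl/folia}"

# Trimmed-down copy of the issue's document (metadata omitted for brevity).
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="bug" version="2.5">
  <text xml:id="text">
    <p xml:id="p1">
      <s xml:id="s1">
        <correction xml:id="cor.1" set="folia-correct">
          <new><w xml:id="w3.cor"><t>één.</t></w></new>
          <original auth="no"><w xml:id="w3"><t>een.</t></w></original>
        </correction>
      </s>
    </p>
  </text>
</FoLiA>"""


def extract_text(elem, handling="CURRENT"):
    """Collect <t> strings below elem, following one branch per <correction>.

    handling is a hypothetical stand-in for the correctionhandling idea:
    "CURRENT" skips <original>, "ORIGINAL" skips <new> and <current>.
    Suggestions are never followed.
    """
    if handling == "CURRENT":
        skipped = {"original", "suggestion"}
    elif handling == "ORIGINAL":
        skipped = {"new", "current", "suggestion"}
    else:
        raise ValueError("handling must be 'CURRENT' or 'ORIGINAL'")

    parts = []
    for child in elem:
        tag = child.tag.rsplit("}", 1)[-1]  # strip the namespace prefix
        if tag == "t":
            parts.append((child.text or "").strip())
        elif tag not in skipped:
            parts.extend(extract_text(child, handling))
    return parts


root = ET.fromstring(SAMPLE)
body = root.find(NS + "text")
print(" ".join(extract_text(body, "CURRENT")))
print(" ".join(extract_text(body, "ORIGINAL")))
```

Run on the sample, the first call follows the new branch and the second follows the original branch, so the two printed lines differ only in the corrected word.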

@kosloot
Contributor Author

kosloot commented May 11, 2021

I would very much like to see this implemented in folia2txt. @proycon Should I open an issue?

@proycon
Member

proycon commented May 11, 2021

Yes, that sounds like a good idea. (It may take a bit before I get to it, though.)
