feat(standoff)!: return XML alongside HTML for textValue with custom standoff mapping and default XSL transformation (DEV-201) (#1991)

* testing: add stubs for StandoffModels

* test: add test data for standoff custom mapping

* expand standoff ontology

* feat: return XML even with custom mapping

* test: start working on a proper E2E test for standoff with custom mapping

* test: add TODO for TEI related task

* test: add more TODOs

* test: add some more stubs for StandoffModels

* Update standoffModelsUtil.scala

* remove unused StandoffModelsUtil file

* test: revert unnecessary base ontology changes

* test: fix sample xml file

* test: add E2E test for standoff with custom mapping

* test:  adjust FileModels

* test: clean up Standoff E2E test

* test: add tests for StandoffModels

* refactor: minor cleaning up

* test: add mock sipi to standoff e2e test

* tests: start with standoff E2E tests

* Update StandoffModels.scala

* tests: check if sipi is available in E2E test

* tests: enable SIPI in E2E tests

* test: clean up

* test: move SIPI utils to E2ESpec

* test: rename E2E test from R2R to E2E

* test: reasonably test creating a standoff mapping in a unit test

* refactor: tidy up

* test: add e2e test for standard mapping

* refactor: tidy up unit tests

* refactor: remove potentially unused files

* testdata: add gitignore

* test: add standoff example to freetest test data

* refactor: clean up after merging Bazel-to-SBT PR

* docs: start documenting standoff

* docs: update documentation

* docs: update documentation

* docs: update documentation

* refactor: final tidy up

* refactor: changes according to review

* docs: add scaladoc

* refactor: minor cleanup according to review

* refactor: rename variable to be more clear what it actually is

* docs: update documentation to make creating text values with custom standoff more clear

* docs: fix typo

* refactor: move sipi messages from test into the sipi messages package
BalduinLandolt committed Mar 7, 2022
1 parent eac0049 commit 2548b8f
Showing 38 changed files with 1,494 additions and 2,160 deletions.
37 changes: 16 additions & 21 deletions docs/01-introduction/standoff-rdf.md
@@ -5,15 +5,14 @@

# Standoff/RDF Text Markup

-[Standoff markup](https://lexiconse.uantwerpen.be/index.php/lexicon/markup-standoff/)
-is text markup that is stored separately from the content it describes. Knora's
+[Standoff markup](https://lexiconse.uantwerpen.be/lexicon/markupStandoff.html)
+is text markup that is stored separately from the content it describes. DSP-API's
Standoff/RDF markup stores content as a simple Unicode string, and represents markup
separately as RDF data. This approach has some advantages over commonly used markup systems
such as XML:

First, XML and other hierarchical markup systems assume that a document is a hierarchy, and
-have difficulty representing
-[non-hierarchical structures](http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html)
+have difficulty representing [non-hierarchical structures](http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html)
or multiple overlapping hierarchies. Standoff markup can easily represent these structures.

Second, markup languages are typically designed to be used in text files. But there is no
@@ -22,43 +21,39 @@ markup. It is possible to do this in a non-standard way by using an XML database
such as [eXist](http://exist-db.org), but this still does not allow for queries that include
text as well as non-textual data not stored in XML.

-By storing markup as RDF, Knora can search for markup structures in the same way that it
+By storing markup as RDF, DSP-API can search for markup structures in the same way as it
searches for any RDF data structure. This makes it possible to do searches that combine
text-related criteria with other sorts of criteria. For example, if persons and events are
-represented as Knora resources, and texts are represented in Standoff/RDF, a text can contain
+represented as resources, and texts are represented in Standoff/RDF, a text can contain
tags representing links to persons or events. You could then search for a text that mentions a
person who lived in the same city as another person who is the author of a text that mentions an
event that occurred during a certain time period.

-In Knora's Standoff/RDF, a tag is an RDF entity that is linked to a
+In DSP-API's Standoff/RDF, a tag is an RDF entity that is linked to a
[text value](../02-knora-ontologies/knora-base.md#textvalue). Each tag points to a substring
of the text, and has semantic properties of its own. You can define your own tag classes
in your ontology by making subclasses of `knora-base:StandoffTag`, and attach your own
-properties to them. You can then search for those properties using Knora's search language,
+properties to them. You can then search for those properties using DSP-API's search language,
[Gravsearch](../03-apis/api-v2/query-language.md).

The built-in [knora-base](../02-knora-ontologies/knora-base.md) and `standoff` ontologies
provide some basic tags that can be reused or extended. These include tags that represent
-Knora data types. For example, `knora-base:StandoffDateTag` represents a date in exactly the
-same way as a Knora [date value](../02-knora-ontologies/knora-base.md#datevalue), i.e. as a
+DSP-API data types. For example, `knora-base:StandoffDateTag` represents a date in exactly the
+same way as a [date value](../02-knora-ontologies/knora-base.md#datevalue), i.e. as a
calendar-independent astronomical date. You can use this tag as-is, or extend it by making
a subclass, to represent dates in texts. Gravsearch includes built-in functionality for
searching for these data type tags. For example, you can search for text containing a date that
falls within a certain [date range](../03-apis/api-v2/query-language.md#matching-standoff-dates).

-Knora's APIs support automatic conversion between XML and Standoff/RDF. To make this work,
+DSP-API supports automatic conversion between XML and Standoff/RDF. To make this work,
Standoff/RDF stores the order of tags and their hierarchical relationships. You must define an
[XML-to-Standoff Mapping](../03-apis/api-v2/xml-to-standoff-mapping.md) for your standoff tag classes and properties.
-Then you can import an XML document into Knora, which will store it as Standoff/RDF. The text and markup
-can then be searched using Gravsearch. When you retrieve the document, Knora converts it back to the
+Then you can import an XML document into DSP-API, which will store it as Standoff/RDF. The text and markup
+can then be searched using Gravsearch. When you retrieve the document, DSP-API converts it back to the
original XML.

-To represent overlapping or non-hierarchical markup in exported and imported XML, Knora supports
-[CLIX](http://conferences.idealliance.org/extreme/html/2004/DeRose01/EML2004DeRose01.html#t6) tags.
+To represent overlapping or non-hierarchical markup in exported and imported XML, DSP-API supports
+[CLIX](https://web.archive.org/web/20171222112655/http://conferences.idealliance.org/extreme/html/2004/DeRose01/EML2004DeRose01.html) tags.

-Future plans for Standoff/RDF include:
-
-- Creation and retrieval of standoff markup as such via the DSP-API,
-without using XML as an input/output format.
-- A user interface for editing standoff markup.
-- The ability to create resources that cite particular standoff tags in other resources.
+As XML-to-Standoff has proved to be complicated and not very well performing, the use of standoff with custom mappings is discouraged.
+Improved integration of text with XML mark up, particularly TEI-XML, is in planning.

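Illustration (not part of the commit): the file above also says that an XML document can be imported and stored as text plus standoff tags, with tag order and hierarchy preserved. The sketch below shows the general technique using Python's standard `xml.etree.ElementTree`. It is an assumption-laden simplification: it applies no XML-to-Standoff mapping, produces plain tuples rather than RDF, and the function name `xml_to_standoff` and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_to_standoff(xml: str):
    """Return (plain_text, tags); each tag is (name, start, end, attributes).

    Tags are listed in document order (the order in which elements open).
    """
    root = ET.fromstring(xml)
    parts = []  # text fragments, joined at the end
    tags = []   # one entry per element
    pos = 0     # current character offset in the plain text

    def walk(elem):
        nonlocal pos
        index = len(tags)
        tags.append(None)  # reserve the slot so document order is kept
        start = pos
        if elem.text:
            parts.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text following the child's closing tag
                parts.append(child.tail)
                pos += len(child.tail)
        tags[index] = (elem.tag, start, pos, dict(elem.attrib))

    walk(root)
    return "".join(parts), tags


sample = '<text>Anna met <person id="p1">Paul</person> in <place>Basel</place>.</text>'
plain_text, standoff_tags = xml_to_standoff(sample)

print(plain_text)
for tag in standoff_tags:
    print(tag)
```

Converting back to XML would reverse this walk, which is why the order and nesting of tags need to be stored alongside the offsets.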