knora-py

knora-py is a Python package containing a command-line tool for creating data models (ontologies) and a library for creating single resources and for mass uploading data into the Knora framework via bulk import.

The package consists of:

  • Knora Python modules for accessing Knora through its API (ontology creation, data import/export, etc.)
  • knora-create-ontology: A command-line program that creates an ontology from a simple JSON description
  • knora-reset-triplestore: A command-line program that resets the content of the triplestore. It does not require a restart of the Knora stack.

Install

To install the latest version published on PyPI, run:

$ pip3 install knora

To update to the latest version, run:

$ pip3 install --upgrade knora

To install from source (i.e., from this repository), run:

$ python3 setup.py install

Creating an ontology with knora-create-ontology

This script reads a JSON file containing the data model (ontology) definition, connects to the Knora server and creates the data model.

Usage:

$ knora-create-ontology data_model_definition.json

It supports the following options:

  • "-s server" | "--server server": The URL of the Knora server [default: localhost:3333]
  • "-u username" | "--user username": Username to log into Knora [default: root@example.com]
  • "-p password" | "--password password": The password for logging into the Knora server [default: test]
  • "-v" | "--validate": If this flag is set, only the validation of the JSON file is run
  • "-l" | "--lists": Only create the lists, using the simplified schema. Note that in this case the project must already exist.
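When the tool is called from a Python script (for example in a test setup), the options above can be assembled into an argument list. This is a minimal sketch: build_create_ontology_cmd and run_create_ontology are hypothetical helpers, not part of knora-py, and they only use the flags listed above.

```python
import subprocess
from typing import List, Optional


def build_create_ontology_cmd(
    json_file: str,
    server: Optional[str] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
    validate_only: bool = False,
    lists_only: bool = False,
) -> List[str]:
    """Assemble the argument list for knora-create-ontology.

    Options left as None/False are omitted so the tool's own defaults apply.
    """
    cmd = ["knora-create-ontology"]
    if server is not None:
        cmd += ["--server", server]
    if user is not None:
        cmd += ["--user", user]
    if password is not None:
        cmd += ["--password", password]
    if validate_only:
        cmd.append("--validate")  # only validate the JSON file
    if lists_only:
        cmd.append("--lists")  # simplified lists-only schema
    cmd.append(json_file)  # the positional JSON definition comes last
    return cmd


def run_create_ontology(json_file: str, **options) -> int:
    """Run the installed tool and return its exit code."""
    return subprocess.run(build_create_ontology_cmd(json_file, **options)).returncode
```

For example, build_create_ontology_cmd("data_model_definition.json", validate_only=True) yields ["knora-create-ontology", "--validate", "data_model_definition.json"].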

JSON ontology definition format

The JSON file contains first an object with the prefixes for the external ontologies being used, followed by the definition of the project, which includes all resources and properties:

Prefixes

{
  "prefixes": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "project": {},
  
}
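Prefixed names such as "foaf:givenName" (used later in the "super" of a property) refer to the prefixes declared here. The sketch below shows how such a name maps to a full IRI; expand_name is a hypothetical helper, and leaving undeclared prefixes (such as the project's own ontology name) unchanged is an assumption of this sketch.

```python
from typing import Dict


def expand_name(name: str, prefixes: Dict[str, str]) -> str:
    """Expand "prefix:local" into a full IRI using the "prefixes" object.

    Names without a colon (e.g. "hasValue") are Knora built-ins and are
    returned unchanged; so are names whose prefix is not declared in
    "prefixes", such as the project's own ontology name (an assumption).
    """
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        return name
    return prefixes[prefix] + local
```

With the prefixes above, expand_name("foaf:givenName", prefixes) gives "http://xmlns.com/foaf/0.1/givenName".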

Project data

The project definition requires the following elements:

  • shortcode: A hexadecimal string in the range between "0000" and "FFFF" uniquely identifying the project.
  • shortname: A short name (string)
  • longname: A longer string giving the full name of the project
  • descriptions: Strings describing the project's content. These descriptions can be supplied in several languages (currently "en", "de", "fr" and "it" are supported). The descriptions have to be given as a JSON object with the language as key and the description as value. At least one description in one language is required.
  • keywords: An array of keywords describing the project.
  • lists: The definition of flat or hierarchical lists (thesauri, controlled vocabularies)
  • ontology: The definition of the data model (ontology)

A project definition looks as follows:

"project": {
   "shortcode": "0809",
   "shortname": "test",
   "longname": "Test Example",
   "descriptions": {
     "en": "This is a simple example project with no value.",
     "de": "Dies ist ein einfaches, wertloses Beispielprojekt"
   },
   "keywords": ["example", "senseless"],
   "lists": [],
   "ontology": {}
}
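The requirements above can be checked locally before contacting the server. This is a minimal sketch with a hypothetical check_project helper; it only encodes the rules listed above (four hexadecimal digits for the shortcode, at least one description in a supported language) and is no substitute for the tool's own validation. "lists" is not treated as required because it is optional.

```python
import re
from typing import List

SUPPORTED_LANGUAGES = {"en", "de", "fr", "it"}
REQUIRED_KEYS = ("shortcode", "shortname", "longname", "descriptions", "keywords", "ontology")


def check_project(project: dict) -> List[str]:
    """Return the problems found in a "project" object (an empty list if none)."""
    problems = [f"missing {key!r}" for key in REQUIRED_KEYS if key not in project]

    # Exactly four hex digits covers the range "0000" to "FFFF".
    if "shortcode" in project and not re.fullmatch(r"[0-9A-Fa-f]{4}", str(project["shortcode"])):
        problems.append(f"shortcode {project['shortcode']!r} is not four hex digits")

    descriptions = project.get("descriptions")
    if descriptions is not None:
        if not descriptions:
            problems.append("at least one description is required")
        for lang in descriptions:
            if lang not in SUPPORTED_LANGUAGES:
                problems.append(f"unsupported description language {lang!r}")

    if not isinstance(project.get("keywords", []), list):
        problems.append("keywords must be an array")
    return problems
```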

Lists

A list consists of a root node identifying the list and an array of subnodes. Each subnode may in turn contain subnodes (hierarchical list). A node has the following elements:

  • name: Name of the node. Should be unique within the given list
  • labels: Language-dependent labels
  • comments: Language-dependent comments (optional)
  • nodes: Array of subnodes (optional; leave out if there are no subnodes)

The lists element contains an array of lists. Here is an example:

    "lists": [
      {
        "name": "orgtype",
        "labels": { "de": "Organisationsart", "en": "Organization Type" },
        "nodes": [
          {
            "name": "business",
            "labels": { "en": "Commerce", "de": "Handel" },
            "comments": { "en": "no comment", "de": "kein Kommentar" },
            "nodes": [
              {
                "name": "transport",
                "labels": { "en": "Transportation", "de": "Transport" }
              },
              {
                "name": "finances",
                "labels": { "en": "Finances", "de": "Finanzen" }
              }
            ]
          },
          {
            "name": "society",
            "labels": { "en": "Society", "de": "Gesellschaft" }
          }
        ]
      }
    ]

The lists element is optional.
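Because nodes can nest to any depth, a recursive walk is a natural way to process a list. The sketch below (iter_nodes and duplicate_names are hypothetical helpers) yields the path of names leading to every node and reports node names that occur more than once within one list, since names should be unique there.

```python
from typing import Iterator, List, Tuple


def iter_nodes(nodes: List[dict], path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the path of names from the top level down to every node, depth first."""
    for node in nodes:
        node_path = path + (node["name"],)
        yield node_path
        # "nodes" is optional; a missing key means the node is a leaf.
        yield from iter_nodes(node.get("nodes", []), node_path)


def duplicate_names(list_def: dict) -> List[str]:
    """Return node names that occur more than once within one list."""
    seen, dupes = set(), []
    for node_path in iter_nodes(list_def.get("nodes", [])):
        name = node_path[-1]
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes
```

For the example list above, iter_nodes yields ("business",), ("business", "transport"), ("business", "finances") and ("society",).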

Ontology

The ontology object contains the definition of the data model. The ontology has the following elements:

  • name: The name of the ontology. This has to be an NCName-conformant name (an XML name without colons) that can be used as a prefix!
  • label: Human-readable and understandable name of the ontology
  • resources: Array defining the resources (entities) of the data model

    "ontology": {
      "name": "teimp",
      "label": "Test import ontology",
      "resources": []
    }
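The ontology name is reused as a prefix (as in "teimp:organization"), so it must be allowed in front of the colon of a prefixed name. Assuming the XML NCName rules that RDF prefixes follow, a rough pre-check might look like this; is_valid_ontology_name is a hypothetical helper, it deliberately ignores non-ASCII characters, and Knora's own validation remains authoritative.

```python
import re

# ASCII-only approximation of an XML NCName: it starts with a letter or "_",
# continues with letters, digits, "_", "-" or ".", and contains no colon.
# Real NCNames also allow many non-ASCII characters, which this ignores.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_ontology_name(name: str) -> bool:
    """Rough check that `name` can be used as a prefix such as "teimp:"."""
    return _NCNAME_ASCII.fullmatch(name) is not None
```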

Resources

The resource classes are the primary entities of the data model. A resource class is a template for the representation of a real object that is represented in the DaSCH database. A resource class defines properties (also known as data fields). For each of these properties, a data type as well as the cardinality have to be defined.

A resource consists of the following definitions:

  • name: A name for the resource

  • label: The string displayed when the resource is being accessed

  • super: A resource class is always derived from another resource class. The most generic resource class Knora offers is "Resource". Knora provides the following predefined parent resource classes:

    • Resource: A generic "thing" that represents an item from the real world
    • StillImageRepresentation: An object that is connected to a still image
    • TextRepresentation: An object that is connected to an (external) text (Not Yet Implemented)
    • AudioRepresentation: An object representing audio data (Not Yet Implemented)
    • DDDRepresentation: An object representing a 3D representation (Not Yet Implemented)
    • DocumentRepresentation: An object representing an opaque document (e.g. a PDF)
    • MovingImageRepresentation: An object representing a moving image (video, film)
    • Annotation: A predefined annotation object. It has the following properties defined:
      • hasComment (1-n), isAnnotationOf (1)
    • LinkObj: A resource class linking together several other, generic, resource classes. The class has the following properties: hasComment (1-n), hasLinkTo (1-n)
    • Region: Represents a simple region. The class has the following properties: hasColor (1), isRegionOf (1), hasGeometry (1), hasComment (0-n)

    A resource may also be derived from a resource class in another ontology within the same project, or from another resource class in the same ontology. In this case the reference has to have the form prefix:resourceclassname.

  • labels: Language-dependent, human-readable names

  • comments: Language-dependent comments (optional)

  • properties: Array of property definitions for this resource class.

Example:

    "resources": [
      {
        "name": "person",
        "super": "Resource",
        "labels": { "en": "Person", "de": "Person" },
        "comments": {
          "en": "Represents a human being",
          "de": "Repräsentiert eine Person/Menschen"
        },
        "properties": []
      }
    ]
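The "super" value therefore either names one of Knora's predefined classes or uses the prefix:resourceclassname form. A minimal sketch that tells the two cases apart could look like this; check_super and KNORA_BASE_CLASSES are hypothetical names, and the set simply repeats the classes listed above.

```python
KNORA_BASE_CLASSES = {
    "Resource", "StillImageRepresentation", "TextRepresentation",
    "AudioRepresentation", "DDDRepresentation", "DocumentRepresentation",
    "MovingImageRepresentation", "Annotation", "LinkObj", "Region",
}


def check_super(resource: dict) -> str:
    """Classify a resource's "super" as 'knora-base' or 'prefixed'; raise otherwise."""
    sup = resource["super"]
    if sup in KNORA_BASE_CLASSES:
        return "knora-base"
    prefix, sep, local = sup.partition(":")
    if sep and prefix and local:
        return "prefixed"  # e.g. "teimp:person" from the same or another ontology
    raise ValueError(f"{resource['name']}: unknown super class {sup!r}")
```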

Properties

Properties are the definition of the data fields a resource class may or must have. The properties object has the following fields:

  • name: A name for the property
  • super: A property has to be derived from at least one base property. The most generic base property Knora offers is hasValue. In addition, the property may also be a subproperty of properties defined in external ontologies. In this case the qualified name including the prefix has to be given. The following base properties are defined by Knora:
    • hasValue: This is the most generic base.
    • hasLinkTo: This value represents a link to another resource. You have to indicate the "object" as a prefixed IRI that identifies the resource class this link points to.
    • hasColor: Defines a color value (ColorValue)
    • hasComment: Defines a "standard" comment
    • hasGeometry: Defines a geometry value (a JSON describing a polygon, circle or rectangle), see GeomValue
    • isPartOf: A special variant of hasLinkTo. It says that an instance of the given resource class is an integral part of another resource class. E.g. a "page" is a part of a "book".
    • isRegionOf: A special variant of hasLinkTo. It means that the given resource class is a "region" of another resource class. This is typically used to describe regions of interest in images.
    • isAnnotationOf: A special variant of hasLinkTo. It denotes the given resource class as an annotation to another resource class.
    • seqnum: An integer that is used to define a sequence number in an ordered set of instances.
  • object: The "object" defines the type of the value that the property will store. The following object types are allowed:
    • TextValue: Represents a text that may contain standoff markup
    • ColorValue: A string in the form "#rrggbb" (standard web color format)
    • DateValue: Represents a date. It is a string having the format calendar:start:end
      • calendar is either GREGORIAN or JULIAN
      • start has the form yyyy-mm-dd. If only the year is given, the precision is to the year; if only the year and month are given, the precision is to the month.
      • end is optional; it is given if the date represents a clearly defined period or an uncertain date. In total, a DateValue has the following form: "GREGORIAN:1925:1927-03-22", which means any time between 1925 and the 22nd of March 1927.
    • DecimalValue: A number with a decimal point
    • GeomValue: Represents a geometrical shape as JSON.
    • GeonameValue: Represents a location ID in geonames.org
    • IntValue: Represents an integer value
    • BooleanValue: Represents a Boolean ("true" or "false")
    • UriValue: Represents a URI
    • IntervalValue: Represents a time-interval
    • ListValue: Represents a node of a (possibly hierarchical) list
  • labels: Language-dependent, human-readable names
  • gui_element: The gui_element is, strictly speaking, not part of the data. It gives the generic GUI a hint about how the property should be presented to the user. Each gui_element may have associated gui_attributes which contain further hints. The following gui_elements are available:
    • Colorpicker: The only GUI element for ColorValue. Lets you pick a color. It requires the attribute "ncolors=integer"
    • Date: The only GUI element for DateValue. A date picker gui. No attributes
    • Geometry: Not Yet Implemented.
    • Geonames: The only GUI element for GeonameValue. Interfaces with geonames.org and allows to select a location
    • Interval: Not Yet Implemented.
    • List: A list of values. The attribute "hlist=" is mandatory!
    • Pulldown: A GUI element for ListValue. Pulldown for list values. Also works for hierarchical lists. The attribute "hlist=" is mandatory!
    • Radio: A GUI element for ListValue. A set of radio buttons. The attribute "hlist=" is mandatory!
    • SimpleText: A GUI element for TextValue. A simple text entry box (one line only). The attributes "maxlength=integer" and "size=integer" are optional.
    • Textarea: A GUI element for TextValue. Presents a multiline text entry box. Optional attributes are "cols=integer", "rows=integer", "width=percent" and "wrap=soft|hard".
    • Richtext: A GUI element for TextValue. Provides a rich text editor.
    • Searchbox: Must be used with hasLinkTo properties. Allows the user to search for and enter a resource that the given resource should link to. The attribute "numprops=integer" indicates how many properties of the found resources should be displayed. It is mandatory!
    • Slider: A GUI element for DecimalValue. Provides a slider to select a decimal value. The attributes "max=decimal" and "min=decimal" are mandatory!
    • Spinbox: A GUI element for IntValue. A text field with an "up" and a "down" button for increment/decrement. The attributes "max=decimal" and "min=decimal" are optional.
    • Checkbox: A GUI element for BooleanValue.
    • Fileupload: not yet documented!
  • gui_attributes: See above
  • cardinality: The cardinality indicates how often a given property may occur. The possible values are:
    • "1": Exactly once (mandatory one value and only one)
    • "0-1": The value may be omitted, but can occur only once
    • "1-n": At least one value must be present. But multiple values may be present.
    • "0-n": The value may be omitted, but may also occur multiple times.
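Two of the rules above are easy to get wrong by hand: the DateValue string format and the cardinality codes. The sketch below encodes them as described here; parse_date_value, precision and allows are hypothetical helpers. It assumes years are plain, non-negative numbers of up to four digits, and it checks only the shape of the string, not whether a given day exists in the calendar.

```python
import re
from typing import NamedTuple, Optional


class DateValue(NamedTuple):
    calendar: str
    start: str
    end: Optional[str]


# yyyy, yyyy-mm or yyyy-mm-dd (year limited to 1-4 digits in this sketch)
_DATE_PART = re.compile(r"\d{1,4}(-\d{2}(-\d{2})?)?")


def parse_date_value(text: str) -> DateValue:
    """Split "CALENDAR:start[:end]" and check the shape of each part."""
    calendar, *parts = text.split(":")
    if calendar not in ("GREGORIAN", "JULIAN"):
        raise ValueError(f"unknown calendar {calendar!r}")
    if len(parts) not in (1, 2) or not all(_DATE_PART.fullmatch(p) for p in parts):
        raise ValueError(f"malformed date {text!r}")
    return DateValue(calendar, parts[0], parts[1] if len(parts) == 2 else None)


def precision(part: str) -> str:
    """The number of "-" separators tells how precise a date part is."""
    return ("year", "month", "day")[part.count("-")]


# cardinality code -> (minimum, maximum); None means "no upper bound"
CARDINALITIES = {"1": (1, 1), "0-1": (0, 1), "1-n": (1, None), "0-n": (0, None)}


def allows(cardinality: str, count: int) -> bool:
    """True if `count` values of a property satisfy the cardinality code."""
    low, high = CARDINALITIES[cardinality]
    return count >= low and (high is None or count <= high)
```

For the example above, parse_date_value("GREGORIAN:1925:1927-03-22") has a start with year precision and an end with day precision.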

A complete example for a full ontology

{
  "prefixes": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "project": {
    "shortcode": "0170",
    "shortname": "teimp",
    "longname": "Test Import",
    "descriptions": {
      "en": "This is a project for testing the creation of ontologies and data",
      "de": "Dies ist ein Projekt, um die Erstellung von Ontologien und Datenimport zu testen"
    },
    "keywords": ["test", "import"],
    "lists": [
      {
        "name": "orgtype",
        "labels": {
          "de": "Organisationsart",
          "en": "Organization Type"
        },
        "nodes": [
          {
            "name": "business",
            "labels": {
              "en": "Commerce",
              "de": "Handel"
            },
            "comments": {
              "en": "no comment",
              "de": "kein Kommentar"
            },
            "nodes": [
              {
                "name": "transport",
                "labels": {
                  "en": "Transportation",
                  "de": "Transport"
                }
              },
              {
                "name": "finances",
                "labels": {
                  "en": "Finances",
                  "de": "Finanzen"
                }
              }
            ]
          },
          {
            "name": "society",
            "labels": {
              "en": "Society",
              "de": "Gesellschaft"
            }
          }
        ]
      }
    ],
    "ontology": {
      "name": "teimp",
      "label": "Test import ontology",
      "resources": [
        {
          "name": "person",
          "super": "Resource",
          "labels": {
            "en": "Person",
            "de": "Person"
          },
          "comments": {
            "en": "Represents a human being",
            "de": "Repräsentiert eine Person/Menschen"
          },
          "properties": [
            {
              "name": "firstname",
              "super": ["hasValue", "foaf:givenName"],
              "object": "TextValue",
              "labels": {
                "en": "Firstname",
                "de": "Vorname"
              },
              "gui_element": "SimpleText",
              "gui_attributes": ["size=24", "maxlength=32"],
              "cardinality": "1"
            },
            {
              "name": "lastname",
              "super": ["hasValue", "foaf:familyName"],
              "object": "TextValue",
              "labels": {
                "en": "Lastname",
                "de": "Nachname"
              },
              "gui_element": "SimpleText",
              "gui_attributes": ["size=24", "maxlength=64"],
              "cardinality": "1"
            },
            {
              "name": "member",
              "super": ["hasLinkTo"],
              "object": "teimp:organization",
              "labels": {
                "en": "member of",
                "de": "Mitglied von"
              },
              "gui_element": "Searchbox",
              "cardinality": "0-n"
            }
          ]
        },
        {
          "name": "organization",
          "super": "Resource",
          "labels": {
            "en": "Organization",
            "de": "Organisation"
          },
          "comments": {
            "en": "Denotes an organizational unit",
            "de": "Eine Institution oder Trägerschaft"
          },
          "properties": [
            {
              "name": "name",
              "super": ["hasValue"],
              "object": "TextValue",
              "labels": {
                "en": "Name",
                "de": "Name"
              },
              "gui_element": "SimpleText",
              "gui_attributes": ["size=64", "maxlength=64"],
              "cardinality": "1-n"
            },
            {
              "name": "orgtype",
              "super": ["hasValue"],
              "object": "ListValue",
              "labels": {
                "en": "Organizationtype",
                "de": "Organisationstyp"
              },
              "comments": {
                "en": "Type of organization",
                "de": "Art der Organisation"
              },
              "gui_element": "Pulldown",
              "gui_attributes": ["hlist=orgtype"],
              "cardinality": "1-n"
            }
          ]
        }
      ]
    }
  }
}
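Several values in such a definition refer to other parts of it: an "hlist=" attribute names one of the project's lists, and an "object" prefixed with the ontology name (here "teimp:organization") names a resource class of the same ontology. A small consistency check like the following sketch can catch typos before upload, for instance a list "name" that no longer matches its "hlist=" value. check_references is a hypothetical helper and only inspects these two kinds of reference.

```python
from typing import List


def check_references(definition: dict) -> List[str]:
    """Report "hlist=" values and same-ontology link targets that point nowhere."""
    project = definition["project"]
    ontology = project["ontology"]
    list_names = {lst["name"] for lst in project.get("lists", [])}
    resource_names = {res["name"] for res in ontology.get("resources", [])}
    problems = []
    for res in ontology.get("resources", []):
        for prop in res.get("properties", []):
            where = f"{res['name']}.{prop['name']}"
            for attr in prop.get("gui_attributes", []):
                key, _, value = attr.partition("=")
                if key == "hlist" and value not in list_names:
                    problems.append(f"{where}: unknown list {value!r}")
            # Only objects written as "<ontology name>:<class>" are checked here.
            obj = prop.get("object", "")
            prefix, sep, local = obj.partition(":")
            if sep and prefix == ontology["name"] and local not in resource_names:
                problems.append(f"{where}: unknown resource class {obj!r}")
    return problems
```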

JSON for lists

The JSON schema for uploading only hierarchical lists is simplified:

{
  "project": {
    "shortcode": "abcd",
    "lists": []
  }
}

The definition of the lists is the same as in the full upload of an ontology!
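Since the lists are defined the same way in both schemas, a lists-only file can be derived from a full ontology definition and then passed to knora-create-ontology with the "--lists" option. A minimal sketch with hypothetical helper names (lists_only_definition, write_lists_only):

```python
import json
from typing import Any, Dict


def lists_only_definition(full_definition: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the project's shortcode and lists from a full definition."""
    project = full_definition["project"]
    return {"project": {"shortcode": project["shortcode"], "lists": project.get("lists", [])}}


def write_lists_only(src_path: str, dst_path: str) -> None:
    """Read a full ontology definition and write the lists-only variant."""
    with open(src_path, encoding="utf-8") as src:
        full = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False keeps non-ASCII labels readable in the output file.
        json.dump(lists_only_definition(full), dst, indent=2, ensure_ascii=False)
```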

A full example for creating lists only

The following JSON definition assumes that there is a project with the shortcode 0808.

{
  "project": {
    "shortcode": "0808",
    "lists": [
      {
        "name": "test1",
        "labels": {
          "de": "TEST1"
        },
        "nodes": [
          {
            "name": "A",
            "labels": {
              "de": "_A_"
            }
          },
          {
            "name": "B",
            "labels": {
              "de": "_B_"
            },
            "nodes": [
              {
                "name": "BA",
                "labels": {
                  "de": "_BA_"
                }
              },
              {
                "name": "BB",
                "labels": {
                  "de": "_BB_"
                }
              }
            ]
          },
          {
            "name": "C",
            "labels": {
              "de": "_C_"
            }
          }
        ]
      }
    ]
  }
}

Resetting the triplestore with knora-reset-triplestore

This script connects to the Knora server and resets the content of the triplestore.

Usage:

$ knora-reset-triplestore

It supports the following options:

  • "-s server" | "--server server": The URL of the Knora server [default: localhost:3333]
  • "-u username" | "--user username": Username to log into Knora [default: root@example.com]
  • "-p password" | "--password password": The password for logging into the Knora server [default: test]

For resetting the triplestore through the Knora API to work, the Knora API server must be started with a configuration parameter that allows this operation (e.g., the KNORA_WEBAPI_ALLOW_RELOAD_OVER_HTTP environment variable or the corresponding setting in application.conf).

Bulk data import

To perform a bulk data import, a properly formatted XML file has to be created. The Python module "knora" contains classes and methods that facilitate the creation of such an XML file.

Requirements

To install the requirements:

$ pip3 install -r requirements.txt

To generate a "requirements" file (usually requirements.txt) that you commit with your project, run:

$ pip3 freeze > requirements.txt

Publishing

Generate the distribution package. Make sure you have the latest versions of setuptools and wheel installed:

$ python3 -m pip install --user --upgrade pip setuptools wheel
$ python3 setup.py sdist bdist_wheel

You can install the package locally from the dist:

$ python3 -m pip install ./dist/some_name.whl

To upload the package with twine, first create ~/.pypirc:

[distutils]
index-servers=pypi
[pypi]
repository = https://upload.pypi.org/legacy/
username = your_username_on_pypi

then upload:

$ python3 -m pip install --user --upgrade tqdm twine
$ python3 -m twine upload dist/*

For local development:

$ python3 setup.py develop

Testing

$ pip3 install pytest
$ pip3 install --editable .
$ pytest