feat: import lists from excel (DSP-1341) (#48)
* Removing GUI code

* Cleanup

* cleanup

* Added logging

* Bug fix XML parser not longer incremental

* Bugfixing...

* Bugfix

* Removed printouts

* Ongoing imrovements

* Fix: Problem after installing with pip

* Pimped up version to 0.9.12

* Added some fixes regarding knora-api: properties

* some small fixes

* Fixing test data

* Adding testing data

* versioning

* Docu update

* Bugfix for breaking change in dsp-api (concerning lists)

* Added support for lists defined in excel (1. step)

* ...

* Adding tests

* ...

* Test and a bit od docu

* Adapted setup.py to use openpyxl

* Adapted documentation to latestet development

* Added documentation

* excel-list node names from label

* Corss-references in documentation

* Documentation and small bugfix reading excel

* Updated version number and documentation

* Push version to 1.0.0

* The Big Cleanup

* Remove .DS_Store from everywhere

* type fix

* Cleanup pp

* chore(ci): bump ubuntu version

* chore(ci): fix dependencies

* chore(ci): fix dependencies

Co-authored-by: BalduinLandolt <balduin.landolt@hotmail.com>
Co-authored-by: Ivan Subotic <400790+subotic@users.noreply.github.com>
3 people committed Feb 17, 2021
1 parent 03bfa82 commit 3628992
Showing 75 changed files with 1,951 additions and 6,474 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/main.yml
@@ -8,7 +8,7 @@ env:
jobs:
test-integration:
name: Integration Tests
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
steps:
- name: Checkout source
uses: actions/checkout@v1
@@ -31,7 +31,7 @@ jobs:
with:
python-version: 3.9
- name: Install python package dependencies
run: sudo apt-get install libxml2-dev libxslt-dev python3-dev libgtk-3-dev libgstreamer1.0-0 gstreamer1.0-plugins-base freeglut3-dev libwebkitgtk-1.0-0 libjpeg-dev libpng-dev libtiff-dev libsdl-dev libnotify-dev libsm-dev
run: sudo apt-get install libxml2-dev libxslt-dev python3-dev libgstreamer1.0-0 gstreamer1.0-plugins-base freeglut3-dev libjpeg-dev libpng-dev libtiff-dev libsdl-dev libnotify-dev libsm-dev
- name: run test-integration
run: |
make upgrade-dist-tools
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
.tmp
**/.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
240 changes: 161 additions & 79 deletions docs/dsp-tools-create.md
@@ -1,17 +1,25 @@
# JSON ontology definition format
# JSON data model definition format

## Introduction
This document contains all the information you need to create an ontology that's used by DSP.

In the first section you find a rough overview of the ontology definition, all the necessary components with a
This document contains all the information you need to create a data model that's used by DSP. According to
Wikipedia, a [data model](https://en.wikipedia.org/wiki/Data_model) is "_an abstract model that organizes elements
of data and standardizes how they relate to one another and to the properties of real-world entities._" Further, it
states: "_A data model explicitly determines the structure of data. Data models are typically specified by a data
specialist, data librarian, or a digital humanities scholar in a data modeling notation_". In this section we
describe one of the notations that _dsp-tools_ uses to create a data model in the DSP repository. The DSP
repository is loosely based on [Linked Open Data](https://en.wikipedia.org/wiki/Linked_data), where the term
_"ontology"_ is also used for the data model. Note that in this context "ontology" is not used in the
philosophical sense.

In the first section you find a rough overview of the data model definition, all the necessary components with a
definition and a short example of the definition.

## A short overview
In the following section, you find all the mentioned parts with a detailed explanation. Right at the beginning we look
at the basic fields that belong to an ontology definition. This serves as an overview for you to which you can return
at any time while you read the description.

A complete ontology definition looks like this:
A complete data model definition looks like this:

```json
{
```
@@ -147,56 +155,67 @@ as well e.i. "keywords": [].
`"lists": [<list-definition>,<list-definition>,...]`

Often in order to characterize or classify a real world object, we use a sequential or hierarchical list of terms. For
example a hypothetical classification of classical music genres could be as :

- Orchestral music
- Symphony
- Symphony poem
- Overture
- Concerto
- Ballet
- Incidential music
- Suite
- Chamber music
- String trio
- Piano trio
- String quartet
- Piano quartet
- String quintet
- Piano quintet
- Other
- Solo instrumental
- Organ
- Piano
- Harpsichord
- Spinet
- Guitar
- Lute
- Violin
- Flute
- Other
- Vocal Music
- Choir
- Oratorios
- Passions
- Cantatas
- Masses
- Motets
- Madrigals
- Psalms
- Solo
- Songs
- Arias
- Opera
- Comic opera
- Serious Opera
- Opera Semiseria
- Opera Conrnique
- Grand opera
- Opera verismo
example a classification of disciplines in the Humanities might look as follows:

- Performing arts
    - Music
        - Chamber music
        - Church music
        - Conducting
            - Choirs
            - Orchestras
        - Music history
        - Musictheory
        - Musicology
        - Jazz
        - Pop/Rock
    - Dance
        - Choreography
    - Theatre
        - Acting
        - Directing
        - Playwriting
        - Scenography
    - Movies/Television
        - Animation
        - Live action
- Visual arts
    - Fine arts
        - Drawing
        - Painting
        - Photography
    - Applied Arts
        - Animation
        - Architecture
        - Decorative arts
- History
    - Ancient history
    - Modern history
- Languages and literature
    - Linguistics
        - Grammar
        - Etymology
        - Phonetics
        - Semantics
    - Literature
        - Fiction
        - Non-fiction
        - Theory of literature
- Philosophy
    - Aesthetics
    - Applied philosophy
    - Epistemology
        - Justification
        - Reasoning
    - Metaphysics
        - Determinism and free will
        - Ontology
        - Philosophy of mind
        - Teleology


DSP allows to define such controlled vocabularies or thesauri. They can be arranged "flat" or in "hierarchies" (as the
given example about music genres is). The definition of these entities are called "lists" in the DSP. Thus, the
given example about the disciplines in the Humanities is). The definitions of these entities are called "lists" in the DSP. Thus, the
list object is used to give the resources of the ontology a taxonomic quality. A taxonomy makes it possible to
categorize a resource. The big advantage of a taxonomic structure as it is implemented by the DSP
is that the user can subcategorize the objects. This allows the user to formulate his search requests more or less
@@ -230,7 +249,8 @@ therefore flat.
A resource can be assigned to a taxonomic node within its properties. So a resource of type "musical work" with the
title "La Traviata" would have the property/attribute "musical-genre" with the value "Grand opera". Within the DSP,
each property or attribute has an assigned cardinality. Sometimes, a taxonomy allows that an object may belong to
different categories at the same time (e.g. an image which depicts several categories at the same time). In these cases, a cardinality > 1 allows to add multiple attributes
different categories at the same time (e.g. an image which depicts several categories at the same time). In these cases,
a cardinality > 1 allows adding multiple attributes
of the same type. See the description of the [cardinalities](#cardinalities) further below.

A node of the Taxonomy may have the following elements:
@@ -243,50 +263,112 @@ It needs to specify at least one language.
is _optional_.
- _nodes_: Array of subnodes. If you have a non-hierarchical taxonomy (i.e. a taxonomy with only 2 levels, the root
level and another level), you don't have child nodes. Therefore the nodes element can be omitted in case of a flat
taxonomy.
taxonomy.

Each list must have exactly one root node, which has the same form but denotes the list itself.

Here is an example on how to build a taxonomic structure with the help of JSON:

```json
"lists": [
"lists": [
{
"name": "my_list",
"labels": {"en": "Disciplines of the Humanities"},
"comments": {"en": "This list is just a silly example", "fr": "un exemple un peu fou"},
"nodes": [
{
"name": "classicalmusicgenres",
"labels": { "de": "Musikkategorien für klassische Musik", "en": "Genres of classical music" },
"name": "node_1_1",
"labels": {"en": "Performing arts"},
"comments": {"en": "Arts that are events", "de": "Künste mit performativem Character"},
"nodes": [
{
"name": "orchestral",
"labels": { "en": "Orchestral music", "de": "Orchestermusik" },
"comments": { "en": "Multiple instruments together", "de": "Mehrere Instrumente zusammen" },
{
"name": "node_2_2",
"labels": {"en": "Music"},
"nodes": [
{
"name": "symphony",
"labels": { "en": "Symphony", "de": "Symphonie" }
"name": "node_3_3",
"labels": {"en": "Chamber music"}
},
{
"name": "node_4_3",
"labels": {"en": "Church music"}
},
{
"name": "node_5_3",
"labels": {"en": "Conducting"},
"nodes": [
{
"name": "node_6_4",
"labels": {"en": "Choirs"}
},
{
"name": "node_7_4",
"labels": {"en": "Orchestras"}
}
]
},
{
"name": "symphonicpoem",
"labels": { "en": "Symphonic poem", "de": "Symphonische Dichtung" }
"name": "node_8_3",
"labels": { "en": "Music history" }
},
{
"name": "overture",
"labels": { "en": "Overture", "de": "Overtüre" }
"name": "node_9_3",
"labels": {"en": "Musictheory"}
},
{
"name": "concerto",
"labels": { "en": "Conerto", "de": "Konzert" }
"name": "node_10_3",
"labels": {"en": "Musicology"}
},
...
{
"name": "node_11_3",
"labels": {"en": "Jazz"}
},
{
"name": "node_12_3",
"labels": {"en": "Pop/Rock/Blues"}
}
]
},
{
"name": "chambermusic",
"labels": { "en": "Chamber music", "de": "Kammermusik" },
"nodes": [...]
},
...
}
]
}
},
{...},{...}
]
}
]
```
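
The nested `nodes` arrays described above lend themselves to a recursive traversal. The following Python sketch is only an illustration, not the actual dsp-tools code: the helper names `iter_nodes` and `check_list` are hypothetical, and it assumes that every node needs a `name` and a `labels` object with at least one language. A `nodes` value that is an object rather than an array is assumed to be an Excel reference and is not descended into.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]


def iter_nodes(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs, each parent before its children."""
    yield depth, node
    children = node.get("nodes", [])
    # Assumption: a dict here is an Excel reference ("file"/"worksheet"),
    # not a list of subnodes, so there is nothing to descend into.
    if isinstance(children, list):
        for child in children:
            yield from iter_nodes(child, depth + 1)


def check_list(root: Node) -> List[str]:
    """Return a list of problems found in a list definition (hypothetical checks)."""
    problems = []
    for _, node in iter_nodes(root):
        name = node.get("name")
        if not name:
            problems.append("a node has no name")
        if not node.get("labels"):
            problems.append(f"node {name!r} has no label in any language")
    return problems


# A shortened version of the example list from this section.
disciplines = {
    "name": "my_list",
    "labels": {"en": "Disciplines of the Humanities"},
    "nodes": [
        {
            "name": "node_1_1",
            "labels": {"en": "Performing arts"},
            "nodes": [{"name": "node_2_2", "labels": {"en": "Music"}}],
        }
    ],
}

for depth, node in iter_nodes(disciplines):
    print("  " * depth + node["labels"]["en"])
print(check_list(disciplines))
```

Running the sketch prints the labels indented by depth, followed by an empty problem list for this well-formed example.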
#### Lists from Excel

A list can also be imported from an Excel sheet. The Excel file must have the following format (currently only a single
language is supported):

![img_1.png](img_1.png)

In such a case, the Excel file can be referenced directly in the list definition by defining a special list node:
```json
{
"name": "fromexcel",
"labels": {
"en": "Fromexcel"
},
"nodes": {
"file": "excel-list.xlsx",
"worksheet": "Tabelle1"
}
}

```
The _nodes_ section must then contain the following fields:

- _file_: Path to the Excel file
- _worksheet_: Name of the worksheet in the Excel file

The node names are composed from the label by concatenating the words in the label, with the first word starting with a
lower case character and the other words starting with an upper case character. So the label `Chamber music` would
become the name `chamberMusic`. _Please note that the label should be unique within one list. If in a hierarchical list the
same label is used several times, the node name will be expanded by adding underscores (`_`) at the end until the name is
unique_.
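
The naming rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by dsp-tools: the function name `label_to_node_name` and the `used_names` set are hypothetical, words are assumed to be separated by whitespace, and characters other than the first letter of each word are left untouched.

```python
from typing import Set


def label_to_node_name(label: str, used_names: Set[str]) -> str:
    """Build a camelCase node name from a label and make it unique.

    Hypothetical helper: mirrors the rule described in the text, with
    whitespace assumed to be the only word separator.
    """
    words = label.split()
    if not words:
        raise ValueError("label must contain at least one word")
    first, rest = words[0], words[1:]
    # First word starts lower case, every following word starts upper case.
    name = first[:1].lower() + first[1:]
    name += "".join(word[:1].upper() + word[1:] for word in rest)
    # Append underscores until the name has not been used before in this list.
    while name in used_names:
        name += "_"
    used_names.add(name)
    return name


used: Set[str] = set()
print(label_to_node_name("Chamber music", used))  # chamberMusic
print(label_to_node_name("Animation", used))      # animation
print(label_to_node_name("Animation", used))      # animation_
```

Keeping the set of used names per list, rather than globally, matches the statement that uniqueness is only required within one list.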


As already mentioned before, the _lists_ element is optional. If there are no lists, this element has to be omitted.

### Groups
13 changes: 13 additions & 0 deletions docs/dsp-tools-excel.md
@@ -0,0 +1,13 @@
[![PyPI version](https://badge.fury.io/py/knora.svg)](https://badge.fury.io/py/knora)

# DSP tools to use Excel-files for data modelling and data import
dsp-tools can directly read and process Excel files and output the appropriate JSON and/or XML files for
importing data into the DSP repository.

## Flat and hierarchical lists
Lists or "controlled vocabularies" are sets of fixed terms that are used to characterize something. Hierarchical lists
correspond to classifications or taxonomies.

The format of the Excel file is described [here](./dsp-tools-create.md#lists-from-excel).

