feat: import lists from excel (DSP-1341) #48

Merged
merged 40 commits into from Feb 17, 2021
40 commits
092e93a
Removing GUI code
lrosenth Nov 11, 2020
1d76bb0
Cleanup
lrosenth Nov 13, 2020
f5b90ea
cleanup
lrosenth Nov 13, 2020
19f08f1
Added logging
lrosenth Nov 14, 2020
ece1fbf
Merge branch 'main' into DSP-1076-cleanup-api-logging
lrosenth Nov 14, 2020
9c04135
Bug fix XML parser not longer incremental
lrosenth Dec 7, 2020
57ea515
Bugfixing...
lrosenth Dec 7, 2020
e4f42ac
Bugfix
lrosenth Dec 7, 2020
5102d9a
Removed printouts
lrosenth Dec 7, 2020
3f2615f
Ongoing imrovements
lrosenth Jan 12, 2021
c4da149
Fix: Problem after installing with pip
BalduinLandolt Jan 12, 2021
d6f81a0
Pimped up version to 0.9.12
lrosenth Jan 13, 2021
4b1606b
Added some fixes regarding knora-api: properties
lrosenth Jan 26, 2021
dfbadb8
some small fixes
lrosenth Feb 8, 2021
d5dc256
Fixing test data
lrosenth Feb 8, 2021
935f559
Adding testing data
lrosenth Feb 8, 2021
6e4fb31
versioning
lrosenth Feb 9, 2021
953cc79
Docu update
lrosenth Feb 9, 2021
08ae568
Bugfix for breaking change in dsp-api (concerning lists)
lrosenth Feb 9, 2021
1fec0c9
Added support for lists defined in excel (1. step)
lrosenth Feb 11, 2021
fb8b674
...
lrosenth Feb 11, 2021
4a9a14c
Adding tests
lrosenth Feb 12, 2021
3e4702f
Merge branch 'main' into DSP-1341-lists-from-excel
lrosenth Feb 12, 2021
1e13016
...
lrosenth Feb 12, 2021
6ab324f
Test and a bit od docu
lrosenth Feb 12, 2021
df6e83f
Adapted setup.py to use openpyxl
lrosenth Feb 12, 2021
89448b4
Adapted documentation to latestet development
lrosenth Feb 15, 2021
c87535f
Added documentation
lrosenth Feb 15, 2021
4b6be38
excel-list node names from label
lrosenth Feb 15, 2021
c0a6cbb
Corss-references in documentation
lrosenth Feb 15, 2021
c975284
Documentation and small bugfix reading excel
lrosenth Feb 15, 2021
965724e
Updated version number and documentation
lrosenth Feb 15, 2021
c87352f
Push version to 1.0.0
lrosenth Feb 15, 2021
3857aca
The Big Cleanup
lrosenth Feb 16, 2021
5102ec2
Remove .DS_Store from everywhere
lrosenth Feb 16, 2021
5861963
type fix
lrosenth Feb 16, 2021
064eb5e
Cleanup pp
lrosenth Feb 16, 2021
16ccee2
chore(ci): bump ubuntu version
subotic Feb 16, 2021
92d6a43
chore(ci): fix dependencies
subotic Feb 16, 2021
0b1d82c
chore(ci): fix dependencies
subotic Feb 16, 2021
4 changes: 2 additions & 2 deletions .github/workflows/main.yml
@@ -8,7 +8,7 @@ env:
jobs:
test-integration:
name: Integration Tests
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
steps:
- name: Checkout source
uses: actions/checkout@v1
@@ -31,7 +31,7 @@ jobs:
with:
python-version: 3.9
- name: Install python package dependencies
run: sudo apt-get install libxml2-dev libxslt-dev python3-dev libgtk-3-dev libgstreamer1.0-0 gstreamer1.0-plugins-base freeglut3-dev libwebkitgtk-1.0-0 libjpeg-dev libpng-dev libtiff-dev libsdl-dev libnotify-dev libsm-dev
run: sudo apt-get install libxml2-dev libxslt-dev python3-dev libgstreamer1.0-0 gstreamer1.0-plugins-base freeglut3-dev libjpeg-dev libpng-dev libtiff-dev libsdl-dev libnotify-dev libsm-dev
- name: run test-integration
run: |
make upgrade-dist-tools
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
.tmp
**/.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
240 changes: 161 additions & 79 deletions docs/dsp-tools-create.md
@@ -1,17 +1,25 @@
# JSON ontology definition format
# JSON data model definition format

## Introduction
This document contains all the information you need to create an ontology that's used by DSP.

In the first section you find a rough overview of the ontology definition, all the necessary components with a
This document contains all the information you need to create a data model that's used by DSP. According to
Wikipedia, a [data model](https://en.wikipedia.org/wiki/Data_model) "_is an abstract model that organizes elements
of data and standardizes how they relate to one another and to the properties of real-world entities._" Further it
states: "_A data model explicitly determines the structure of data. Data models are typically specified by a data
specialist, data librarian, or a digital humanities scholar in a data modeling notation_". This section
describes one of the notations that _dsp-tools_ uses to create a data model in the DSP repository. The DSP
repository is loosely based on [Linked Open Data](https://en.wikipedia.org/wiki/Linked_data), where the term
_"ontology"_ is also used for the data model. Note that in this context "ontology" is not meant in the
philosophical sense.

In the first section you find a rough overview of the data model definition, all the necessary components with a
definition and a short example of the definition.

## A short overview
In the following section, you find all the mentioned parts with a detailed explanation. Right at the beginning we look
at the basic fields that belong to an ontology definition. This serves as an overview for you to which you can return
at any time while you read the description.

A complete ontology definition looks like this:
A complete data model definition looks like this:

```json
{
```
@@ -147,56 +155,67 @@ as well e.i. "keywords": [].
`"lists": [<list-definition>,<list-definition>,...]`

Often in order to characterize or classify a real world object, we use a sequential or hierarchical list of terms. For
example a hypothetical classification of classical music genres could be as :

- Orchestral music
- Symphony
- Symphony poem
- Overture
- Concerto
- Ballet
- Incidential music
- Suite
- Chamber music
- String trio
- Piano trio
- String quartet
- Piano quartet
- String quintet
- Piano quintet
- Other
- Solo instrumental
- Organ
- Piano
- Harpsichord
- Spinet
- Guitar
- Lute
- Violin
- Flute
- Other
- Vocal Music
- Choir
- Oratorios
- Passions
- Cantatas
- Masses
- Motets
- Madrigals
- Psalms
- Solo
- Songs
- Arias
- Opera
- Comic opera
- Serious Opera
- Opera Semiseria
- Opera Conrnique
- Grand opera
- Opera verismo
example a classification of disciplines in the Humanities might look as follows:

- Performing arts
    - Music
        - Chamber music
        - Church music
        - Conducting
            - Choirs
            - Orchestras
        - Music history
        - Musictheory
        - Musicology
        - Jazz
        - Pop/Rock
    - Dance
        - Choreography
    - Theatre
        - Acting
        - Directing
        - Playwriting
        - Scenography
    - Movies/Television
        - Animation
        - Live action
- Visual arts
    - Fine arts
        - Drawing
        - Painting
        - Photography
    - Applied Arts
        - Animation
        - Architecture
        - Decorative arts
- History
    - Ancient history
    - Modern history
- Languages and literature
    - Linguistics
        - Grammar
        - Etymology
        - Phonetics
        - Semantics
    - Literature
        - Fiction
        - Non-fiction
        - Theory of literature
- Philosophy
    - Aesthetics
    - Applied philosophy
    - Epistemology
        - Justification
        - Reasoning
    - Metaphysics
        - Determinism and free will
        - Ontology
        - Philosophy of mind
        - Teleology


DSP allows to define such controlled vocabularies or thesauri. They can be arranged "flat" or in "hierarchies" (as the
given example about music genres is). The definition of these entities are called "lists" in the DSP. Thus, the
given example about the disciplines in the Humanities is). The definitions of these entities are called "lists" in the DSP. Thus, the
list object is used to give the resources of the ontology a taxonomic quality. A taxonomy makes it possible to
categorize a resource. The big advantage of a taxonomic structure as it is implemented by the DSP
is that the user can subcategorize the objects. This allows the user to formulate his search requests more or less
@@ -230,7 +249,8 @@ therefore flat.
A resource can be assigned to a taxonomic node within its properties. So a resource of type "musical work" with the
title "La Traviata" would have the property/attribute "musical-genre" with the value "Grand opera". Within the DSP,
each property or attribute has an assigned cardinality. Sometimes, a taxonomy allows that an object may belong to
different categories at the same time (e.g. an image which depicts several categories at the same time). In these cases, a cardinality > 1 allows to add multiple attributes
different categories at the same time (e.g. an image which depicts several categories at the same time). In these cases,
a cardinality > 1 allows to add multiple attributes
of the same type. See the description of the [cardinalities](#cardinalities) further below.

A node of the Taxonomy may have the following elements:
@@ -243,50 +263,112 @@ It needs to specify at least one language.
is _optional_.
- _nodes_: Array of subnodes. If you have a non-hierarchical taxonomy (i.e. a taxonomy with only 2 levels, the root
level and another level), you don't have child nodes. Therefore the nodes element can be omitted in case of a flat
taxonomy.
taxonomy.

Each list must have exactly one root node, which has the same form but denotes the list itself.

Here is an example on how to build a taxonomic structure with the help of JSON:

```json
"lists": [
"lists": [
{
"name": "my_list",
"labels": {"en": "Disciplines of the Humanities"},
"comments": {"en": "This list is just a silly example", "fr": "un exemple un peu fou"},
"nodes": [
{
"name": "classicalmusicgenres",
"labels": { "de": "Musikkategorien für klassische Musik", "en": "Genres of classical music" },
"name": "node_1_1",
"labels": {"en": "Performing arts"},
"comments": {"en": "Arts that are events", "de": "Künste mit performativem Character"},
"nodes": [
{
"name": "orchestral",
"labels": { "en": "Orchestral music", "de": "Orchestermusik" },
"comments": { "en": "Multiple instruments together", "de": "Mehrere Instrumente zusammen" },
{
"name": "node_2_2",
"labels": {"en": "Music"},
"nodes": [
{
"name": "symphony",
"labels": { "en": "Symphony", "de": "Symphonie" }
"name": "node_3_3",
"labels": {"en": "Chamber music"}
},
{
"name": "node_4_3",
"labels": {"en": "Church music"}
},
{
"name": "node_5_3",
"labels": {"en": "Conducting"},
"nodes": [
{
"name": "node_6_4",
"labels": {"en": "Choirs"}
},
{
"name": "node_7_4",
"labels": {"en": "Orchestras"}
}
]
},
{
"name": "symphonicpoem",
"labels": { "en": "Symphonic poem", "de": "Symphonische Dichtung" }
"name": "node_8_3",
"labels": { "en": "Music history" }
},
{
"name": "overture",
"labels": { "en": "Overture", "de": "Overtüre" }
"name": "node_9_3",
"labels": {"en": "Musictheory"}
},
{
"name": "concerto",
"labels": { "en": "Conerto", "de": "Konzert" }
"name": "node_10_3",
"labels": {"en": "Musicology"}
},
...
{
"name": "node_11_3",
"labels": {"en": "Jazz"}
},
{
"name": "node_12_3",
"labels": {"en": "Pop/Rock/Blues"}
}
]
},
{
"name": "chambermusic",
"labels": { "en": "Chamber music", "de": "Kammermusik" },
"nodes": [...]
},
...
}
]
}
},
{...},{...}
]
}
]
```
#### Lists from Excel

A list can also be imported from an Excel sheet. The Excel file must have the following format (currently only a single
language is supported):

![img_1.png](img_1.png)

In such a case, the Excel file can be referenced directly in the list definition by defining a special list node:
```json
{
"name": "fromexcel",
"labels": {
"en": "Fromexcel"
},
"nodes": {
"file": "excel-list.xlsx",
"worksheet": "Tabelle1"
}
}

```
The _nodes_ section must then contain the following fields:

- _file_: Path to the Excel file
- _worksheet_: The name of the worksheet in the Excel file
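
As a rough illustration of these two required fields, the following Python sketch checks whether a list definition refers to an Excel file. It is not part of dsp-tools: the function name `excel_reference` is hypothetical, and the only assumption taken from the text above is that, for an Excel-based list, `nodes` is an object with `file` and `worksheet` instead of an array of subnodes.

```python
import json
from typing import Optional, Tuple


def excel_reference(list_node: dict) -> Optional[Tuple[str, str]]:
    """Return (file, worksheet) if the list is defined by an Excel file.

    Hypothetical helper for illustration only, not a dsp-tools function.
    """
    nodes = list_node.get("nodes")
    if not isinstance(nodes, dict):
        # Regular list: "nodes" is an array of subnodes, or it is omitted.
        return None
    missing = [key for key in ("file", "worksheet") if key not in nodes]
    if missing:
        raise ValueError("Excel list node lacks field(s): " + ", ".join(missing))
    return nodes["file"], nodes["worksheet"]


# The list node from the example above.
definition = json.loads("""
{
  "name": "fromexcel",
  "labels": {"en": "Fromexcel"},
  "nodes": {"file": "excel-list.xlsx", "worksheet": "Tabelle1"}
}
""")

print(excel_reference(definition))  # ('excel-list.xlsx', 'Tabelle1')
```

A list whose `nodes` element is an ordinary array (or is omitted) yields `None`, so the same check could be used to decide whether the Excel import step is needed at all.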

The node names are composed from the label by concatenating the words in the label, with the first word starting with a
lower case character and the other words starting with an upper case character. So the label `Chamber music` would
become the name `chamberMusic`. _Please note that the label must be unique within one list. If the
same label is used several times in a hierarchical list, the node name will be extended by appending underscores "_" until the name is
unique_.
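
The naming rule can be sketched in Python as follows. This is a minimal illustration, not the dsp-tools implementation: the function name `label_to_node_name` and the `used_names` set are hypothetical, and the sketch assumes that words are separated by whitespace and that only the first character of each word changes case.

```python
from typing import Set


def label_to_node_name(label: str, used_names: Set[str]) -> str:
    """Derive a node name from a label (hypothetical helper, not dsp-tools API)."""
    words = label.split()
    if not words:
        raise ValueError("label must contain at least one word")
    # The first word starts with a lower case character,
    # every following word starts with an upper case character.
    first = words[0][:1].lower() + words[0][1:]
    rest = [word[:1].upper() + word[1:] for word in words[1:]]
    name = first + "".join(rest)
    # Append underscores until the name is unique within the list.
    while name in used_names:
        name += "_"
    used_names.add(name)
    return name


used: Set[str] = set()
print(label_to_node_name("Chamber music", used))  # chamberMusic
print(label_to_node_name("Animation", used))      # animation
print(label_to_node_name("Animation", used))      # animation_
```

The label "Animation" appears twice in the Humanities example (under "Movies/Television" and under "Applied Arts"), which is the situation the underscore rule addresses.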


As already mentioned before, the _lists_ element is optional. If there are no lists, this element has to be omitted.

### Groups
13 changes: 13 additions & 0 deletions docs/dsp-tools-excel.md
@@ -0,0 +1,13 @@
[![PyPI version](https://badge.fury.io/py/knora.svg)](https://badge.fury.io/py/knora)

# DSP tools to use Excel-files for data modelling and data import
dsp-tools can read and process Excel files directly and output the appropriate JSON and/or XML files for
importing data into the DSP repository.

## Flat and hierarchical lists
Lists or "controlled vocabularies" are sets of fixed terms that are used to characterize something. Hierarchical lists
correspond to classifications or taxonomies.

The format of the Excel file is described [here](./dsp-tools-create.md#lists-from-excel).