feat: add command excel2json to create JSON project file from folder with Excel files (DEV-960) #248

Merged
merged 10 commits into from Nov 9, 2022
Changes from 8 commits
6 changes: 3 additions & 3 deletions Makefile
@@ -10,7 +10,7 @@ CURRENT_DIR := $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))
.PHONY: dsp-stack
dsp-stack: ## clone the dsp-api git repository and run the dsp-stack
@mkdir -p .tmp
@git clone --branch main --single-branch --depth 1 https://github.com/dasch-swiss/dsp-api.git .tmp/dsp-stack
@git clone --branch v24.0.8 --single-branch https://github.com/dasch-swiss/dsp-api.git .tmp/dsp-stack
Collaborator Author

This is temporary: I plan to revert it as soon as the other PR that fixes the project IRIs is merged. But as long as the API is not released, I don't want to merge that other PR, so that I can still make a release of dsp-tools before the API is released.

$(MAKE) -C .tmp/dsp-stack env-file
$(MAKE) -C .tmp/dsp-stack init-db-test
$(MAKE) -C .tmp/dsp-stack stack-up
@@ -51,7 +51,7 @@ install: ## install from source (runs setup.py)

.PHONY: test
test: dsp-stack ## run all tests located in the "test" folder (intended for local usage)
-pytest test/
-pytest test/ # ignore errors, continue anyway with stack-down
$(MAKE) stack-down

.PHONY: test-no-stack
@@ -60,7 +60,7 @@ test-no-stack: ## run all tests located in the "test" folder, without starting t

.PHONY: test-end-to-end
test-end-to-end: dsp-stack ## run e2e tests (intended for local usage)
-pytest test/e2e/
-pytest test/e2e/ # ignore errors, continue anyway with stack-down
$(MAKE) stack-down

.PHONY: test-end-to-end-ci
3 changes: 1 addition & 2 deletions docs/dsp-tools-create.md
@@ -437,8 +437,7 @@ To do so, it would be necessary to place the following two files into the folder
![Colors_en](./assets/images/img-list-english-colors.png)
![Farben_de](./assets/images/img-list-german-colors.png)

The expected format of the Excel files is documented
[here](./dsp-tools-excel.md#create-the-lists-section-of-a-json-project-file-from-excel-files). The only difference to
The expected format of the Excel files is documented [here](./dsp-tools-excel.md#lists-section). The only difference to
the explanations there is that column A of the Excel worksheet is not interpreted as list name (root node), but as
node name of the first children level below the root node.

58 changes: 48 additions & 10 deletions docs/dsp-tools-excel.md
@@ -3,20 +3,58 @@
# Excel files for data modelling and data import

dsp-tools is able to process Excel files and output the appropriate JSON or XML file. The JSON/XML file can then be
used to create the ontology on the DSP server or import data to the DSP repository. dsp-tools can also be used to
create a list from an Excel file.
used to create the ontology on the DSP server or import data to the DSP repository.




## JSON project file: "resources" section from Excel file
## JSON project file from Excel

With dsp-tools, a JSON project file can be created from Excel files. The command for this is documented
[here](./dsp-tools-usage.md#create-a-json-project-file-from-excel-files). A JSON project file consists of different parts, and for
each of these parts, one or several Excel files are necessary. The Excel files and their format are described below.
It is possible to invoke a separate command for each of these parts (as described below), but it is more convenient to
use the command that creates the entire JSON project file. To do so, put all involved files into a folder with
the following structure:
```
data_model_files
|-- lists
| |-- de.xlsx
| `-- en.xlsx
`-- onto_name (onto_label)
|-- properties.xlsx
`-- resources.xlsx
```

Conventions for the folder names:

- The "lists" folder is optional. If it exists, it must have exactly this name.
- Replace "onto_name" with your ontology's name, and "onto_label" with your ontology's label.
- The only name that can be chosen freely is the name of the topmost folder ("data_model_files" in this example).
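
The naming conventions above can be checked before running the command. The following Python function is a hypothetical helper, not part of dsp-tools. It assumes that ontology names contain no spaces and that each ontology folder holds `properties.xlsx` and `resources.xlsx` as in the tree above; it only looks at file and folder names, not at the content of the Excel files.

```python
import re
from pathlib import Path

# Hypothetical helper, not part of dsp-tools.
# Assumption: ontology folders are named "onto_name (onto_label)", where onto_name contains no spaces.
ONTO_FOLDER_PATTERN = re.compile(r"^\S+ \(.+\)$")
REQUIRED_ONTO_FILES = ("properties.xlsx", "resources.xlsx")


def check_data_model_folder(root: Path) -> list[str]:
    """Return the problems found in the folder structure (an empty list means none were found)."""
    if not root.is_dir():
        return [f"'{root}' is not a folder"]
    problems: list[str] = []
    ontology_folders = 0
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        if sub.name == "lists":
            # The "lists" folder is optional, but if it exists, it should contain Excel files
            if not any(sub.glob("*.xlsx")):
                problems.append("the 'lists' folder contains no .xlsx file")
        elif ONTO_FOLDER_PATTERN.match(sub.name):
            ontology_folders += 1
            for filename in REQUIRED_ONTO_FILES:
                if not (sub / filename).is_file():
                    problems.append(f"'{sub.name}' is missing {filename}")
        else:
            problems.append(f"folder '{sub.name}' does not follow the 'onto_name (onto_label)' pattern")
    if ontology_folders == 0:
        problems.append("no ontology folder found")
    return problems
```

For a folder laid out like the tree above, `check_data_model_folder(Path("data_model_files"))` returns an empty list.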

Then, use the following command:
```
dsp-tools excel2project data_model_files project.json
```

This will create a file `project.json` with the lists, properties, and resources from the Excel files.

Please note that the "header" of the resulting JSON file is empty and thus invalid. It is necessary to add the project
shortcode, name, description, keywords, etc. by hand.
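
Filling in the header can also be scripted. The sketch below is a hypothetical helper, not part of dsp-tools. It assumes that the header fields live in a top-level `"project"` object and uses the key names `shortcode`, `shortname`, `longname`, `descriptions`, and `keywords`; please check these names against the JSON schema of the project file before relying on it. Other keys of the `"project"` object, such as the generated lists and ontologies, are left untouched.

```python
import json
from pathlib import Path
from typing import Any


def fill_project_header(
    project_file: Path,
    shortcode: str,
    shortname: str,
    longname: str,
    descriptions: dict[str, str],
    keywords: list[str],
) -> dict[str, Any]:
    """Write the header fields into a generated project file and return the updated content.

    Hypothetical helper. Assumption: the header fields are stored in a top-level "project" object.
    """
    data = json.loads(project_file.read_text(encoding="utf-8"))
    header = data.setdefault("project", {})
    # Only the header fields are overwritten; lists, ontologies, etc. stay as they are
    header.update(
        {
            "shortcode": shortcode,
            "shortname": shortname,
            "longname": longname,
            "descriptions": descriptions,
            "keywords": keywords,
        }
    )
    project_file.write_text(json.dumps(data, indent=4, ensure_ascii=False), encoding="utf-8")
    return data
```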

Continue reading the following paragraphs to learn more about the expected structure of the Excel files.




### "resources" section

With dsp-tools, the `resources` section used in a data model (JSON) can be created from an Excel file. The command for
this is documented [here](./dsp-tools-usage.md#create-the-resources-section-of-a-json-project-file-from-an-excel-file).
Only `XLSX` files are allowed. The `resources` section can be inserted into the ontology file and then be uploaded onto
a DSP server.

**An Excel file template can be found [here](assets/templates/resources_template.xlsx). It is recommended to work from
**An Excel file template can be found [here](assets/data_model_templates/onto_name%20(onto_label)/resources.xlsx). It is recommended to work from
the template.**

The expected worksheets of the Excel file are:
@@ -51,14 +89,14 @@ For further information about resources, see [here](./dsp-tools-create-ontologie



## JSON project file: "properties" section from Excel file
### "properties" section

With dsp-tools, the `properties` section used in a data model (JSON) can be created from an Excel file. The command for
this is documented [here](./dsp-tools-usage.md#create-the-properties-section-of-a-json-project-file-from-an-excel-file).
Only the first worksheet of the Excel file is considered and only XLSX files are allowed. The `properties` section can
be inserted into the ontology file and then be uploaded onto a DSP server.

**An Excel file template can be found [here](assets/templates/properties_template.xlsx). It is recommended to work
**An Excel file template can be found [here](assets/data_model_templates/onto_name%20(onto_label)/properties.xlsx). It is recommended to work
from the template.**

The Excel sheet must have the following structure:
@@ -84,7 +122,7 @@ For further information about properties, see [here](./dsp-tools-create-ontologi



## JSON project file: "lists" section from Excel file(s)
### "lists" section

With dsp-tools, the "lists" section of a JSON project file can be created from one or several Excel files. The lists can
then be inserted into a JSON project file and uploaded to a DSP server. The command for this is documented
@@ -116,8 +154,8 @@ Some notes:
printed out if the list is not valid.

**It is recommended to work from the following templates:
[en.xlsx](assets/templates/lists/en.xlsx): File with the English labels
[de.xlsx](assets/templates/lists/de.xlsx): File with the German labels**
[en.xlsx](assets/data_model_templates/lists/en.xlsx): File with the English labels
[de.xlsx](assets/data_model_templates/lists/de.xlsx): File with the German labels**

The output of the above command, with the template files, is:

@@ -193,7 +231,7 @@ The output of the above command, with the template files, is:



## XML data file from Excel/CSV file
## XML data file from Excel/CSV

There are two use cases for a transformation from Excel/CSV to XML:

6 changes: 3 additions & 3 deletions docs/dsp-tools-excel2xml.md
@@ -3,9 +3,9 @@
# `excel2xml`: Convert a data source to XML
dsp-tools assists you in converting a data source in CSV/XLS(X) format to an XML file.

| **Hint** |
|-------------------------------------------------------------------------------------------------------------------------------------------|
| This page is about the **module** `excel2xml`. The CLI command is documented [here](dsp-tools-excel.md#xml-data-file-from-excelcsv-file). |
| **Hint** |
|--------------------------------------------------------------------------------------------------------------------------------------|
| This page is about the **module** `excel2xml`. The CLI command is documented [here](dsp-tools-excel.md#xml-data-file-from-excelcsv). |

To demonstrate the usage of the `excel2xml` module, there is a GitHub repository named `0123-import-scripts`. It
contains:
81 changes: 41 additions & 40 deletions docs/dsp-tools-usage.md
@@ -32,13 +32,13 @@ dsp-tools create [options] project_definition.json

The following options are available:

- `-s` | `--server` _server_: URL of the DSP server (default: 0.0.0.0:3333)
- `-u` | `--user` _username_: username used for authentication with the DSP API (default: root@example.com)
- `-p` | `--password` _password_: password used for authentication with the DSP API (default: test)
- `-V` | `--validate-only`: If set, only the validation of the JSON file is performed.
- `-l` | `--lists-only`: If set, only the lists are created. Please note that in this case the project must already exist.
- `-v` | `--verbose`: If set, more information about the progress is printed to the console.
- `-d` | `--dump`: If set, dump test files for DSP-API requests.
- `-s` | `--server` (optional, default: `0.0.0.0:3333`): URL of the DSP server
- `-u` | `--user` (optional, default: `root@example.com`): username used for authentication with the DSP API
- `-p` | `--password` (optional, default: `test`): password used for authentication with the DSP API
- `-V` | `--validate-only` (optional): If set, only the validation of the JSON file is performed.
- `-l` | `--lists-only` (optional): If set, only the lists are created. Please note that in this case the project must already exist.
- `-v` | `--verbose` (optional): If set, more information about the progress is printed to the console.
- `-d` | `--dump` (optional): If set, dump test files for DSP-API requests.
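
When `dsp-tools create` is called from a script, the options above can be assembled programmatically. The following Python sketch is a hypothetical helper, not part of dsp-tools: it uses only the short flags documented above and omits every option that is not set, so that the documented defaults apply.

```python
from typing import Optional


def build_create_command(
    project_file: str,
    server: Optional[str] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
    validate_only: bool = False,
    lists_only: bool = False,
    verbose: bool = False,
    dump: bool = False,
) -> list[str]:
    """Assemble the argument list for `dsp-tools create [options] project_definition.json`."""
    command = ["dsp-tools", "create"]
    # Options with a value: omitted when None, so that dsp-tools falls back to its defaults
    for flag, value in (("-s", server), ("-u", user), ("-p", password)):
        if value is not None:
            command += [flag, value]
    # Switches: only added when set
    for flag, enabled in (("-V", validate_only), ("-l", lists_only), ("-v", verbose), ("-d", dump)):
        if enabled:
            command.append(flag)
    command.append(project_file)
    return command
```

For example, `build_create_command("project_definition.json", validate_only=True)` yields `["dsp-tools", "create", "-V", "project_definition.json"]`, which only validates the JSON file. The list can then be passed to `subprocess.run(command, check=True)`.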

The command is used to read the definition of a project with its data model(s) (provided in a JSON file) and create it
on the DSP server. The following example shows how to upload the project defined in `project_definition.json` to the DSP
@@ -61,12 +61,12 @@ dsp-tools get [options] output_file.json

The following options are available:

- `-s` | `--server`: URL of the DSP server (default: 0.0.0.0:3333)
- `-u` | `--user`: username used for authentication with the DSP API (default: root@example.com)
- `-p` | `--password`: password used for authentication with the DSP API (default: test)
- `-P` | `--project`: shortcode, shortname or
[IRI](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) of the project (mandatory)
- `-v` | `--verbose`: If set, some information about the progress is printed to the console.
- `-s` | `--server` (optional, default: `0.0.0.0:3333`): URL of the DSP server
- `-u` | `--user` (optional, default: `root@example.com`): username used for authentication with the DSP API
- `-p` | `--password` (optional, default: `test`): password used for authentication with the DSP API
- `-P` | `--project` (mandatory): shortcode, shortname or
[IRI](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) of the project
- `-v` | `--verbose` (optional): If set, some information about the progress is printed to the console.

The command is used to get the definition of a project with its data model(s) from a DSP server and write it into a JSON
file. This JSON file can then be used to create the same project on another DSP server. The following example shows how
@@ -131,21 +131,34 @@ to use this file to replace internal IDs in an existing XML file to reference ex



## Create the "lists" section of a JSON project file from Excel files
## Create a JSON project file from Excel files

```bash
dsp-tools excel2project data_model_files project.json
```

The expected file and folder structures are described [here](./dsp-tools-excel.md#json-project-file-from-excel).




### Create the "lists" section of a JSON project file from Excel files

```bash
dsp-tools excel2lists folder output.json
dsp-tools excel2lists [options] folder output.json
```

Arguments:
- `folder` (optional, default: "lists"): folder with the Excel file(s)
- `output.json` (optional, default: "lists.json"): Output file
The following options are available:

- `-v` | `--verbose` (optional): If set, more information about the progress is printed to the console.

The expected Excel format is [documented here](./dsp-tools-excel.md#create-the-lists-section-of-a-json-project-file-from-excel-files).
The expected Excel format is [documented here](./dsp-tools-excel.md#lists-section).

**Tip: The command [`excel2project`](#create-a-json-project-file-from-excel-files) might be more convenient to use.**


## Create the "resources" section of a JSON project file from an Excel file

### Create the "resources" section of a JSON project file from an Excel file

```bash
dsp-tools excel2resources excel_file.xlsx output_file.json
@@ -154,20 +167,14 @@ dsp-tools excel2resources excel_file.xlsx output_file.json
The command is used to create the resources section of an ontology from an Excel file. To do so, provide an Excel file
with the data in its first worksheet.

The following example shows how to create the resources section from an Excel file called `Resources.xlsx`. The output
is written to a file called `resources.json`.

```bash
dsp-tools excel2resources Resources.xlsx resources.json
```
The expected Excel format is [documented here](./dsp-tools-excel.md#resources-section).

More information about the usage of this command can be
found [here](./dsp-tools-excel.md#create-the-resources-for-a-data-model-from-an-excel-file).
**Tip: The command [`excel2project`](#create-a-json-project-file-from-excel-files) might be more convenient to use.**




## Create the "properties" section of a JSON project file from an Excel file
### Create the "properties" section of a JSON project file from an Excel file

```bash
dsp-tools excel2properties excel_file.xlsx output_file.json
@@ -176,15 +183,9 @@ dsp-tools excel2properties excel_file.xlsx output_file.json
The command is used to create the properties section of an ontology from an Excel file. To do so, provide an Excel file
with the data in its first worksheet.

The following example shows how to create the properties section from an Excel file called `Properties.xlsx`. The output
is written to a file called `properties.json`.

```bash
dsp-tools excel2properties Properties.xlsx properties.json
```
The expected Excel format is [documented here](./dsp-tools-excel.md#properties-section).

More information about the usage of this command can be found
[here](./dsp-tools-excel.md#create-the-properties-for-a-data-model-from-an-excel-file).
**Tip: The command [`excel2project`](#create-a-json-project-file-from-excel-files) might be more convenient to use.**



@@ -195,9 +196,9 @@ dsp-tools excel2xml data-source.xlsx project_shortcode ontology_name

Arguments:

- data-source.xlsx: An Excel/CSV file that is structured according to [these requirements](dsp-tools-excel.md#cli-command-excel2xml)
- project_shortcode: The four-digit hexadecimal shortcode of the project
- ontology_name: the name of the ontology that the data belongs to
- data-source.xlsx (mandatory): An Excel/CSV file that is structured according to [these requirements](dsp-tools-excel.md#cli-command-excel2xml)
- project_shortcode (mandatory): The four-digit hexadecimal shortcode of the project
- ontology_name (mandatory): the name of the ontology that the data belongs to
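
Since `project_shortcode` must be a four-digit hexadecimal number, a script that calls `excel2xml` can check it before invoking the command. The following Python sketch is hypothetical and not part of dsp-tools; accepting both upper- and lowercase hex digits is an assumption.

```python
import re

# Exactly four hexadecimal digits, e.g. "0123" or "4A1F"
SHORTCODE_PATTERN = re.compile(r"[0-9A-Fa-f]{4}")


def build_excel2xml_command(data_source: str, project_shortcode: str, ontology_name: str) -> list[str]:
    """Assemble `dsp-tools excel2xml data-source.xlsx project_shortcode ontology_name` after checking the shortcode."""
    if SHORTCODE_PATTERN.fullmatch(project_shortcode) is None:
        raise ValueError(f"'{project_shortcode}' is not a four-digit hexadecimal shortcode")
    return ["dsp-tools", "excel2xml", data_source, project_shortcode, ontology_name]
```

For example, `build_excel2xml_command("data-source.xlsx", "0123", "my_onto")` (with the hypothetical ontology name `my_onto`) returns an argument list for `subprocess.run`, whereas a shortcode like `"012"` raises a `ValueError`.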

If your data source is already structured according to the DSP specifications, but it is not in XML format yet, the
command `excel2xml` will transform it into XML. This is mostly used for DaSCH-internal data migration. There are no
2 changes: 2 additions & 0 deletions docs/index.md
@@ -20,6 +20,8 @@ dsp-tools helps you with the following tasks:
a DSP server and writes it into a JSON file.
- [`dsp-tools xmlupload`](./dsp-tools-usage.md#upload-data-to-a-dsp-server) uploads data from an XML file (bulk
data import) and writes the mapping from internal IDs to IRIs into a local file.
- [`dsp-tools excel2project`](./dsp-tools-usage.md#create-a-json-project-file-from-excel-files) creates an entire JSON
project file from a folder with Excel files in it.
- [`dsp-tools excel2lists`](./dsp-tools-usage.md#create-the-lists-section-of-a-json-project-file-from-excel-files)
creates the "lists" section of a JSON project file from one or several Excel files. The resulting section can be
integrated into a JSON project file and then be uploaded to a DSP server with `dsp-tools create`.