docs(ingest): Add accepted file formats to documentation (DEV-677) #2038

Merged
merged 8 commits into from Apr 12, 2022
22 changes: 0 additions & 22 deletions docs/01-introduction/data-formats.md

This file was deleted.

23 changes: 23 additions & 0 deletions docs/01-introduction/file-formats.md
@@ -0,0 +1,23 @@
<!---
* Copyright © 2021 - 2022 Swiss National Data and Service Center for the Humanities and/or DaSCH Service Platform contributors.
* SPDX-License-Identifier: Apache-2.0
-->

# File Formats in DSP-API

Currently, DSP accepts only a limited number of file formats for upload. Some metadata is extracted from the files during ingest, but the file formats are not validated. Only image files are currently converted into another format; both the converted version and the original are kept.

The following table shows the accepted file formats:

| Category | Accepted format | Converted during ingest? |
| ------------- | ------------------------- | -------------------------------------------------------------------------- |
| Text, XML[^1] | TXT, XML, XSL, XSD | No |
| Tables | CSV, XLS, XLSX | No |
| 2D Images | JPEG, PNG, TIFF, JP2 | Yes, converted to JPEG 2000 by [Sipi](https://github.com/dasch-swiss/sipi) |
| Audio | MPEG (MP3), MP4, WAV | No |
| Video | MP4 | No |
| Office | PDF, DOC, DOCX, PPT, PPTX | No |
| Archives | ZIP, TAR, ISO, GZIP, 7Z | No |

[^1]: If your XML files represent text with markup (e.g. [TEI/XML](http://www.tei-c.org/)),
the recommended approach is to allow Knora to store it as [Standoff/RDF](standoff-rdf.md).
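
The table of accepted formats amounts to a lookup from file extension to category. As a minimal sketch, assuming a client wants to check a file name before upload, it could be written in Python as follows. The names `ACCEPTED`, `CONVERTED`, `categories_for`, and `is_converted` are hypothetical and not part of DSP-API, and the mapping from format names to extensions (for example GZIP to `.gz`) is an assumption. Because the formats themselves are not validated during ingest, the sketch looks only at the extension.

```python
from pathlib import PurePath

# Category -> file extensions, transcribed from the table of accepted formats.
# ASSUMPTION: the extension spellings (JPEG -> .jpg/.jpeg, TIFF -> .tif/.tiff,
# GZIP -> .gz) are this sketch's guess; the table lists format names only.
ACCEPTED = {
    "Text, XML": {".txt", ".xml", ".xsl", ".xsd"},
    "Tables": {".csv", ".xls", ".xlsx"},
    "2D Images": {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".jp2"},
    "Audio": {".mp3", ".mp4", ".wav"},
    "Video": {".mp4"},
    "Office": {".pdf", ".doc", ".docx", ".ppt", ".pptx"},
    "Archives": {".zip", ".tar", ".iso", ".gz", ".7z"},
}

# Categories whose files are converted during ingest, per the table.
CONVERTED = {"2D Images"}


def categories_for(filename: str) -> list[str]:
    """Return every category whose extension set contains the file's extension."""
    ext = PurePath(filename).suffix.lower()
    return [category for category, exts in ACCEPTED.items() if ext in exts]


def is_converted(filename: str) -> bool:
    """Return True if the file falls in a category that is converted during ingest."""
    return any(category in CONVERTED for category in categories_for(filename))
```

`.mp4` is listed under both Audio and Video, so the lookup returns a list of categories rather than a single one; an empty list means the extension is not in the table.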
2 changes: 1 addition & 1 deletion docs/01-introduction/index.md
@@ -6,6 +6,6 @@
# Introduction

* [What Is DSP and DSP-API (previous Knora)?](what-is-knora.md)
-* [Data Formats in DSP-API](data-formats.md)
+* [File Formats in DSP-API](file-formats.md)
* [Standoff/RDF Text Markup](standoff-rdf.md)
* [An Example Project](example-project.md)
12 changes: 4 additions & 8 deletions docs/01-introduction/what-is-knora.md
@@ -23,15 +23,11 @@ DSP solves this problem by keeping the data alive. You can query all the data
in a DSP repository, not just the metadata. You can import thousands of databases into
DSP, and run queries that search through all of them at once.

-Another problem is that researchers use a multitude of different data formats, many of
+Another problem is that researchers use a multitude of different file formats, many of
which are proprietary and quickly become obsolete. It is not practical to maintain
-all the programs that were used to create and read old data files, or even
-all the operating systems that these programs ran on.
-
-Instead of preserving all these data formats, DSP supports
-the conversion of all sorts of data to a [small number of formats](data-formats.md)
-that are suitable for long-term preservation, and that maintain the data's meaning and
-structure:
+all the programs that were used to create and read old files, or even
+all the operating systems that these programs ran on. Therefore, DSP only accepts a
+certain number of [file formats](file-formats.md).

- Non-binary data is stored as
[RDF](http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/), in a dedicated
4 changes: 2 additions & 2 deletions docs/faq/index.md
@@ -7,9 +7,9 @@

## Data Formats

-### What data formats does Knora store?
+### What file formats does Knora store?

-See [Data Formats in Knora](../01-introduction/data-formats.md).
+See [File Formats in Knora](../01-introduction/file-formats.md).

### Does Knora store XML files?

2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -10,7 +10,7 @@ nav:
   - Introduction:
       - Index: 01-introduction/index.md
       - What is DSP?: 01-introduction/what-is-knora.md
-      - Data Formats in DSP-API: 01-introduction/data-formats.md
+      - File Formats in DSP-API: 01-introduction/file-formats.md
       - Standoff/RDF Text Markup: 01-introduction/standoff-rdf.md
       - An Example Project: 01-introduction/example-project.md
   - DSP Ontologies: