docs(ingest): Add accepted file formats to documentation (DEV-677) (#2038)

* add accepted file formats

* Update data-formats.md

* update sipi path

* rename data-formats.md to file-formats.md

* update index.md

* fix footnote

* reset scala setting
irinaschubert committed Apr 12, 2022
1 parent 521150f commit f72e7a0
Showing 7 changed files with 34 additions and 36 deletions.
2 changes: 1 addition & 1 deletion .scalafmt.conf
@@ -1,6 +1,6 @@
version = "2.7.5"
maxColumn = 120
- align.preset = most
+ align.preset = some
align.multiline = false
continuationIndent.defnSite = 2
assumeStandardLibraryStripMargin = true
22 changes: 0 additions & 22 deletions docs/01-introduction/data-formats.md

This file was deleted.

24 changes: 24 additions & 0 deletions docs/01-introduction/file-formats.md
@@ -0,0 +1,24 @@
<!---
* Copyright © 2021 - 2022 Swiss National Data and Service Center for the Humanities and/or DaSCH Service Platform contributors.
* SPDX-License-Identifier: Apache-2.0
-->

# File Formats in DSP-API

Currently, DSP accepts only a limited number of file formats for upload. Some metadata is extracted from the files during ingest, but the file formats are not validated. Only image files are currently converted into another format; both the converted version and the original are kept.

The following table shows the accepted file formats:

| Category | Accepted format | Converted during ingest? |
| --------------------- | ------------------------- | -------------------------------------------------------------------------- |
| Text, XML<sup>1</sup> | TXT, XML, XSL, XSD | No |
| Tables | CSV, XLS, XLSX | No |
| 2D Images | JPEG, PNG, TIFF, JP2 | Yes, converted to JPEG 2000 by [Sipi](https://github.com/dasch-swiss/sipi) |
| Audio | MPEG (MP3), MP4, WAV | No |
| Video | MP4 | No |
| Office | PDF, DOC, DOCX, PPT, PPTX | No |
| Archives | ZIP, TAR, ISO, GZIP, 7Z | No |


1: If your XML files represent text with markup (e.g. [TEI/XML](http://www.tei-c.org/)),
the recommended approach is to allow Knora to store it as [Standoff/RDF](standoff-rdf.md).
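
As an illustration of how a client might pre-check files against the table above before uploading, here is a minimal Python sketch. It is not part of DSP-API: the names `ACCEPTED_FORMATS`, `categories_for`, `is_accepted`, and `is_converted` are hypothetical, and the file-extension spellings (for example `jpg` for JPEG or `gz` for GZIP) are assumptions, since the table lists format names rather than extensions. The check looks only at the file name and does not inspect file contents.

```python
from pathlib import Path

# Accepted formats, transcribed from the table above.
# ASSUMPTION: the extension spellings (e.g. "jpg" for JPEG, "gz" for GZIP)
# are chosen for this sketch; the table lists format names only.
ACCEPTED_FORMATS = {
    "Text, XML": {"txt", "xml", "xsl", "xsd"},
    "Tables": {"csv", "xls", "xlsx"},
    "2D Images": {"jpg", "jpeg", "png", "tif", "tiff", "jp2"},
    "Audio": {"mp3", "mp4", "wav"},
    "Video": {"mp4"},
    "Office": {"pdf", "doc", "docx", "ppt", "pptx"},
    "Archives": {"zip", "tar", "iso", "gz", "7z"},
}

# According to the table, only 2D images are converted during ingest.
CONVERTED_CATEGORIES = {"2D Images"}


def categories_for(filename: str) -> list[str]:
    """Return every table category whose formats include the file's extension."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return [name for name, exts in ACCEPTED_FORMATS.items() if extension in exts]


def is_accepted(filename: str) -> bool:
    """True if the extension appears in any category of the table."""
    return bool(categories_for(filename))


def is_converted(filename: str) -> bool:
    """True if the file falls into a category that is converted during ingest."""
    return any(c in CONVERTED_CATEGORIES for c in categories_for(filename))
```

Because MP4 is listed under both Audio and Video, `categories_for` returns a list rather than a single category.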
2 changes: 1 addition & 1 deletion docs/01-introduction/index.md
@@ -6,6 +6,6 @@
# Introduction

* [What Is DSP and DSP-API (previous Knora)?](what-is-knora.md)
- * [Data Formats in DSP-API](data-formats.md)
+ * [File Formats in DSP-API](file-formats.md)
* [Standoff/RDF Text Markup](standoff-rdf.md)
* [An Example Project](example-project.md)
12 changes: 4 additions & 8 deletions docs/01-introduction/what-is-knora.md
@@ -23,15 +23,11 @@ DSP solves this problem by keeping the data alive. You can query all the data
in a DSP repository, not just the metadata. You can import thousands of databases into
DSP, and run queries that search through all of them at once.

- Another problem is that researchers use a multitude of different data formats, many of
+ Another problem is that researchers use a multitude of different file formats, many of
which are proprietary and quickly become obsolete. It is not practical to maintain
- all the programs that were used to create and read old data files, or even
- all the operating systems that these programs ran on.
-
- Instead of preserving all these data formats, DSP supports
- the conversion of all sorts of data to a [small number of formats](data-formats.md)
- that are suitable for long-term preservation, and that maintain the data's meaning and
- structure:
+ all the programs that were used to create and read old files, or even
+ all the operating systems that these programs ran on. Therefore, DSP only accepts a
+ certain number of [file formats](file-formats.md).

- Non-binary data is stored as
[RDF](http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/), in a dedicated
6 changes: 3 additions & 3 deletions docs/faq/index.md
@@ -5,11 +5,11 @@

# Frequently Asked Questions

- ## Data Formats
+ ## File Formats

- ### What data formats does Knora store?
+ ### What file formats does Knora store?

- See [Data Formats in Knora](../01-introduction/data-formats.md).
+ See [File Formats in Knora](../01-introduction/file-formats.md).

### Does Knora store XML files?

2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -10,7 +10,7 @@ nav:
- Introduction:
- Index: 01-introduction/index.md
- What is DSP?: 01-introduction/what-is-knora.md
- - Data Formats in DSP-API: 01-introduction/data-formats.md
+ - File Formats in DSP-API: 01-introduction/file-formats.md
- Standoff/RDF Text Markup: 01-introduction/standoff-rdf.md
- An Example Project: 01-introduction/example-project.md
- DSP Ontologies: