What is Schema Salad about? #540

Open
bblfish opened this issue Jun 2, 2022 · 7 comments
Comments

@bblfish
Contributor

bblfish commented Jun 2, 2022

Hi,

From the documentation, it is not quite clear to me what Schema Salad is about.
My guess is that it could provide for Avro what the W3C CSV on the Web standard provides for CSV files. See the examples in the csv2rdf document.

Something like that would allow BigData engineers to work with Avro binary files while easily finding the URL for a specific relation or type, or even how to construct the subject URL if available. That would not add any complexity to the Avro encoding, but it would let Avro data work with data from many other areas and make it easy to look up definitions by dereferencing the relation or class URL over HTTP.

Am I guessing correctly that this is the main use case for SALAD?

@bblfish
Contributor Author

bblfish commented Jun 3, 2022

There is a discussion on the semantic-web list that moves from json-ld to yaml-ld to avro-ld: "GRDDL for BigData..."
https://lists.w3.org/Archives/Public/semantic-web/2022Jun/

@tetron
Member

tetron commented Jun 3, 2022

Salad is a schema language that ties Avro schema together with linked data in order to emit an Avro schema, json-ld context, and RDFS. It also makes Avro a bit easier to use by adding inheritance and template specialization. @VladimirAlexiev calls this "polyglot modeling".

For idiosyncratic historical reasons, Salad has mostly been used to describe schemas for JSON and YAML files, but since it is built on Avro, you could use it for Avro binary files as well.
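
To make the "Avro schema plus linked data" idea concrete, here is a minimal sketch in plain Python. It does not call schema-salad at all: the `SCHEMA` dict only imitates the flavour of a Salad record (`extends`, `jsonldPredicate`) in a much simplified layout, and `flatten_fields`, `to_avro_record` and `build_context` are hypothetical helpers invented for this illustration.

```python
# Illustrative only: a toy, Salad-like schema held in plain Python dicts.
# The layout, the chosen IRIs and the helper functions are assumptions,
# not schema-salad's actual document format or API.

SCHEMA = {
    "Agent": {
        "fields": {
            "name": {"type": "string",
                     "jsonldPredicate": "http://xmlns.com/foaf/0.1/name"},
        },
    },
    "Person": {
        "extends": "Agent",
        "fields": {
            "age": {"type": "long",
                    "jsonldPredicate": "http://xmlns.com/foaf/0.1/age"},
        },
    },
}


def flatten_fields(schema, record_name):
    """Resolve inheritance: parent fields first, then the record's own."""
    record = schema[record_name]
    fields = {}
    parent = record.get("extends")
    if parent is not None:
        fields.update(flatten_fields(schema, parent))
    fields.update(record["fields"])
    return fields


def to_avro_record(schema, record_name):
    """Emit a plain Avro record schema (Avro itself has no inheritance)."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            {"name": name, "type": spec["type"]}
            for name, spec in flatten_fields(schema, record_name).items()
        ],
    }


def build_context(schema, record_name):
    """Emit a JSON-LD style context: field name -> predicate IRI."""
    return {
        name: spec["jsonldPredicate"]
        for name, spec in flatten_fields(schema, record_name).items()
    }


avro_schema = to_avro_record(SCHEMA, "Person")
context = build_context(SCHEMA, "Person")
```

The point of the sketch is only that one annotated source can yield both a flat Avro record and a term-to-IRI mapping; the real tool also emits RDFS and handles far more cases.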

Parts of this discussion might be helpful:

json-ld/yaml-ld#3

@bblfish
Contributor Author

bblfish commented Jun 6, 2022

Thanks @tetron for the help.

To make sure I understood, I developed a schema and model using Salad yaml
and sent a mail to the semantic-web list explaining how it all worked:
https://lists.w3.org/Archives/Public/semantic-web/2022Jun/0011.html

I did not use inheritance, but that also looks very helpful.

So now I understand that Salad allows one to write Avro schemas in yaml and mark them up with RDF.
The schema-salad-tool allows one to produce json-ld contexts that one can then add to the json representation of avro data in order to produce RDF. Because there is an isomorphism between avro-json and avro-binary, one can think of the binary as also containing a json-ld context.
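
As a rough illustration of that second step, the sketch below applies a context to a flat json record and reads off triples. It is a hand-rolled term lookup, not a JSON-LD processor, and the `CONTEXT` contents, the subject URL and the `record_to_triples` helper are all assumptions made up for the example.

```python
# Illustrative only: a hand-rolled term expansion, not a JSON-LD processor.
# CONTEXT, the subject URL and record_to_triples are hypothetical.

CONTEXT = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "age": "http://xmlns.com/foaf/0.1/age",
}


def record_to_triples(subject, record, context):
    """Map each field of a flat record to a (subject, predicate, value) triple."""
    triples = []
    for field, value in record.items():
        predicate = context.get(field)
        if predicate is None:
            # Fields the context does not define carry no RDF meaning here.
            continue
        triples.append((subject, predicate, value))
    return triples


record = {"name": "Alice", "age": 42, "internal_id": 7}
triples = record_to_triples("http://example.org/people/alice", record, CONTEXT)
```

Here `internal_id` is silently skipped because the context gives it no predicate, which is also how an unmapped term behaves in JSON-LD when no default vocabulary is set.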

Before that, we had a discussion on the semantic web mailing list about looking at the binary data as if it were json-ld data. Of course, one may then want to just interpret the avro data directly without going through json-ld. @ericprud wrote up an initial idea here, which I still need to go through:
https://lists.w3.org/Archives/Public/semantic-web/2022Jun/0009.html

If I can summarise what I learnt and why it took me a bit of time to understand:

  • I was not familiar with yaml and did not realise it was an extension of json.
    You could make that a lot clearer by adding some JS to your spec so that both views are possible.
    The OWL specs offer that: https://www.w3.org/TR/2012/REC-owl2-primer-20121211/ (but don't offer a URL shortcut to one of the selections %☠️$! )
  • Because most Avro data is in binary format, I was thinking of it mostly that way
    • and yet there is nothing in the documentation about the Avro binary format, so that confused me
    • it takes a bit of exercise to think of Avro as a more general schema language than the binary format requires
    • the json-ld context only works with the json Avro data. It can be thought of as working with the binary Avro data too, but that is one layer of indirection, and it is not efficient. For big data situations that needs to be thought through; perhaps a note should be written up saying that tools for working directly with binary data still need to be developed
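
To show what "working directly with binary data" could look like, here is a small sketch that decodes Avro-encoded bytes straight into triples without building json first. It covers only flat records with `long` and `string` fields, following the Avro binary encoding (zigzag varints for longs, length-prefixed UTF-8 for strings, fields concatenated in schema order). The `FIELDS` table pairing each field with an IRI, the subject URL and all function names are assumptions for illustration.

```python
# Illustrative sketch of Avro's binary encoding for a flat record of
# long and string fields. FIELDS, the IRIs and the helpers are hypothetical.

FIELDS = [  # order matters: the binary form carries no field names
    ("name", "string", "http://xmlns.com/foaf/0.1/name"),
    ("age", "long", "http://xmlns.com/foaf/0.1/age"),
]


def encode_long(n):
    """Zigzag-encode a 64-bit signed integer, then write it as a varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf, pos):
    """Read a varint at pos and undo the zigzag; return (value, new_pos)."""
    shift = 0
    z = 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_record(record):
    """Concatenate the encoded fields in schema order."""
    out = bytearray()
    for name, avro_type, _iri in FIELDS:
        value = record[name]
        if avro_type == "long":
            out += encode_long(value)
        else:  # string: byte length as a long, then the UTF-8 bytes
            data = value.encode("utf-8")
            out += encode_long(len(data)) + data
    return bytes(out)


def decode_to_triples(subject, buf):
    """Decode straight to triples, skipping the json and json-ld step."""
    pos = 0
    triples = []
    for _name, avro_type, iri in FIELDS:
        if avro_type == "long":
            value, pos = decode_long(buf, pos)
        else:
            length, pos = decode_long(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        triples.append((subject, iri, value))
    return triples


payload = encode_record({"name": "Alice", "age": 42})
triples = decode_to_triples("http://example.org/people/alice", payload)
```

Because the bytes contain no field names, the predicate IRIs have to come from the schema side, which is why annotating the schema (rather than the data) fits the binary format naturally.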

Thinking about this, I was wondering how different in expressivity Avro is from SHACL or ShEx. Could one not just use SHACL directly to describe Avro binary data? What would be missing?

@bblfish
Contributor Author

bblfish commented Jun 6, 2022

Btw, I came across the EU FAIRplus project's description of their use of Salad for writing forms, and of ShEx to validate them: §9.4.2F Metadata profile validation in RDF.

@rob-metalinkage

How does this relate to JSON-LD framing?

@mr-c
Member

mr-c commented Jun 20, 2022

> How does this relate to JSON-LD framing?

Based upon https://json-ld.org/spec/latest/json-ld-framing/#introduction I would say: Neither schema-salad itself (nor schema-salad documents) specify deterministic layouts.

@tetron
Member

tetron commented Jun 20, 2022

There's a de facto normalized form for a schema-salad document, but if you are starting with an RDF graph there are likely to be multiple valid schema-salad json serializations (this depends on your schema), and schema salad doesn't have features to say which one is the preferred one.
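
A small sketch of why several serializations can correspond to one graph: the two json layouts below, one nested and one using references, flatten to the same set of triples. The documents, the `id` convention and the `to_triples` helper are made up for illustration and say nothing about which layouts a particular schema would accept.

```python
# Illustrative only: two json layouts that describe the same small graph.
# The data, the "id" key convention and to_triples are hypothetical, and
# the sketch does not distinguish IRIs from literal values.

NESTED = {
    "id": "ex:workflow",
    "step": {"id": "ex:step1", "label": "align"},
}

REFERENCED = [
    {"id": "ex:workflow", "step": "ex:step1"},
    {"id": "ex:step1", "label": "align"},
]


def to_triples(node):
    """Flatten a node, or a list of nodes, into a set of triples."""
    if isinstance(node, list):
        return set().union(*(to_triples(n) for n in node))
    triples = set()
    for key, value in node.items():
        if key == "id":
            continue
        if isinstance(value, dict):
            # A nested object contributes a link plus its own triples.
            triples.add((node["id"], key, value["id"]))
            triples |= to_triples(value)
        else:
            triples.add((node["id"], key, value))
    return triples


same_graph = to_triples(NESTED) == to_triples(REFERENCED)
```

Going from the triples back to json, nothing in the graph itself says whether the nested or the referenced layout is wanted, which is the choice that JSON-LD framing lets an author pin down.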
