Add Json-LD (de)serializer #487

Closed
amedranogil opened this issue Mar 13, 2018 · 8 comments

@amedranogil
Member

Add Json-LD https://www.w3.org/TR/json-ld/ Serializer.

Prerequisite: resolve the multi-serializer service and service selection for serialization.

@amedranogil
Member Author

How are we going to implement multiple instances of MessageContentSerializer? I'm guessing Filters will allow selecting one or the other. But what do we put in the Filter?

@amedranogil
Member Author

MessageContentSerializer.deserialize(String) should be declared as throwing a ParseException. This will be useful when there are several serializers and we need to determine which language the string is in.
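
A minimal sketch of how a declared ParseException could be used to probe for the format, under stated assumptions: `ContentParser` is a hypothetical stand-in for MessageContentSerializer (only `deserialize` is modelled), `FormatProbe` and the two toy parsers are invented for illustration, and the "looks like JSON / looks like Turtle" checks are deliberately naive.

```java
import java.text.ParseException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for MessageContentSerializer; not the real interface.
interface ContentParser {
    Object deserialize(String serialized) throws ParseException;
}

class FormatProbe {
    /** Tries each parser in insertion order; returns the name of the first one that accepts the input, or null. */
    static String detect(Map<String, ContentParser> parsers, String input) {
        for (Map.Entry<String, ContentParser> e : parsers.entrySet()) {
            try {
                e.getValue().deserialize(input);
                return e.getKey();
            } catch (ParseException ex) {
                // Not this format; fall through to the next candidate.
            }
        }
        return null;
    }

    /** Toy parsers (assumption): they only inspect the first characters of the input. */
    static Map<String, ContentParser> toyParsers() {
        Map<String, ContentParser> m = new LinkedHashMap<>();
        m.put("json-ld", s -> {
            if (!s.trim().startsWith("{")) throw new ParseException("not a JSON object", 0);
            return s;
        });
        m.put("turtle", s -> {
            if (!s.trim().startsWith("@prefix")) throw new ParseException("no @prefix directive", 0);
            return s;
        });
        return m;
    }
}
```

With real serializers, each failed attempt costs a parse, so this probing would probably only be a fallback for when the format is not known by other means.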

@cstockloew
Member

Sorry, there has been no progress on the serializer itself. But there is some progress in the preparation, e.g. the classes GraphIterator and Specializer. In the old version, the turtle serializer did everything by itself. The Specialization part is now provided by the Specializer in data.representation and can be used by different serializers (this was a big part). With Gson being a separate lib that is already included, adding JSON-LD should be a lot simpler now.

A few things to consider (as far as I remember):

  • Format and Params:
    There should be a new class SerializerParams in
    data.representation/org.universAAL.middleware.serialization
    with
    • Definition of different formats: URIs as given in https://www.w3.org/ns/formats/ as public static final String FORMAT_XXX
    • Method SharedObjectParams getParams(String format) to get the params for a call to fetchSharedObject (hence, my work on the container modules)
      The different serializers are indeed distinguished by OSGi as part of the 'properties' parameter
  • Serializer options
    Different options to the serializer should be given as a separate 'options' parameter. See uaal_issues M7.
  • Serializer vs. SerializerEx
    The 'Ex' subinterface should be removed. It was only a workaround. This would be a possibility for a serializer option. See uaal_issues M7.
  • getPropSerializationType:
    Needs to be clarified. I have been waiting for an answer from Saied for a very long time now. Without further info on what this actually means, it is difficult to adapt this concept to different serializers. See uaal_issues M8.
  • getPropSerializationFormat:
    There could be another method in Resource to allow for nested formats. Rarely needed and should mostly just return null, but this would allow, e.g., adding SPARQL as a property of a Resource. One serializer should then call another serializer. Maybe this solves the issue with ParseException, if you know what format the "outer" serialization is (which is always RDF for the buses and in most other cases can be queried in another way).
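
A rough sketch of the "Format and Params" point above, under stated assumptions: the class name `SerializerParamsSketch` and the property key `serialization.format` are invented, and a plain `Map` stands in for SharedObjectParams, whose real shape is not shown in this thread. Only the two format URIs are taken from https://www.w3.org/ns/formats/.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: not the proposed SerializerParams class itself.
final class SerializerParamsSketch {
    // Format URIs as listed at https://www.w3.org/ns/formats/
    public static final String FORMAT_TURTLE = "http://www.w3.org/ns/formats/Turtle";
    public static final String FORMAT_JSON_LD = "http://www.w3.org/ns/formats/JSON-LD";

    // Hypothetical property key a container could match on.
    static final String PROP_FORMAT = "serialization.format";

    private SerializerParamsSketch() {
    }

    /** Returns the lookup properties for a format; a Map stands in for SharedObjectParams (assumption). */
    static Map<String, String> getParams(String format) {
        if (!FORMAT_TURTLE.equals(format) && !FORMAT_JSON_LD.equals(format)) {
            throw new IllegalArgumentException("Unknown format: " + format);
        }
        Map<String, String> props = new HashMap<>();
        props.put(PROP_FORMAT, format);
        return Collections.unmodifiableMap(props);
    }
}
```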

I know you have been asking for this feature for a very long time now. Sorry for not having this finished yet.

@amedranogil
Member Author

@cstockloew Thank you for your pointers. I am now starting the development of this Serializer; as you said, it seems to be easy.
Yet, I will need to understand some details that are still a bit fuzzy for me (e.g. specialization, which as far as I understand at this point is the last step of deserialization), as well as how to treat special types, such as type restrictions.
I will be using the current Serializer as a template, so it would be nice to have at least those ideas for all serializers implemented there (i.e. SerializerParams, the fetchSharedObject mechanism, and the Serializer Options [this is also something I noticed, but I left it for another iteration]).

@amedranogil
Member Author

I just pushed a new branch at https://github.com/universAAL/middleware/tree/issue/487
I have only started with URICompactor. This functionality could be cross-serializer, but I found it difficult to extract from the Turtle serializer (plus, URICompactor guesses human-readable prefix names).

@cstockloew
Member

The turtle serializer is stand-alone and thus does everything by itself. Maybe an external lib, like Gson, provides this already natively? If not, it would also be possible to "extract" some util-methods, either to data.representation or to another bundle, e.g. data.serialization.common.{core/osgi}?

The serializer works like this:
it first analyses all Resources (method analyzeResource) and counts all namespaces (among others) with

			if (StringUtils.isQualifiedName(uri))
				countNs(uri.substring(0, uri.lastIndexOf('#') + 1), nsTable);

The Hashtable nsTable maps each namespace (String) to the number of times it is used (Integer).

Later, when it comes to actually writing the output, it calls writeNamespaces(nsTable), which writes the namespaces to the output String and builds the overall mapping in the Hashtable namespaceTable. This namespaceTable is then used for every Resource that is written (method writeURI).
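
The two passes described above can be sketched as follows. This is not the Turtle serializer's code: `NamespaceTableSketch`, `assignPrefixes` and the generated prefix names `ns0`, `ns1`, ... are assumptions made for illustration, a check for `'#'` stands in for StringUtils.isQualifiedName, and plain `Map`s replace the Hashtables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NamespaceTableSketch {
    /** Pass 1: increments the usage counter for the namespace part of a URI (up to and including '#'). */
    static void countNs(String uri, Map<String, Integer> nsTable) {
        int hash = uri.lastIndexOf('#');
        if (hash < 0) {
            return; // stand-in for the isQualifiedName check (assumption)
        }
        nsTable.merge(uri.substring(0, hash + 1), 1, Integer::sum);
    }

    /** Pass 2a: assigns a prefix to every counted namespace, in iteration order (naming scheme is an assumption). */
    static Map<String, String> assignPrefixes(Map<String, Integer> nsTable) {
        Map<String, String> namespaceTable = new LinkedHashMap<>();
        int i = 0;
        for (String ns : nsTable.keySet()) {
            namespaceTable.put(ns, "ns" + i++);
        }
        return namespaceTable;
    }

    /** Pass 2b: writes a URI as prefix:localName if its namespace is known, otherwise as <uri>. */
    static String writeURI(String uri, Map<String, String> namespaceTable) {
        int hash = uri.lastIndexOf('#');
        String prefix = hash < 0 ? null : namespaceTable.get(uri.substring(0, hash + 1));
        return prefix == null ? "<" + uri + ">" : prefix + ":" + uri.substring(hash + 1);
    }
}
```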

@amedranogil
Member Author

This function is exactly what URICompactor is doing, but with 2 added features:

  1. it does not depend on the character '#'; reading Turtle, it seems prefixes can also end in '/' or another non-alphanumeric character.
  2. the compacted prefix is guessed from the URI rather than assigned by the arbitrary order of processing.
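
To illustrate the two features, here is a naive sketch. It is not the URICompactor from the branch: the class name `UriCompactorSketch`, the split rule (last non-alphanumeric character) and the prefix guess (last alphanumeric run of the namespace, lower-cased) are all assumptions.

```java
import java.util.Locale;

class UriCompactorSketch {
    /** Index just past the last non-alphanumeric character, i.e. where the local name starts. */
    static int splitIndex(String uri) {
        for (int i = uri.length() - 1; i >= 0; i--) {
            if (!Character.isLetterOrDigit(uri.charAt(i))) {
                return i + 1;
            }
        }
        return 0;
    }

    /** Guesses a readable prefix: the last alphanumeric run of the namespace, lower-cased. */
    static String guessPrefix(String namespace) {
        int end = namespace.length();
        while (end > 0 && !Character.isLetterOrDigit(namespace.charAt(end - 1))) {
            end--;
        }
        int start = end;
        while (start > 0 && Character.isLetterOrDigit(namespace.charAt(start - 1))) {
            start--;
        }
        return start == end ? "ns" : namespace.substring(start, end).toLowerCase(Locale.ROOT);
    }

    /** Compacts a URI to prefix:localName; returns it unchanged when it cannot be split. */
    static String compact(String uri) {
        int i = splitIndex(uri);
        if (i == 0 || i == uri.length()) {
            return uri;
        }
        return guessPrefix(uri.substring(0, i)) + ":" + uri.substring(i);
    }
}
```

With this rule, `http://example.org/onto#Lamp` becomes `onto:Lamp` and `http://example.org/vocab/Person` becomes `vocab:Person`. A real implementation would additionally have to handle prefix collisions, guesses that are not valid prefix names (for example a namespace ending in a version number), and local names that contain characters such as `_`.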

@amedranogil
Member Author

I have developed other analyzers, for example one that counts the blank nodes so that padding can be sized to the total number of BNs. There is also a serializationType analyzer, which has the double function of counting the references (especially handy for determining whether a resource should be embedded or not) and determining which type of serialization each resource has (essentially condensing the whole serialization policy, see #496 ). There is a Resource analysis framework which should be reusable. It is based on the GraphIterator class, which presents some issues with literals, as they are included in the analysis; maybe there should be a subclass that avoids iterating through literal Resources.
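
As an illustration of the reference-counting idea only (not the analyzer from the branch): the sketch below models the graph as a map from a node to the nodes it points to, and assumes a policy where a resource is embedded exactly when it is referenced once. `ReferenceCountSketch` and both method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReferenceCountSketch {
    /** Counts how often each node appears as the target of some property. */
    static Map<String, Integer> countReferences(Map<String, List<String>> graph) {
        Map<String, Integer> refs = new HashMap<>();
        for (List<String> targets : graph.values()) {
            for (String target : targets) {
                refs.merge(target, 1, Integer::sum);
            }
        }
        return refs;
    }

    /** Assumed policy: a node can be embedded in its parent when it is referenced exactly once. */
    static boolean canEmbed(String node, Map<String, Integer> refs) {
        return refs.getOrDefault(node, 0) == 1;
    }
}
```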

As a sanity check, allow me to enumerate the common things and/or "uAAL quirks" a serializer has to account for:

  • serialization type (Serialization types #496 )
  • Resources being marked as literals (not yet accounted for in JSON-LD; I am not sure at the moment how to handle this, partly because of the GraphIterator and partly because there is no mention of it in the specs)
  • compacting URIs (probably more complex due to the two points above)
  • anonymous resources (switching from internal URI to _:BN)
  • Lists, concretely closedCollections
  • Class Types: currently the JSON serializer adds everything in r.getTypes(); it should probably filter out abstract classes?

All other stuff is just listing properties and serializing them recursively. Of course, there is the help of TypeMapper.getXMLInstance(o), which helps serialize primitives in XML (most other serializations follow).
