schema.org defaults/coercions #3201
Replies: 9 comments 3 replies
-
I think what you are dealing with are people who don't want to learn the structure part of structured data; they only want to provide the data. Since you are only interested in JSON-LD, I think it would be best to publish a set of script builders that make it easier for the general public to create the minimal valid scripting for the various types. The script builders could live on the Rich Snippets guidance pages. There are 32 types on the Rich Snippets guidance pages.
-
Thanks for the reply!
The problem is:
-
Most SEO "news" sites deal primarily with JSON-LD examples. I'd guess that there are a lot of TL;DR types out there who don't understand that some data needs additional declared "Types".
I have not used the markup helper for years, but I just looked again, and the markup helper will supply incomplete scripts.
Yes, that will happen. Fortunately, the internet is not forever, contrary to popular belief.

There is an interesting discussion here: Yoast is developing a more object-oriented approach to JSON-LD. Basically, they are using the @id on declarations (Organization, Place, etc.) and then that script can be used as a portion of a larger script (Event) by calling it with the @id or with https://schema.org/isPartOf. An overview here... For example, the logo in the scripting is declared as an ImageObject and also an image.

I'm not sure if the above example will work across domains, but what if the authoritative website could provide always-up-to-date JSON-LD snippets (Place) so that other data providers could re-use them? Say we have a musical artist who wants to provide data for her (in-person) event. In the instance below, the "location" data could be provided by the venue, the "offer" information provided by the ticketing agency, and the organizer / performer data provided from another page on the artist's website.
If the venue or the ticketing agency has to make changes (pricing / sold out), these would (also) be updated within the artist's JSON-LD on her website. Again, I'm not sure if this would work across domains, but instead of everybody having to retype data, we should have a way to get it from the authoritative website and let them keep it up to date and correct.
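To sketch what that reuse could look like (the names and @id URLs below are hypothetical, invented for illustration), the artist's Event could reference entities published and maintained elsewhere via @id node references:

```json
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Spring Tour Kickoff",
  "startDate": "2023-05-01T20:00",
  "location": { "@id": "https://example-venue.com/#place" },
  "offers": { "@id": "https://example-tickets.com/spring-kickoff/#offer" },
  "performer": { "@id": "https://artist-example.com/about/#artist" },
  "organizer": { "@id": "https://artist-example.com/about/#artist" }
}
```

Each @id would resolve to a full node (Place, Offer, MusicGroup) kept current by its authoritative publisher, so the artist's page never has to restate pricing or venue details.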
-
That is sort of an orthogonal conversation. Remote id resolution is a really interesting idea, and I think in theory it's really cool. I'm not positive it's practical (because it sort of assumes somewhat static content and very rational actors in general), but I think we should start a new discussion if we want to go further. In a lot of the cases I'm talking about, we might not even have an entity as the target at all to go resolve externally.

Internally, in my discussions with Dan, I'm mostly focusing this issue on the problem with the ambiguity of the primitive value (a predicate that leads to an object OR a string/url). If I have an entity, especially if it's typed, there is much less confusion. Even Thing->name sort of implies a lot of semantics: that there is an entity that has some descriptive string. But if I have just a Text at the end of a predicate, I could have anything there. I could even have a completely separate encoding of the structure ("I have a hex-encoded image there").

In multi-typed ranges, there isn't even a discussion of what the string should look like. So it's the ultimate example of both providers and consumers not being able to deal with it. Even a note in the documentation that said, "For Text strings, we expect the name of the Brand to be encoded," or something, would be better than where schema.org is. Text strings that could be anything are essentially unparseable by any real standard or agreement. If we could solve/elaborate on the semantics of that, it would go a long way towards my problem of how we interpret these things without assuming something schema.org doesn't explicitly say.
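For instance (a hypothetical illustration, not taken from any real site), a consumer receiving the first shape below has no agreed-upon semantics for the bare string, while the second makes the intent explicit:

```json
[
  {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Widget A",
    "brand": "Acme"
  },
  {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Widget B",
    "brand": { "@type": "Brand", "name": "Acme" }
  }
]
```

In the first case, "Acme" could be a brand name, an identifier, a URL, or anything else; in the second, the typed object carries the semantics with it.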
-
What is the "rich snippets testing tool"? We killed that thing a long time ago. We're several incarnations past that :) But I'm not even talking about "improperly typed" data necessarily. I mean: how does someone interpret the semantics of a primitive string on a lot of these edges? http://schema.org/location can be a Place, a PostalAddress, or a raw string. What is the raw string? It's even more unusable than a Thing with a name. At least that carries some restriction that it's a single entity with a descriptive name.
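Concretely, all three of these shapes are valid per the declared range of location (a hypothetical example; the venue names are invented), yet the first gives a consumer nothing to anchor on:

```json
[
  {
    "@type": "Event",
    "name": "Show 1",
    "location": "Shoreline Amphitheatre"
  },
  {
    "@type": "Event",
    "name": "Show 2",
    "location": {
      "@type": "PostalAddress",
      "addressLocality": "Mountain View",
      "addressRegion": "CA"
    }
  },
  {
    "@type": "Event",
    "name": "Show 3",
    "location": {
      "@type": "Place",
      "name": "Shoreline Amphitheatre",
      "address": {
        "@type": "PostalAddress",
        "addressLocality": "Mountain View",
        "addressRegion": "CA"
      }
    }
  }
]
```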
-
I guess it is called the Rich Results Test now. Looking at http://schema.org/location, it is pretty bizarre that a naked string is acceptable there. Maybe for things / locations that are really broad, like "Pacific Ocean", but even so, place.name would be a more informative way to go about that. If you want to check the field as a string, you might see if you get a hit as a match for a Wikidata item; conversely, if you think it is a one-line address, you might try querying Google Places to see if you get a match. But then what do you do? Store the data (as is) from the field, or store the data from what you think might be the authoritative database -- guessing the original provider's intent.
-
I am continuing to talk with Dan about ways to add semantics to the raw primitive values so they can be used more effectively. If anyone has interest in making these more specified (either by something soft like docs, or something more strict like Text => SomeType -> name), please feel free to chime in. Otherwise, I will try to find some way to document what Google does during ingestion without making it look like a recommendation (since we usually would prefer a more semantic object notation).
-
Giving this a bump. At this point, it seems that creating a new Type of "undefined" might be useful for this. Not only for ingesting the data, but also for consuming it.
-
I don't like the bare string component. I prefer an object, but a general undefined object doesn't seem to make sense. Is there an object for a place that is a planet? Something might be on Mars or the moon. I'm a fan of adding more descriptive comments when a property is accepted, and requiring at least 2 examples. This sort of policy would really help new adopters and beef up the current documentation.
On Thu, Nov 17, 2022 at 10:32 PM, WeaverStever wrote:
> Giving this a bump, at this point, it seems that creating a new Type of "undefined" might be useful for this.
--
All the best,
-Hugh
-
At Google, we do a lot to "fix" markup we find on the web so it complies with schema.org more fully and is easier to specify. This is evident in the schema.org validator tool that we currently host (http://validator.schema.org). For a simple example:
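A representative snippet of the kind being described (hypothetical; the headline and author are invented, with author given as bare Text, which the range of author does not allow):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article",
  "author": "Jane Doe"
}
```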
Go try that at validator.schema.org and you'll see what I mean. http://schema.org/author cannot be typed to Text currently, so we try to turn it into a typed object. Which type is ambiguous, so we end up creating a http://schema.org/Thing object. Historically, the reason is that schema.org tried not to enforce its typing system too strongly, and so strings were always implicitly allowed. Additionally, annotating complex structures in Microdata and RDFa is error-prone. But this in turn means that all the consuming code, if it wants to be most permissive, needs to handle all variations of typed inputs.
At Google, we either don't truly fix this, like with http://schema.org/author currently (since we're converting it to an invalid type), and then complain on a per-feature basis in our Google-specific Rich Results Test; OR we decide that it's pretty unambiguous which typed object is being suggested and internally force it. You can see this if you try this example:
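A representative snippet of the kind being described (hypothetical; event name and venue are invented, with location given as a bare string, which its declared range does allow):

```json
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Example event",
  "startDate": "2023-05-01",
  "location": "Shoreline Amphitheatre, Mountain View, CA"
}
```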
What happened there? Now we unambiguously decided that the location was a Place, even though there are a number of types it could be. Note that in this case, the location here is actually valid markup, because the range of location includes Text. But our code is built to handle objects there, so we need to force it to an object for downstream systems to interpret it.
This is annoying. I don't like doing this voodoo coercion magic, but it's the only way we can give a consistent experience to internal teams and in particular handle range expansions in schema.org. And a lot of the time, allowing primitive types on predicates in schema.org is a very useful convenience.
The crux of the issue is that there is no good way for us to specify these "default" behaviors in a standardized way. We could probably push on the JSON-LD standard to add some @context magic that would make it work, but then we have to leave Microdata and RDFa out of that and there's still very valid reasons to support those standards on the web. And they are the standards that benefit the most from primitive shorthands.
What I want is something that says http://schema.org/target -> http://schema.org/URL => http://schema.org/target -> http://schema.org/EntryPoint -> http://schema.org/urlTemplate, or something to that effect. That's in essence what we do internally ourselves. (And this recent range expansion in schema.org 15 is what prompted this post.)
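A minimal sketch of what such a per-property coercion table could look like (the property names come from schema.org, but the table contents, function, and default target types are assumptions for illustration, not Google's actual code):

```python
# Sketch: wrap primitive values into the typed object that downstream
# code expects, keyed by the property they appear on. Illustrative only.

COERCIONS = {
    # property -> (wrapper @type, property on the wrapper holding the primitive)
    "author":   ("Thing", "name"),          # ambiguous type, so fall back to Thing
    "location": ("Place", "name"),          # assume a Place with a descriptive name
    "target":   ("EntryPoint", "urlTemplate"),  # URL => EntryPoint.urlTemplate
}

def coerce(prop, value):
    """Wrap a primitive value in the default typed object for `prop`.

    Objects (dicts) pass through unchanged; strings are wrapped per the
    COERCIONS table; properties with no registered default stay as-is.
    """
    if isinstance(value, dict) or prop not in COERCIONS:
        return value
    wrapper_type, inner_prop = COERCIONS[prop]
    return {"@type": wrapper_type, inner_prop: value}
```

A standardized version of this table, published alongside the vocabulary itself, is essentially what the open question below is asking for.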
I guess my open question is whether there is value in encoding this into schema.org itself, so there is a common set of inference/defaults for interpreting primitive values in cases where the target of a predicate can also be a more expressive object. This type of thing would make it much easier to support simpler markup, which is one of the main goals of schema.org, while still allowing ranges to expand to more expressive objects. Our fallback option is just to publicly document these coercions we do, but that does not feel like it solves the problem long term. Or, I guess, another option that I don't like would be to drop all these coercions/defaults completely and just not consume things that are under-specified, for the sake of correctness.