Providing unique identifier of the content relative to a site #75

Open
matuszeman opened this issue Jun 14, 2015 · 7 comments

@matuszeman

User story: As an API user, I want to get a unique ID for the content, relative to a site.

Examples
https://www.youtube.com/watch?v=XOmwZopzcTA&index=9&list=PL53194065BA276ACA
ID: XOmwZopzcTA

https://soundcloud.com/jackedradio/afrojack-presents-jacked-radio-week-23
ID: 210140747

For a site where the system ID cannot be recognized, the ID would be based on a value from the URL:
https://soundcloud.com/jackedradio/afrojack-presents-jacked-radio-week-23
ID: jackedradio/afrojack-presents-jacked-radio-week-23
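A minimal sketch of the idea, assuming the ID is derived from the URL alone. The helper name `siteRelativeId` and the returned `{ site, id }` shape are hypothetical and not part of Iframely. Note that a numeric system ID such as SoundCloud's 210140747 is not present in the URL, so a URL-only approach like this one can only produce the path-based fallback for that site.

```javascript
// Hypothetical helper (not an Iframely API): derive a site-relative ID from a URL.
function siteRelativeId(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.replace(/^www\./, "");

  // YouTube watch pages carry the video ID in the "v" query parameter.
  if (host === "youtube.com" && url.pathname === "/watch" && url.searchParams.has("v")) {
    return { site: host, id: url.searchParams.get("v") };
  }

  // Fallback: no system ID is recognizable from the URL, so use the path
  // with leading and trailing slashes removed.
  return { site: host, id: url.pathname.replace(/^\/+|\/+$/g, "") };
}

console.log(siteRelativeId("https://www.youtube.com/watch?v=XOmwZopzcTA&index=9&list=PL53194065BA276ACA"));
console.log(siteRelativeId("https://soundcloud.com/jackedradio/afrojack-presents-jacked-radio-week-23"));
```

The playlist parameters (`index`, `list`) are ignored on purpose, so the same video yields the same ID with or without playlist context.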

What do you think?

@j0k3r
Contributor

j0k3r commented Jun 15, 2015

I like that!

@iparamonau
Member

Interesting. What would the use case for that feature be? Also, if we do it for, say, YouTube videos, what should be returned for YouTube user profile pages?

... We have the canonical URL in the response. I trust most people have agreed to use it as the identifier of a resource on the web, no?

@matuszeman
Author

Yes, canonical URLs seemed to be just what I needed, but the more I think about this feature, the more I think it should be renamed to "Providing unique identifier of PRIMARY content relative to a site".

Example:
https://www.youtube.com/watch?v=XOmwZopzcTA - represents a video page
https://www.youtube.com/watch?v=XOmwZopzcTA&list=PL53194065BA276ACA&index=9 - represents exactly the same video page, in a playlist.
I understand that both URLs above are valid as canonical URLs. The latter one is the video in the context of a playlist.

My use case is:
A user provides a URL, and my app should be able to check whether a reference to its "primary content" already exists in my DB.
Because of this, my idea was to use a pair: the site name and a unique ID relative to that site.
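A rough sketch of that lookup, with an in-memory `Set` standing in for the DB. The names `contentKey`, `remember`, and `isKnown` are made up for illustration. The pair is serialized with `JSON.stringify` so that a separator character inside an ID cannot make two different pairs collide.

```javascript
// Hypothetical in-memory stand-in for the application's database.
const knownContent = new Set();

// Compose a single lookup key from the (site, id) pair.
function contentKey(site, id) {
  return JSON.stringify([site, id]);
}

function remember(site, id) {
  knownContent.add(contentKey(site, id));
}

function isKnown(site, id) {
  return knownContent.has(contentKey(site, id));
}

remember("youtube.com", "XOmwZopzcTA");
console.log(isKnown("youtube.com", "XOmwZopzcTA"));    // true
console.log(isKnown("soundcloud.com", "XOmwZopzcTA")); // false: same ID, different site
```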

@iparamonau
Member

We have been trying to find a better answer to this use case for 3 years. It pops up every couple of months in one form or another. No luck so far.

Here is an illustration of the problem. Even for YouTube, if we give you the ID of the video, it will be the same for two URLs: ?v=... and ?v=...&t=... The second is a timed embed, which would have a different embed code. For Google Maps it would be zoom levels, etc.

That just shows you cannot trust IDs, or even canonical addresses, because the actual URL context is essential for embeds. Besides, for short links (say, Bitly), it will be faster to let Iframely complete the processing than to return a redirect to your app. Facebook does cache by og:url or canonical, but that comes at the cost of slower processing times.

We ended up making a decision that caching by exact URL will cover 99% of our use cases, and that it is good enough for us. At least for now.

@matuszeman
Author

That's actually what I'm after ... I want to be able to identify which content (uniquely identified per site) users share, based on the URL they provide. In my use case I don't care about zoom level or time information. It's just about identifying the primary content itself, which in the case of a YouTube video could be the video ID or any unique identifier for that entity on the site.

I'm new to iframely, but I checked https://github.com/itteco/iframely/blob/master/plugins/domains/youtube.com/youtube.video.js
and it seems it would be quite easy to provide this information from what is already available there. Is there documentation I could use to learn more, experiment a bit, and maybe contribute a plugin?

@iparamonau
Member

For stats aggregation, I see your point. For caching it doesn't make sense: you would still need to make a request to Iframely to get this ID.

Even for stats, canonical would be a better and more universal source. You could take a hash of it for better indexing. By "canonical" I mean meta.canonical as returned in Iframely JSON, or oembed.url, since ideally it is the same for the same video. I do not mean the actual URL you send to the APIs.
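For illustration only, hashing the canonical address could look like the sketch below. It assumes the Iframely JSON exposes `meta.canonical` and the oEmbed JSON exposes `url`, as described above. The function name `canonicalKey`, the two-argument shape, and the choice of SHA-256 are assumptions made for this example.

```javascript
const crypto = require("crypto");

// Hypothetical helper: build a fixed-length index key from the canonical URL.
// Prefers meta.canonical from Iframely JSON and falls back to url from oEmbed JSON.
function canonicalKey(iframelyJson, oembedJson) {
  const canonical =
    (iframelyJson && iframelyJson.meta && iframelyJson.meta.canonical) ||
    (oembedJson && oembedJson.url) ||
    null;

  if (!canonical) {
    return null; // nothing to hash
  }

  // SHA-256 is an arbitrary choice here; any stable hash would do for indexing.
  return crypto.createHash("sha256").update(canonical).digest("hex");
}

// Example with made-up response objects:
const key = canonicalKey({ meta: { canonical: "https://www.youtube.com/watch?v=XOmwZopzcTA" } }, null);
console.log(key.length); // 64 hex characters
```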

The problem with our YouTube plugin right now is that it doesn't return a canonical address at all. We will be fixing that soon, as well as making sure all other plugins give a consistent response.

If you experiment with it in the meantime, you could check this unfinished doc on how to write plugins.

@nleush
Member

nleush commented Jun 16, 2015

@matuszeman

You can add

getMeta: function(...) {
    return {ID:'...'};
}

to any plugin.

The resulting data will then contain 'ID' in the 'meta' section of the response from that plugin.
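A self-contained sketch of how such a plugin could behave for YouTube watch URLs. This is not code from the Iframely repository: the `re` array, the `urlMatch` argument, and the `runPlugin` harness are assumptions made so the example can run on its own, and the real plugin interface may differ.

```javascript
// Hypothetical plugin-shaped object, for illustration only.
const examplePlugin = {
  // Assumed convention: regular expressions that select URLs for this plugin.
  // The capture group holds the value of the "v" query parameter.
  re: [/^https?:\/\/www\.youtube\.com\/watch\?(?:.*&)?v=([\w-]+)/i],

  // Assumed argument: the match array produced by the matching expression.
  getMeta: function (urlMatch) {
    return { ID: urlMatch[1] };
  }
};

// Stand-in for the host framework: finds the first matching expression
// and places the plugin's getMeta result under "meta".
function runPlugin(plugin, url) {
  for (const re of plugin.re) {
    const urlMatch = url.match(re);
    if (urlMatch) {
      return { meta: plugin.getMeta(urlMatch) };
    }
  }
  return { meta: {} };
}

console.log(runPlugin(examplePlugin, "https://www.youtube.com/watch?v=XOmwZopzcTA&index=9&list=PL53194065BA276ACA"));
```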
