Figure out mf2 h-feed authorship #195

Open
alexmingoia opened this issue Apr 6, 2020 · 5 comments

Comments


alexmingoia commented Apr 6, 2020

Source: HTML
Target: Atom/XML
Example: https://granary.io/url?input=html&output=atom&url=https://news.indieweb.org/en
Expected feed author: IndieNews en @ https://news.indieweb.org/en
Actual feed author: The first h-card on the page
Note: The feed id and title are correct, but the <author> element is not.

Suggested solution: Granary should follow the representative-h-card-parsing algorithm, and if no h-card is found then use <title> and page URL as the author, instead of incorrectly assuming the first h-card is the page's author.

Owner

snarfed commented Apr 6, 2020

thanks for filing! granary currently uses the authorship algorithm to find the feed author, but that's evidently for posts, not feeds. so i guess you're right, maybe i should use the h-feed's p-author mf2 property first, and if not provided, fall back to representative h-card.

Owner

snarfed commented Apr 10, 2020

lots more discussion on this recently on #indieweb-dev and on #microformats, but no conclusion. basically, we don't yet have an "authoritative" way to determine an h-feed's author, at least if it doesn't have an explicit p-author property. representative h-card and authorship algorithm are both related, but neither is the exact answer.

@tantek's comments here are perhaps the closest thing to a conclusion: basically, we still need to do some research and come up with an algorithm. we don't necessarily have the "right" one just yet.

snarfed, h-feed authorship is an interesting problem and worth researching & brainstorming properly rather than seeing if h-entry approaches “just work” because that may be overdoing it
Better to collect examples (links, analysis) of h-feed elements that you’re trying to parse and analyze them to figure out a minimum algorithm based on examples
The “XML approach” would be to assume / require authors/publishers always use an author property and then “just” look for that. While a good starting point, it’s obviously a bad approach to optimize for developer convenience rather than researching reasonable real world examples and making sure to handle them
It’s also a bad approach to “just try” some other similar algorithm to see if it “just works” as you’re likely making all sorts of bad assumptions by doing so
So I disagree with both “just use representative h-card” and “just use h-entry authorship but for h-feed”
There’s no shortcut here. If you want a good algorithm it has to start with documenting & analyzing real world publishing examples

Owner

snarfed commented Apr 10, 2020

i'm not necessarily going to take on researching and creating this new h-feed authorship algorithm, but i will take two todos here:

Author

alexmingoia commented Apr 11, 2020

Here is the algorithm I am using to parse feed author in the wild, quoted from indieweb/authorship/issues/4:

  1. If h-feed with p-author, author is p-author.
  2. If h-feed with u-url, and that URL has h-card matching u-url, author is that h-card.
  3. If h-feed with u-url, and that URL has no h-card matching u-url, author URL is u-url and name is page <title>.
  4. If h-feed with no u-url or p-author, author URL is page URL and name is page <title>.
  5. If no h-feed then no feed author.

This would at least fix the example feed parsing for this issue, setting the author to be "IndieNews en @ news.indieweb.org/en"
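
The five steps above can be sketched roughly as follows. This is only an illustration, not granary's code: it assumes the page has already been parsed into the standard microformats2 JSON shape (a dict with an "items" list), and `feed_author`, `find_items`, `first`, and the optional `hcards_at` callback are hypothetical names made up for this sketch. It also assumes that URLs are compared as exact strings, that h-cards are only searched for in nested "children", and that the name in step 3 comes from the same page title as in step 4.

```python
def find_items(items, mf2_type):
    """Yield every item of the given type, including nested children."""
    for item in items:
        if mf2_type in item.get("type", []):
            yield item
        yield from find_items(item.get("children", []), mf2_type)


def first(props, name):
    """Return the first value of an mf2 property, or None if absent."""
    values = props.get(name, [])
    return values[0] if values else None


def feed_author(parsed, page_url, page_title, hcards_at=None):
    """Return the feed author, or None if the page has no h-feed.

    hcards_at is a hypothetical callback that returns the h-card items
    found at a URL. If it is omitted, only h-cards in the current page
    are considered (an assumption of this sketch).
    """
    items = parsed.get("items", [])
    feed = next(find_items(items, "h-feed"), None)
    if feed is None:
        return None  # step 5: no h-feed, no feed author

    props = feed.get("properties", {})
    author = first(props, "author")
    if author is not None:
        return author  # step 1: explicit p-author (string or h-card)

    feed_url = first(props, "url")
    if feed_url is None:
        # step 4: no u-url or p-author, use the page itself
        return {"name": page_title, "url": page_url}

    # step 2: an h-card whose u-url matches the feed's u-url
    cards = hcards_at(feed_url) if hcards_at else find_items(items, "h-card")
    for card in cards:
        if feed_url in card.get("properties", {}).get("url", []):
            return card

    # step 3: u-url but no matching h-card
    return {"name": page_title, "url": feed_url}
```

For a page whose h-feed has neither p-author nor u-url, this falls through to step 4 and returns the page title and page URL as the author.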

snarfed added a commit that referenced this issue Apr 19, 2020
Owner

snarfed commented Apr 19, 2020

i've taken a stab at this in 8e190da, but it's an ugly refactoring and nowhere near usable yet, and i don't see a clear path to get it merged. open to other thoughts or attempts!

snarfed changed the title from "Incorrect author identification for HTML source" to "Figure out mf2 h-feed authorship" on Mar 16, 2021