
Posts with the same title aren't picked up #173

Open
WesleyAC opened this issue Dec 2, 2021 · 4 comments

Comments

@WesleyAC

WesleyAC commented Dec 2, 2021

I currently have three posts on https://notebook.wesleyac.com/atom.xml with the title "Recently". Only one of them was picked up by blaggregator.

Their <id> elements are all different, as are their <link> and <updated> elements; only the <title> and <author> elements are the same.
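To illustrate why the titles collide: each Atom <entry> carries an <id> that the Atom format (RFC 4287) intends as a permanent, unique identifier. Keying on it keeps all three posts, while keying on <title> collapses them into one. Below is a minimal sketch using Python's standard xml.etree.ElementTree. The feed is a hypothetical stand-in with example.com URLs, not the real notebook feed. parse_entries and distinct_by are illustrative helpers, not blaggregator code.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed: three entries share a <title> but have distinct <id>s and links.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example notebook</title>
  <entry>
    <title>Recently</title>
    <id>urn:example:recently-1</id>
    <link href="https://example.com/recently-1/"/>
    <updated>2021-10-01T00:00:00Z</updated>
  </entry>
  <entry>
    <title>Recently</title>
    <id>urn:example:recently-2</id>
    <link href="https://example.com/recently-2/"/>
    <updated>2021-11-01T00:00:00Z</updated>
  </entry>
  <entry>
    <title>Recently</title>
    <id>urn:example:recently-3</id>
    <link href="https://example.com/recently-3/"/>
    <updated>2021-12-01T00:00:00Z</updated>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Extract id, title and link href from each Atom <entry>."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", namespaces=ATOM_NS):
        entries.append({
            "id": entry.findtext("atom:id", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "link": entry.find("atom:link", namespaces=ATOM_NS).get("href"),
        })
    return entries


def distinct_by(entries, key):
    """Keep the first entry seen for each value of `key`."""
    seen = {}
    for e in entries:
        seen.setdefault(e[key], e)
    return list(seen.values())


entries = parse_entries(SAMPLE_FEED)
print(len(distinct_by(entries, "title")))  # 1: all three collapse into one post
print(len(distinct_by(entries, "id")))     # 3: every post is kept
```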

@punchagan
Member

I think blaggregator looks at titles to avoid announcing a post twice when its link changes, but I agree that this is a bug. Thanks for the bug report; I'll take a look.

@WesleyAC
Author

WesleyAC commented Jan 2, 2022

Just wanted to mention that I ran into this twice more recently: once with my latest "Recently" post on my notebook blog, and once with https://blog.wesleyac.com/posts/web3-centralized (a draft of which had been inadvertently published at https://blog.wesleyac.com/posts/web3-is-centralized; I changed the link hoping that would make it show up as new in people's RSS feeds).

@WesleyAC
Author

@punchagan any chance this'll get fixed?

I notice that get_or_create_post in home/management/commands/crawlposts.py doesn't look at the post URL. It seems like the fix might be as simple as changing:

```python
post = Post.objects.filter(blog=blog, title=title).latest("posted_at")
```

to

```python
post = Post.objects.filter(blog=blog, title=title, url=link).latest("posted_at")
```

or something similar. But I could be wrong; that's just from reading the code.
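The difference between the two lookups can be modeled without Django. Below is a minimal sketch with two stand-ins: a hypothetical in-memory Post record (not blaggregator's actual model), and a hypothetical find_existing helper that plays the role of the filter(...).latest("posted_at") call. Where Django's latest() would raise Post.DoesNotExist, the helper returns None.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical stand-in for blaggregator's Post model; only the fields used here.
    blog: str
    title: str
    url: str
    posted_at: datetime


def find_existing(posts, blog, title, url, match_url):
    """Model Post.objects.filter(...).latest("posted_at") over an in-memory list.

    Returns None where Django's latest() would raise Post.DoesNotExist.
    match_url=False mirrors the current title-only lookup;
    match_url=True mirrors the proposed title-and-URL lookup.
    """
    candidates = [
        p for p in posts
        if p.blog == blog and p.title == title and (not match_url or p.url == url)
    ]
    return max(candidates, key=lambda p: p.posted_at, default=None)


# One "Recently" post has already been stored.
seen = [Post("notebook", "Recently", "https://example.com/recently-1/", datetime(2021, 11, 1))]

# A new "Recently" entry arrives at a different URL.
incoming_url = "https://example.com/recently-2/"
print(find_existing(seen, "notebook", "Recently", incoming_url, match_url=False) is seen[0])  # True: treated as already seen
print(find_existing(seen, "notebook", "Recently", incoming_url, match_url=True))  # None: treated as new
```

With the URL in the key, the same-title posts are picked up, but a post whose URL changes (like the web3 draft above) is also treated as new. That is the trade-off raised in the next comment.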

@punchagan
Member

I will take another look at this, @WesleyAC. Sorry for not getting back to you sooner. I think the URL was excluded in the past because authors would often change a post's URL or title right after publishing it, and to handle cases where a blog's domain had changed.

But I guess the crawl interval was increased from 10 minutes to an hour after that, and the number of "new" posts announced on Zulip has been limited to 2. So this shouldn't be as big a problem as it used to be, if it ever was. I'll spend some time on this soon and deploy a fix. Apologies for the inconvenience.
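The announcement cap is what bounds the noise if a domain change makes many old posts look new at once. Below is a minimal sketch that rests on two assumptions: the cap of 2 applies to each batch of newly found posts, and the newest posts are the ones announced. The thread doesn't say how posts are chosen, and pick_announcements is a hypothetical helper.

```python
from datetime import datetime

# Cap from the comment above; applying it per batch of new posts is an assumption.
MAX_ANNOUNCEMENTS = 2


def pick_announcements(new_posts, limit=MAX_ANNOUNCEMENTS):
    """Return at most `limit` newly seen posts, newest first (assumed ordering)."""
    return sorted(new_posts, key=lambda p: p["posted_at"], reverse=True)[:limit]


# Hypothetical: a domain change makes five old posts look new in one crawl.
flood = [{"title": f"Post {i}", "posted_at": datetime(2021, 1, i + 1)} for i in range(5)]
print([p["title"] for p in pick_announcements(flood)])  # ['Post 4', 'Post 3']
```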


2 participants