Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing proxy images #4722

Merged
merged 27 commits into from
May 22, 2024
Merged

Fixing proxy images #4722

merged 27 commits into from
May 22, 2024

Conversation

dessalines
Copy link
Member

@dessalines dessalines commented May 15, 2024

I took #4704 , and ripped out the new image detail portions, so this only contains fixes for the image proxying.

api_tests/package.json Show resolved Hide resolved
api_tests/src/image.spec.ts Outdated Show resolved Hide resolved
api_tests/src/image.spec.ts Show resolved Hide resolved
api_tests/src/post.spec.ts Outdated Show resolved Hide resolved
api_tests/src/shared.ts Outdated Show resolved Hide resolved
api_tests/src/shared.ts Outdated Show resolved Hide resolved
crates/api_common/src/request.rs Outdated Show resolved Hide resolved
crates/api_common/src/request.rs Show resolved Hide resolved
crates/api_common/src/request.rs Outdated Show resolved Hide resolved
crates/api_common/src/request.rs Outdated Show resolved Hide resolved
@dessalines dessalines enabled auto-merge (squash) May 16, 2024 20:44
@dessalines dessalines disabled auto-merge May 16, 2024 20:44
@dessalines
Copy link
Member Author

Should probably get a review from either @Nutomic or @phiresky before merging this one.

protocol_and_hostname,
encode(link.as_str())
))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why put this in a function when its only called once?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmk, I can move it into the function.

post: Post,
custom_thumbnail: Option<Url>,
federated_thumbnail: Option<Url>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont remove this, as Lemmy would generate a lot of extra local thumbnails when it could simply reuse the thumbnail of the original instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasons I removed it, had to do with complex cases w/ respect to proxying:

  • The post.url is the source URL or picture link, it should always remain unproxied / unchanged.
  • The thumbnail_url could be a pictrs proxied picture, a local picture, or an external picture.
  • Federating that thumbnail_url could be dangerous, because it could be a proxied link already. If we have proxying turned on for our local instance, then the URL would be double-proxied.

I could add back in federated_thumbnail , but we'd have to ignore it if proxying is turned on for our instance, and just use the post.url to generate a thumbnail locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing "dangerous" about double-proxying an image. In fact its better because it prevents small sites from being overloaded if lots of instances generate thumbnails for it.

Im also not clear why the post.url should always remain unproxied. If its an image its better to request it through the local instance, to reduce bandwidth usage on the original site and avoid ip leaks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im also not clear why the post.url should always remain unproxied.

  1. its pointless, since images should be using the thumbnail_url field anyway. In fixing this, I also ran into a bug where even non-picture link URLs were getting proxied.
  2. if you chain proxy links, its going to get messy really quickly. A picture shouldn't need to get proxied multiple times and put that much load on pictrs servers. Not to mention the speed issues. Why not just leave the post.url alone, and have the derived images be generated from it.

I could add back in federated_thumbnail, but ignore it and regenarate it if we have proxying turned on. That's a small one-time cost that doesn't chain and put load and speed issues on a lot of servers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright lets leave it like this and maybe adjust things after its in production.

@Nutomic Nutomic merged commit 55f84dd into main May 22, 2024
2 checks passed
@dessalines
Copy link
Member Author

Cool, it def needs tested on the federated test instances anyway.

@kroese
Copy link
Contributor

kroese commented May 23, 2024

This pull breaks the previous fix for federated thumbnails again..

It overwrites the federated thumbnail with an external URL to the image (from the metadata).

So a federated post from lemmy.ml for example, which should have a thumbnail from lemmy.ml, now directly links to the large image on the external website after this pull. This causes federated posts to have a different thumbnail than the original post. This becomes much more apparent when ImageMode is set to None ofcourse, because otherwise the external image is stored/resized locally.

This is not how it worked in all previous Lemmy versions, it should not modify the thumbnail_url at all when ImageMode is set to None.

@dessalines
Copy link
Member Author

Could you do a test case demonstrating what's broken using ds9.lemmy.ml and voyager.lemmy.ml ? I just fixed and added integration tests for these outlined in this issue: #4736

There is only a post.url and a post.thumbnail_url , the 2nd being generated from the first based on your instance settings.

Front ends should be using the post.thumbnail_url to display images, and can use whatever resizing params they like.

@dessalines
Copy link
Member Author

Those are old versions with neither of these two PR fixes included.

These need to be tested with the fix I deployed just today: 0.19.4-rc.2 . You can use ds9.lemmy.ml and voyager.lemmy.ml to test these cases.

@kroese
Copy link
Contributor

kroese commented May 23, 2024

They were created with 0.19.4-rc.1 less than a day ago. So at least there was a regression recently.

I just updated to 0.19.4-rc.2 as we speak, and I will see if it makes a difference.

@kroese
Copy link
Contributor

kroese commented May 23, 2024

Same problem still exists with 0.19.4-rc.2..

This post was federated 14 minutes ago:

https://feddit.nl/post/15623267 with thumbnail url: https://feddit.nl/pictrs/image/2459741d-d58b-4299-8f14-2c17e7e25931.jpeg?format=webp&thumbnail=256

On my server with rc2 the thumbnail url becomes: https://cdn.nos.nl/image/2024/05/23/1086032/1024x576a.jpg

So the behaviour I saw in rc1 is unchanged in rc2.

@dessalines
Copy link
Member Author

Your pictrs image_mode is probably set to none, which means it should use the original image metadata link, which looks correct there. Don't you want proxying on?

@kroese
Copy link
Contributor

kroese commented May 23, 2024

No? I have always had it set to None, back from the days when it was called cache_thumbnails=false.

My issue is that in version 0.18 and up until recently, the behaviour of None was to use the federated thumbnail URL (and not the URL from the fetched metadata).

Since the original image can be several MB in size, it makes more sense to link to the thumbnail generated by the federating instance.

If there is a good rationale to change the behaviour for this setting, then I am interested in that. But I am pretty sure this change was not intended. And a few beta's back the same thing happened, and it was patched after I reported it, so that seems another confirmation that this new behaviour is not intended.

EDIT: Here is the previous time I reported the same issue: #4604 (comment) and it was fixed by Nutomic afterwards and now the same issue is back.

@dessalines
Copy link
Member Author

dessalines commented May 23, 2024

Yes this change is intentional, check out the docs here: https://github.com/LemmyNet/lemmy/blob/main/config/defaults.hjson#L52

Ignore federation for a moment, and think about a local instance.

mode action
None Fetch the metadata, grab the thumbnail_url from it, and use that url directly.
Store Fetch the metadata, and if an image exists, store it locally in pictrs.
Proxy Fetch the metadata, and use pictrs's new image proxying function.

With federation, many instances won't be storing their images locally. It seems like your use case, is that you want proxying.

@kroese
Copy link
Contributor

kroese commented May 23, 2024

Yes, for local posts it makes perfect sense to take the URL from the metadata when image_mode=none, there is no other way. And these docs seem to refer to what happens for local posts.

But for federated posts, it never took the image URL from the metadata (for every version before 0.19.beta-3 that is), it just used the thumbnail_url that was federated.

And that old behaviour made perfect sense, because there is no garantuee that thumbnail_url and the image from the metadata are equal (even though under most circumstances they will be, but with custom thumbnails they are not).

And it also causes the overviews to load much slower, because instead of loading a 256x256 thumbnail you are now loading possibly very large images. A third problem is that when the external URL is broken, you loose the thumbnails, which would not happen if you linked to the image on the federating server. Also visitors loose some privacy, because instead of making connections to lemmy.ml and lemmy.world, they are now making connections to random URL's.

So that are at least 4 downsides, and I cannot think of any advantage, except maybe that the load on the origin instance is reduced (needs to serve less traffic).

But if this change was by design, then I will just accept it. I never saw it discussed anywhere and I liked the old behaviour more.

And Im pretty sure it breaks the newly introduced "custom thumbnail" feature as the value of thumbnail_url gets totally ignored.

@dessalines
Copy link
Member Author

But for federated posts, it never took the image URL from the metadata (for every version before 0.19.beta-3 that is), it just used the thumbnail_url that was federated.

That is dangerous now due to proxying. Otherwise you'll get chained links, putting load on many servers. I discussed these problems in the issues and prs above.

And it also causes the overviews to load much slower, because instead of loading a 256x256 thumbnail you are now loading possibly very large images.

If you have image mode to none, then you are relying on other servers to store that data for you, which they might not be. This is exactly what the proxy mode is there for now.

Also visitors loose some privacy, because instead of making connections to lemmy.ml and lemmy.world, they are now making connections to random URL's.

That's what the proxy mode is for.

And Im pretty sure it breaks the newly introduced "custom thumbnail" feature as the value of thumbnail_url gets totally ignored.

We've accounted for this locally, but federation might still need to be tested there.

Essentially what I can gather, is that before, you were relying on a pseudo proxy mode, where if other servers were storing images, then federated post thumbnails could work. That's not necessary now since pictrs added proper proxying.

@SleeplessOne1917 SleeplessOne1917 deleted the fix_proxy_images_1 branch May 23, 2024 23:57
@kroese
Copy link
Contributor

kroese commented May 24, 2024

Okay, fair points! But since all image modes (like None and Proxied) cannot use the thumbnail_url field now anymore, wouldn't it be logical to add an extra field now, let's say image_url, that contains a link to the external (full-size) image that was used to generate the thumbnail in thumbnail_url during the posts creation?

That way custom thumbnails will federate correctly, and it becomes garantueed that all instances will use the same image, as there is no need for metadata fetching any longer by the receiving instances. Except for the text-summary, which will be solved when the metadata becomes part of federation in the future.

Also Im running into the problem that after enabling proxy-mode, doesnt have any effect on all my existing posts in the database? They still have the un-proxied thumbnails. Adding this image_url field would solve that problem too, because then the proxied url can be generated (or not) when the post is displayed, instead of a one-time decision during the posts creation.

@dessalines
Copy link
Member Author

That way custom thumbnails will federate correctly, and it becomes garantueed that all instances will use the same image, as there is no need for metadata fetching any longer by the receiving instances.

That's possible, but it adds a 2nd level of complication that's unecessary IMO, especially now since the metadata fetching is backrounded, and is only ever run once anyway. Instead of 1 source data (the post.url), you now have 2, and would complicate the post logic yet again.

Also Im running into the problem that after enabling proxy-mode, doesnt have any effect on all my existing posts in the database? They still have the un-proxied thumbnails.

That's true, any change in image mode won't work for past images, for obvious reasons. It'd have to historically fetch metadata and potentially create proxy links for years of historical posts.

Adding this image_url field would solve that problem too, because then the proxied url can be generated (or not) when the post is displayed, instead of a one-time decision during the posts creation.

It wouldn't, because the proxy links (if that setting is turned on) aren't live-generated, they're created and stored in the DB on a post creation / edit. I don't think it'd be a good idea to make those thumbnail urls live generated on either the front or back end either, better to have them stored in the DB.

@kroese
Copy link
Contributor

kroese commented May 24, 2024

Instead of 1 source data (the post.url), you now have 2, and would complicate the post logic yet again.

No, there are 2 already. From the moment "custom thumbnails" was merged, the post.url became coupled loose from the post.image, and now they are independant from eachother.

So either this 2nd field needs to be included for federation, or otherwise the whole "custom thumbnails" feature needs to be rolled back, because its incompatible with federation. (It was compatible before the recent decision to completely ignore the thumbnail_url, so there is no blame on the person who created that pull).

I don't think it'd be a good idea to make those thumbnail urls live generated on either the front or back end either, better to have them stored in the DB.

I disagree, but its your decision/project ofcourse!

My main problem is that you are throwing away crucial information: what was the URL used to generate the thumbnail. This information is handy for other instances, so they can rely on it to generate their thumbnail (instead of having to guess that the remote instance generated it from the metadata). And also handy for the local instance, so they can change between image modes in the future, or re-generate thumbnails at a later time in a higher quality, etc.

I am even bold enough to say that the absence of this field is the cause of a lot of other issues, for example, you said it was dangerous now that proxy links get chained. But if there was a field for the source image, then proxy links could never be chained in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants