Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove use of utf8_encode (deprecated in PHP 8.2) #598

Open
wants to merge 1 commit into
base: MOODLE_310_STABLE
Choose a base branch
from

Conversation

sammarshallou
Copy link
Contributor

To go with my other PHP 8.2 fix PR #586...

There is a use of deprecated function utf8_encode. But, the way it's used appears to be entirely pointless - it is calling utf8_encode to create a header based on a string that already came from a header which should already be in UTF-8 (or possibly ISO-8859-1, but anyway).

@marxjohnson already pointed out this was unnecessary in #455

There do seem to be some problems related to S3 and character encoding re. Content-Disposition but I find it hard to believe that calling utf8_encode would help - in fact it sounds like UTF8 isn't technically allowed there and the right answer seems to be something very complicated with URL-encoding and a filename* parmeter as well a filename, according to this blog https://engineering.resolvergroup.com/2022/02/aws-s3-utf-8-content-disposition/

@sammarshallou
Copy link
Contributor Author

I'm doing some more testing on this locally so it might be best to hold off doing anything with this until I finish that next week (will comment again). Haven't found anything wrong, just realised that I don't think my previous testing actually ran the code in question :)

@scara
Copy link

scara commented Feb 15, 2024

Hi @sammarshallou,
I'm not familiar with the S3 Client, just suggesting to add even filename* which fits with the requirement: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition#filename_2.

HTH,
Matteo

@sammarshallou
Copy link
Contributor Author

sammarshallou commented Feb 19, 2024

Wow, this change actually does break it. Ooops.

I made a file called こんにちは.jpg

Before change it sends me the file with this header:

Content-Disposition: inline;filename="�����.jpg"

The URL for this file included this parameter:

?response-content-disposition=inline%3Bfilename%3D%22%C3%A3%C2%81%C2%93%C3%A3%C2%82%C2%93%C3%A3%C2%81%C2%AB%C3%A3%C2%81%C2%A1%C3%A3%C2%81%C2%AF.jpg%22

As you can see this is not really valid but it appears to work (Firefox at least shows the right name in header).

After change it is like this:

?response-content-disposition=inline%3Bfilename%3D%22%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF.jpg%22

And results in error:

<Error>
<Code>InvalidArgument</Code>
<Message>
Header value cannot be represented using ISO-8859-1.
</Message>
<ArgumentName>response-content-disposition</ArgumentName>
<ArgumentValue>inline;filename="こんにちは.jpg"</ArgumentValue>
<RequestId>SV0RGCBCE67CF8QB</RequestId>
<HostId>
jgRxp++ahEqqvFsvjc5DGoBN4rdARRre5deHT7btXbq9Jqw+wMvhz8yb01MxUE3EkOKTDrtQ7Rs=
</HostId>
</Error>

So, it looks like the reason the current code works is that by calling utf8_encode it is treating UTF-8 as if it is a longer ISO-8859-1 string (because multiple bytes per character from original UTF-8, now counted as single ISO-8859-1 characters), and then encoding those single characters as UTF-8. Later on, something else converts those characters into URL-encoded format. Then browsers receive the header with ISO-8859-1 characters that are actually UTF-8 and decides to play nice and turn it into UTF-8, because there isn't a filename*.

The easiest fix is probably to use a non-deprecated function to do the same hideous encoding (UTF-8 broken into single bytes and treated as ISO-8859-1 then turned to UTF-8) then it will not change behaviour. Might be nice to do filename* as well though...

@sammarshallou
Copy link
Contributor Author

Our test server deployment system chose today to throw a massive wobbly, so I haven't actually been able to test my fix yet, but I thought I'd push it anyway. I'll update once properly tested.

Note: Needs testing for both CloudFront (which we actually use) and S3 (which we don't but I guess I can switch the setting) URLs, as the result of this function is used for both.

@sammarshallou
Copy link
Contributor Author

I got roped in to fix deployment, so I've now discovered that the recommended approach with filename* doesn't work because S3/CloudFront still won't let you use non-ISO characters.

<Error>
<Code>InvalidArgument</Code>
<Message>
Header value cannot be represented using ISO-8859-1.
</Message>
<ArgumentName>response-content-disposition</ArgumentName>
<ArgumentValue>
inline;filename="ã��ã��ã�«ã�¡ã�¯.jpg"; filename*=UTF-8''"こんにちは.jpg"
</ArgumentValue>
<RequestId>1W52JXHS873MJ0R0</RequestId>
<HostId>
acuwqnHbjn/eVyGqlUtXF0seomw0MsYsqHV9KXaeTUEWMNjsG0iKkAtLpW8IeuKf9nQsXZHNXn8=
</HostId>
</Error>

So, the best option is to go back to the original hack but with a comment to explain it, and using a non-deprecated function for the charset conversion (moodle core_text API). Testing this now.

@sammarshallou
Copy link
Contributor Author

sammarshallou commented Feb 19, 2024

Tested as follows:

  1. Used it in the scenario that caused a deprecated error to break an image rendering in our system (I don't really know how to reproduce this in general, anyway it's fixed now, as you would expect because I removed the deprecated function call...)
  2. Uploaded a file called こんにちは.jpg into a Moodle file resource, waited for it to be moved to S3, then clicked the link to view the jpg - verified that (a) it was using CloudFront URL, (b) the browser tab showed correct name こんにちは.jpg and not the ISO-8859-1 version.
  3. Changed setting to use S3 URL instead of CloudFront and repeated step 2.

In the case of steps 2 and 3 it worked before, this was just verifying that nothing changed. As I only used a different function to do the same thing, wasn't really expecting it to.

So in summary, I've actually tested this now and will deploy to our live release in a few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants