Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged PR(s) for tests Registry files should be always UTF-16LE and newline compatible with UTF-16 #4654

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

tukanos
Copy link
Contributor

@tukanos tukanos commented Jan 13, 2022

This PR merges:

  1. [tests: Registry files should be always UTF-16LE (UCS-2 LE BOM) encoded.] (tests: Registry files should be always UTF-16LE (UCS-2 LE BOM) encoded. #4586)
  2. and [tests: Make test for files to end with a newline compatible with UTF-16] (tests: Make test for files to end with a newline compatible with UTF-16 #4585)

@niheaven
Copy link
Member

It should be in Pester's Should form, and I'll review or refactor it in a few days.

@rasa
Copy link
Member

rasa commented Jan 28, 2022

If you try to edit a UTF16-LE file on GitHub, you will see the message

We’ve detected the file encoding as UTF-16LE. When you commit changes we will transcode it to UTF-8.

I would prefer we stick with formats that the tools we use support.

@rasa
Copy link
Member

rasa commented Jan 28, 2022

Another issue is GitHub's commit change viewer doesn't show changes to UTF16-LE files, even if the file does't have non-ASCII characters in it. Instead it shows

Binary file not shown.

If the file is converted to UTF-8, changes display fine, even if the file has non-ASCII characters.

@niheaven niheaven added the waiting feedback Waiting for user feedback label Feb 16, 2022
Copy link
Contributor Author

@tukanos tukanos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing the cases.

@tukanos
Copy link
Contributor Author

tukanos commented Feb 16, 2022

@niheaven what kind of feedback are you expecting?

@niheaven
Copy link
Member

See @rasa's comments, and is there some solutions?

And, could you please refactor the test using Pester's Should syntax?

@tukanos
Copy link
Contributor Author

tukanos commented Feb 16, 2022

@niheaven

See @rasa's comments, and is there some solutions?
I will address them. Most of the issues are actually not issue of github, but with git itself.

And, could you please refactor the test using Pester's Should syntax?
I will give it a try. I'm not used to PowerShell testing (use it mostly one time scripting), but since I use testing in different languages I will give it a shot. Also I'm in middle of migrating my workstation so it may take some time.

@rasa

If you try to edit a UTF16-LE file on GitHub, you will see the message

We’ve detected the file encoding as UTF-16LE. When you commit changes we will transcode it to UTF-8.

I would prefer we stick with formats that the tools we use support.

The is inherent git issue. Git does not support UTF-16XX encoding -natively - you should read the excellent reply at SO Why does not git support UTF-16 natively.

Every project on GitHub that needs to use UTF16XX is not using the GitHub build-in tools. So you have to pull it, edited and push it. Nuisance I agree, but still better than having scoop throwing errors due to incorrect encoding.

Since GitHub is now owned by Microsoft it should (don't know if they ever will) add the support to edit UTF16XX files in which their own registry is written.

Another issue is GitHub's commit change viewer doesn't show changes to UTF16-LE files, even if the file doesn't have non-ASCII characters in it. Instead it shows

Binary file not shown.

If the file is converted to UTF-8, changes display fine, even if the file has non-ASCII characters.

That again issue of GitHub tools, which should be updated to support UTF16 encoding. There is not much I can do here. While I understand it is comfortable to view it directly here, I don't see registry files as huge problem. In my eyes, they don't change that often.

@rashil2000
Copy link
Member

I will just add something here as a maintainer.

So you have to pull it, edited and push it. Nuisance I agree, but still better than having scoop throwing errors due to incorrect encoding.

Viewing files on GitHub is an extremely important part of the review process in buckets. There's so much activity here that it is impractical to open up the terminal, pull the commit and check the files for review in a PR every time. I agree not every PR has registry files, but not being able to view files occasionally on the web would be a blocker for me, personally.

@rasa
Copy link
Member

rasa commented Feb 16, 2022

To summarize:

  1. There is nothing in UTF-16, that UTF-8 doesn't supply, in terms of the provided characters. It's simply a different encoding.
  2. regedit.exe exports UTF-8, if asked (export to REGEDIT4)
  3. reg.exe imports UTF-8
  4. I have never had to use UTF-16 when working with .reg files (can someone provide an example where UTF16 is required?)
  5. Github doesn't support UTF-16 per here and here
  6. Git doesn't support UTF-16 per SO.
  7. @SalviaSage says "UTF-8 = works."
  8. @tukanos says "UTF-8 and UTF-8 BOM works for now"

If it stops working in the future, let's fix the issue then.

@rasa rasa closed this Feb 16, 2022
@rasa
Copy link
Member

rasa commented Feb 16, 2022

I closed this by accident, sorry.

@rasa rasa reopened this Feb 16, 2022
@tukanos
Copy link
Contributor Author

tukanos commented Feb 28, 2022

@rashil2000

I will just add something here as a maintainer.

So you have to pull it, edited and push it. Nuisance I agree, but still better than having scoop throwing errors due to incorrect encoding.

Viewing files on GitHub is an extremely important part of the review process in buckets. There's so much activity here that it is impractical to open up the terminal, pull the commit and check the files for review in a PR every time. I agree not every PR has registry files, but not being able to view files occasionally on the web would be a blocker for me, personally.

That is understandable. Maybe good approach is to push GitHub (e.g. dev polls) to improve the tooling now that it is owned by Microsoft. I find it funny because PowerShell uses Unicode which is UTF16 and is hosted here on GitHub.

@rasa

If it stops working in the future, let's fix the issue then.

I don't have an issue with that. The only problem I see is to identify the issue as connected to only UTF16. How do you say, without having the source, that the issue is only UTF16?

To summarize:

1. There is nothing in UTF-16, that UTF-8 doesn't supply, in terms of the provided characters. It's simply a different encoding.

2. regedit.exe exports UTF-8, if asked (export to REGEDIT4)

3. reg.exe imports UTF-8

4. I have never had to use UTF-16 when working with .reg files (can someone provide an example where UTF16 is **required**?)

5. Github doesn't support UTF-16 per [here](https://github.com/ScoopInstaller/Scoop/pull/4654#issuecomment-1024319867) and [here](https://github.com/ScoopInstaller/Scoop/pull/4654#issuecomment-1024337923)

6. Git doesn't support UTF-16 per [SO](https://stackoverflow.com/questions/52472435/why-doesnt-git-natively-support-utf-16/52474068#52474068).

7. @SalviaSage [says](https://gist.github.com/SalviaSage/8eba542dc27eea3379a1f7dad3f729a0) "UTF-8 = works."

8. @tukanos [says](https://github.com/ScoopInstaller/Scoop/pull/4586#issue-1080877639) "UTF-8 and UTF-8 BOM works for now"

ad 1) True. The issue is that UTF16 is a native encoding, which means that there is a conversion done.
ad 2) True - via the option save as Windows 9x/NT.
ad 3) True. You forgot to mention reg.exe exports only to UTF16.

ad 4) I can provide an example where I wanted to use UTF16 to avoid any conversion issues. I wanted to export programmatically registry into files, make changes and import them. Since reg.exe only exports into UTF16, I did not covert them, as conversion can fail. The whole process was done at UTF16.

ad 5 & 6) Not yet. As I already mentioned above
ad 7 & 8) Yes, UTF8 (No/With BOM) works for now.

I think the major obsticle is github/git lacking UTF16 support. You can close the issue now and reopen it in the future if needed. One last important note for the future is that conversion between any formats can lead to errors/bugs. The native windows registry format is UTF16.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting feedback Waiting for user feedback work-in-progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants