Can't list files with chinese characters on windows 10 #316

Open
1 of 5 tasks
vasiby opened this issue Oct 1, 2019 · 18 comments · May be fixed by #319

@vasiby

vasiby commented Oct 1, 2019

Description

  • Type of issue :
    • Installation
    • Font-related
    • Feature request
    • Bug in existing feature
    • Developer mode : Code quality / Tests / Documentation

colorls shows an error and doesn't list any files when a file in the directory has Chinese characters in its name, for example:
杨帆.pdf

Error
Invalid argument @ rb_file_s_lstat - D:/??.pdf

Windows 10
colorls 1.2.0
ruby 2.5.5p157 (2019-03-15 revision 67260) [x64-mingw32]

@avdv
Collaborator

avdv commented Oct 4, 2019

Hi.

What happens when you run RUBYOPT=-Ku colorls d:/ instead?

@vasiby
Author

vasiby commented Oct 4, 2019

Invalid argument @ rb_file_s_lstat - D:/??.pdf

Also, if it helps, I made sure to check the UTF-8 checkbox in the Ruby Windows installer.

@avdv
Collaborator

avdv commented Oct 4, 2019

I found this bug report which might be related: https://bugs.ruby-lang.org/issues/14591

What is the output of ruby -e 'puts Encoding.find("filesystem")' on your machine? And what filesystem is on D:? NTFS?

@vasiby
Author

vasiby commented Oct 4, 2019

ruby -e 'puts Encoding.find("filesystem")'
Windows-1251

Filesystem is NTFS.

For comparison, colorls in WSL (Ubuntu on Windows) displays the same file correctly.

@avdv
Collaborator

avdv commented Oct 4, 2019

ruby -e 'puts Encoding.find("filesystem")'
Windows-1251

I guess that's bad, as you cannot properly encode Chinese characters with CP 1251. That's probably why you only see those question marks in the output.
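
To illustrate the point about CP 1251, here is a minimal sketch of what happens when the file name from this report is converted to Windows-1251. The helper name `lossy_encode` is hypothetical and only used for illustration; this is not necessarily the exact code path Ruby takes for filesystem calls on Windows.

```ruby
# Illustrative sketch only; `lossy_encode` is a hypothetical helper.
name = "\u6768\u5E06.pdf" # 杨帆.pdf as a UTF-8 string

# Lossy conversion: characters the target encoding cannot represent
# are replaced (with "?" for non-Unicode targets).
def lossy_encode(str, target)
  str.encode(target, undef: :replace, invalid: :replace)
end

begin
  # Strict conversion: Windows-1251 has no mapping for CJK characters.
  name.encode("Windows-1251")
rescue Encoding::UndefinedConversionError => e
  puts "strict conversion failed: #{e.message}"
end

puts lossy_encode(name, "Windows-1251") # prints ??.pdf
```

The lossy result matches the `??.pdf` seen in the error message, which suggests the name was already degraded before `lstat` was called.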

What happens if you change the codepage of the console to unicode: chcp 65001, and then run colorls again?

@vasiby
Author

vasiby commented Oct 4, 2019

chcp 65001 doesn't help

@avdv
Collaborator

avdv commented Oct 4, 2019

Did you try with -Ku as before?

@avdv
Collaborator

avdv commented Oct 4, 2019

You could also try set RUBY_DEBUG=codepage=65001.

@vasiby
Author

vasiby commented Oct 4, 2019

RUBY_DEBUG doesn't work
ruby -Ku colorls doesn't work

The same ??.pdf error

@avdv
Collaborator

avdv commented Oct 4, 2019

FTR, here is how Ruby initializes the encoding it uses for filesystem operations on Windows: https://github.com/ruby/ruby/blob/c5eb24349a4535948514fe765c3ddb0628d81004/localeinit.c#L124-L130

To make it work, you need to use a codepage which properly encodes all of the characters you want to use with it.

I.e., you would want Encoding.find('filesystem') to return UTF-8.

@avdv
Collaborator

avdv commented Oct 7, 2019

After lots of back-and-forth (https://github.com/avdv/clocale/pull/45/commits), I discovered that Dir.entries accepts an encoding parameter since Ruby 2.1.

That way, I can force the Encoding to be UTF-8, which seems to work: https://ci.appveyor.com/project/avdv/clocale/build/job/48acvkeq5madf01d#L74
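
As a rough sketch of that approach, the snippet below lists a temporary directory with `Dir.entries` and an explicit `encoding:` argument. The helper name `utf8_entries` and the temporary directory are assumptions made for illustration; they are not taken from colorls or from the linked PR.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list a directory and ask Ruby to return the
# entry names as UTF-8 strings, regardless of the filesystem encoding.
def utf8_entries(dir)
  Dir.entries(dir, encoding: Encoding::UTF_8)
end

Dir.mktmpdir do |dir|
  # Create a file named 杨帆.pdf inside a throwaway directory.
  FileUtils.touch(File.join(dir, "\u6768\u5E06.pdf"))

  utf8_entries(dir).each do |entry|
    next if entry == "." || entry == ".."
    puts "#{entry.inspect} (#{entry.encoding}, valid: #{entry.valid_encoding?})"
  end
end
```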

When displaying the strings on the console, I probably have to encode them to the default_external encoding to make it work. I am not sure, and I don't have a Windows box to test it.

Could you save the following code to a file, run ruby THE_FILE.rb d:/, and either confirm that it works or paste the output, please?

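# Show which encodings Ruby picked for the filesystem and for I/O.
puts Encoding.find('filesystem')
puts Encoding.default_external
puts Encoding.default_internal

# The path to inspect is passed as the first command-line argument.
arg = ARGV[0]
puts arg
@name = File.basename(arg)
@stats = File.lstat(arg)

if @stats.directory?
  begin
    # Force the entry names to UTF-8 instead of the filesystem encoding.
    @contents = Dir.entries(arg, encoding: Encoding::UTF_8)

    @contents.each do |item|
      # Re-encode each name for the console and print the file size.
      puts "#{item.encode(Encoding.default_external, Encoding::UTF_8)}: #{File.lstat(File.join(arg, item)).size}"
    rescue StandardError => e
      # Report per-entry failures (e.g. conversion errors) and continue.
      warn "#{item}: #{e}"
    end
  rescue StandardError => e
    puts "#{e}: #{e.backtrace}"
  end
end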

@vasiby
Author

vasiby commented Oct 8, 2019

Here's the output

Windows-1251
IBM866

.
.: 4096
..: 4096
asciidoctor: 565
asciidoctor.bat: 41
bundle.cmd: 672
bundler.cmd: 674
colorls: 541
colorls.bat: 41
erb: 5086
erb.cmd: 5228
gem: 546
gem.cmd: 688
irb: 508
irb.cmd: 650
lsc.bat: 36
rake: 656
rake.cmd: 41
rdoc: 514
rdoc.cmd: 656
ri: 510
ri.cmd: 652
ridk.cmd: 694
ridk.ps1: 870
ruby.exe: 34816
rubyw.exe: 34816
ruby_builtin_dlls: 4096
setrbvars.cmd: 312
tst.rb: 525
x64-msvcrt-ruby260.dll: 3527680
杨帆.pdf: U+6768 from UTF-8 to IBM866

@avdv
Collaborator

avdv commented Oct 9, 2019

杨帆.pdf: U+6768 from UTF-8 to IBM866

OK, there was an encoding problem. But the output here looks OK, without the need to encode it explicitly. Did it also show up this way on your console? Or was it somehow mangled?

@vasiby
Author

vasiby commented Oct 9, 2019

Did it also show up this way on your console?

Yes, the console output was ok

@avdv
Collaborator

avdv commented Oct 9, 2019

Yes, the console output was ok

OK, then it is probably better to simply print the name as is, rather than re-encoding it and substituting the characters that are undefined in the target encoding with the replacement character, i.e. ?.
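
One way to picture that trade-off is the sketch below. The helper `console_name` is hypothetical and is not the code from the PR: it converts a name to the console encoding when every character can be represented, and otherwise returns the original string untouched instead of substituting ?.

```ruby
# Hypothetical helper, for illustration only: convert a name to the
# target encoding when possible, otherwise return it unchanged.
def console_name(name, target = Encoding.default_external)
  name.encode(target)
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  name
end

# 杨帆.pdf cannot be represented in IBM866, so it stays a UTF-8 string.
puts console_name("\u6768\u5E06.pdf", Encoding::IBM866).encoding

# An ASCII-only name converts cleanly.
puts console_name("tst.rb", Encoding::IBM866).encoding
```

Whether the unchanged string then renders correctly depends on the console, which is what the test run in this thread checked.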

I'll see whether I can come up with a PR this evening... if nobody beats me to it.

avdv added a commit to avdv/colorls that referenced this issue Oct 10, 2019
@avdv avdv linked a pull request Oct 10, 2019 that will close this issue
5 tasks
@avdv
Collaborator

avdv commented Oct 10, 2019

I have created a PR, but I am still missing test coverage. If you want, check out the branch and see if it works for you.

@vasiby
Author

vasiby commented Oct 10, 2019

The fix works, thank you

image

@avdv avdv added the bug label Jan 8, 2020
@avdv
Collaborator

avdv commented Apr 20, 2020

Hi @vasiby, long time no progress here... Sorry!

But after fixing #352 I am revisiting your issue. Are you still using colorls on Windows? Could you perhaps check whether the latest changes made in #353 work for your use case too?

  1. does it still produce an error?
  2. does it show the file name correctly?

You can install a prerelease including the latest fixes via gem install --prerelease colorls. Thanks!
