Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Characters with accents in profile path break Lua's require on Windows #5425

Open
ktunkiewicz opened this issue Sep 10, 2021 · 24 comments
Open

Comments

@ktunkiewicz
Copy link
Contributor

image

This happens when require() is used in Lua, like this for example:

require('kinstall/gui')

In the screenshot you see Lua throwing that error multiple times, failing in multiple require commands while trying to load other parts of the script.

Got that issue reported from two people, one named Sławek and other named Łukasz (obviously, both naming their Windows uer like this). Creating a new Windows user without the accent character fixed the issue.
This happens with latest version, but I got it first reported in the beginning of this year, so the problem is there for longer.

I searched the issues tab and the only similar thing I found was this:
#1616

@vadi2
Copy link
Member

vadi2 commented Sep 11, 2021

To confirm, you're making use of this feature to load your scripts?

@ktunkiewicz
Copy link
Contributor Author

@ktunkiewicz
Copy link
Contributor Author

ktunkiewicz commented Sep 11, 2021

I got two reports about this, first one was early this year, so this applies to earlier versions of Mudlet as well.
Both are using Windows 10, for one of them it was character ł, for other one it was character Ł that caused this, but it could be a coincidence that it's the same character as it is quite common in Polish names.

@vadi2
Copy link
Member

vadi2 commented Sep 11, 2021

The package installed then is called kmap?

@vadi2
Copy link
Member

vadi2 commented Sep 11, 2021

The code that enables this feature is here, and we have a lot of functions already for working around Windows & i18n issues. Since we're passing this to the Lua VM - try using the same solution I developed already for hunspell which is also a C library: https://github.com/Mudlet/Mudlet/pull/2758/files#diff-a4fc43e16d0ada6ab826b586dbc8c8600ac838f7558a4fd7c7434db135f384e1R5213

@zeddicus-pl
Copy link

Ok, I see how this works now. Mudlet populates additional paths to Lua (in utf8 already). Then once I call the required(module_name) function, the module_name gets converted to utf8 by the utf8 compatibility layer that shadows lua's original require function.
So both the search path and the module name are in utf8. Therefore the problem must be somewhere deeper in the Lua's require implementation.

I will try out what happens if I provide an absolute path to lua file instead.

The other thing you provided doesn't look like something I could use, or I'm not sure how I could use that. I don't control where the files are located, the non-ascii character in path comes from windows user folder name.

@vadi2
Copy link
Member

vadi2 commented Sep 11, 2021

Try 'manging' the path using that function - it could be that Lua passes the utf8 path to Windows C API which then does not understand it.

@zeddicus-pl
Copy link

Ah, I get the point now, you say to copy the files somewhere with no non-ascii characters. But that's weird, I'd have to copy them outside even the windows user folder, don't think that's OK, firstly that may trigger some permissions problem, secondly I don't want to modify content of someone's hard drive. And copying to windows temp folder may not be wise too as it may be periodically cleared by some software

@zeddicus-pl
Copy link

zeddicus-pl commented Sep 11, 2021

Ok, didn't notice your previous comment, I'm writing from my mobile. That function you made creates a windiws "short path" right? But I have to do this from a lua script, can I do all that from there? Or i may prepare a version of mudlet that mangles these path and give it to that useless that has that issue

@vadi2
Copy link
Member

vadi2 commented Sep 11, 2021

Yeah the latter, try making a PR with that change - and the bots will generate links which you can test with the the users in question :)

@zeddicus-pl
Copy link

zeddicus-pl commented Sep 11, 2021

I just found out that you can add custom module loaders to Lua, from a Lua script. I made one that simply loads a file using io.open, using absolute path, and loading this as module. Basically bypassing any built in path search. Couldn't test it yet, but I should have access to Windows 10 machine later today, so I'll try to replicate and see if that custom loader works. If it does then that is an issue of Lua C code. The default Lua loader is a small C function that uses *char and there's some magic going on in there on these strings, maybe that's where it breaks

@ktunkiewicz
Copy link
Contributor Author

The custom loader didn't work, it looks like this:

-- adding a simplistic Lua package.loader to get around UTF-8 paths issue
local function loadLuaFile(filename)
  local file = io.open(filename, "rb")
  if file then
    return assert(loadstring(assert(file:read("*a")), filename))
  end
  return "\n\tno file '".. filename
end
table.insert(package.loaders, loadLuaFile)

So basically what it does it uses the io.open to open the file, and it results in this error:

'C:/Users/Rafa�/.config/mudlet/profiles/Killer/kgroup/main.lua' not found

I asked that guy to also try and see what the getMudletHomeDir returns:
Screenshot 2021-09-17 at 09 30 39

And then to use io.open and give the path manually:
Screenshot 2021-09-17 at 09 31 23

Internally Lua uses fopen to open the files, the path is just char[] so it will take whatever Mudlet gives it.

Basically, that guy's codepage is cp1250 (i asked him to run getWindowsCodepage to get that).

So if we are converting all paths to UTF-8 - no wonder it doesn't work. What was the rationale for this in the first place? I always thought Windows never uses UTF-8 encoding by default, but maybe there's some other things I don't know. Maybe the lua that we use was compiled with some flags that enable utf8 on fopen or something.

@ktunkiewicz
Copy link
Contributor Author

Aaah, sorry, I misunderstood what utf8_filenames.lua does. It proxied Lua function and converts from utf8 to windows codepage. So, it should work, but it doesn't. It's like Windows now expects an utf8 path and not its own codepage? I'm confused.

@Delwing
Copy link
Contributor

Delwing commented Sep 17, 2021

Just tested on Windows 10 codepage 1250, Mudlet 4.12.0

I am able to require and io.open files with polish accents.

@ktunkiewicz
Copy link
Contributor Author

@Delwing did you test with production or development version of mudlet?

@Delwing
Copy link
Contributor

Delwing commented Sep 17, 2021

Just as stated - 4.12.0. So latest release.

@ktunkiewicz
Copy link
Contributor Author

That's exactly the version Sławek tested it with as well... damn, so that issue probably depends on not only the system version but something else as well... may be hard to fix :-/
I guess just using short paths may resolve the issue.

@Delwing
Copy link
Contributor

Delwing commented Sep 17, 2021

My user directory is without accents, but again... tried to put accents in directory name as well, still working.

@SlySven
Copy link
Member

SlySven commented Sep 18, 2021

IIRC The problem boils down to the fact that I think(!) the Windows API has two methods to works with filenames and paths, a (narrow) 8-bit encoding one - which varies depending on where in the world the PC is set up for and a (wide) 16-bit (UTF-16) one which Windows calls "Unicode". Unless things are carefully done the default method used is the narrow one and I think that is what the Lua library / external install of Lua will use on Windows - and that does not work well for non-ASCII characters (including those in a user's home directory) on that OS.

I think somewhere in our C++ code we load our own Lua script files using Lua's C (binary library) function loadString(...) rather than loadFile(...)because we can use Qt's own file-handling to read the string from the correctly encoded (UF-16 on WIndows) filename and squirt the string into the first function rather than using whatever it is that the second does to handle a filename...

@SlySven
Copy link
Member

SlySven commented Sep 18, 2021

Ah, this also helps a little to explain it - it is talking about : http://lua-users.org/wiki/LoadLibrary and starts off:

On Windows, binary modules are dynamically loaded into Lua as [DLLs]. Windows almost always loads DLLs via the [LoadLibrary] or [LoadLibraryEx] Win32 API functions. These functions have Unicode (LoadLibraryW/LoadLibraryEx) and ANSI (LoadLibraryA/LoadLibraryExA) variants for the file name parameter. In Lua 5.1, loadlib.c calls LoadLibraryA. In Lua 5.2-beta, loadlib.c calls LoadLibraryExA, and its third parameter is taken from the LUA_LLE_FLAGS define (which defaults to 0).

Of course, we are using Lua 5.1 so we are indeed using the narrow (this calls it "ANSI") Windows API function LoadLibraryA(...)

@zeddicus-pl
Copy link

zeddicus-pl commented Sep 18, 2021

@SlySven Yes, some stuff is loaded different, it uses short paths translated from normal paths, it was a workaround for this exact issue. And I think you are right about 8-bit and Unicode paths in windows api. The problem is that we do have a proxy layer for all file-handling Lua functions (utf8_filenames.lua) that is supposed to translate the internally used utf8 strings into encoding that is currently used in Windows.
So we try to use the narrow 8bit encoding.

Apparently, this is not always working properly. As @Delwing said - it works fine for him on wibdows 10, but I know two guys for whom it doesn't. So there must be some additional unknown variable into this problem.
I think the only solution is to replace utf8_filenames.lua with something that translates long paths into short paths instead.
Not sure of it is good idea. That method has some (minor) drawbacks. Shorts paths can be disabled by system administrator (the function that returns short paths then returns a long one, so that's probably safe), and there's a miniscule chance that some filenames collide.
We do use short path method in some internally loaded Lua files, so maybe that's OK to do.

@vadi2
Copy link
Member

vadi2 commented Sep 24, 2021

Let's try. See solution in #5425 (comment).

@vadi2
Copy link
Member

vadi2 commented Oct 4, 2021

We have https://wiki.mudlet.org/w/Manual:Miscellaneous_Functions#getWindowsCodepage - can you test with people who have the issue vs who don't to see if there's a difference?

@Delwing
Copy link
Contributor

Delwing commented Feb 10, 2023

Kind of workaround popped into my head, while someone asked me question about this of one of Discords.

You can add something like:

package.path = package.path .. ";C:\\scripts\\?.lua"

and then you will be able to require files from that directory C:\scripts\ - of course directory itself might be different.
Either file scripts could be copied there or one can make directory junction pointing there with mklink - /j

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants