Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rktio: support ICU as iconv alternative, e.g. for Windows #4844

Draft
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

LiberalArtist
Copy link
Contributor

This PR aims to support ICU as an alternative to iconv in rktio to implement byte converters.

Windows does not have iconv, so Racket currently distributes libiconv-2.dll in platform-specific packages like racket-win32-x86_64-3 that are part of even the minimal Racket distribution. Building and distributing third-party C libraries is no fun. However, while looking into #4277, I discovered that Windows has provided ICU as a supported system DLL since the Windows 10 Creators Update (1703), released in 2017. This appears to mean that all Windows versions in "mainstream support" from Microsoft include ICU.1

My hope is that by using the OS-provided ICU, we can stop distributing libiconv-2.dll for minimal Racket on Windows.

The functionality may be useful in other contexts, too: for example, it seems Android has either no iconv or a very rudimentary iconv (vid. e.g. 8a08704), but Android provides ICU as a system library.

I intend to contribute a similar patch to Chez Scheme, but I've found it easier to start with rktio.

Open Design Questions

  1. I have not added any support to configure yet. I think the behavior should be:

    • On Windows, automatically enable ICU if AC_CHECK_HEADER([icu.h]) finds it. To support building in older environments, don't build ICU support if the header is not found. To support running on older Windows versions, dynamically load icu.dll with LoadLibraryW/GetProcAddress. This means no linking flags are needed.
    • On other platforms, enable ICU only if specifically requested. If it is requested, find the appropriate flags either with AX_CHECK_ICU or by using pkg-config directly.

    But native Windows builds do not use Autoconf, and I'm not sure how to do the equivalent of AC_CHECK_HEADER([icu.h]). (I cannot overstate the extent to which I am not a Windows person—perhaps ironically, I have only tested this so far on GNU/Linux.) The simplest thing to do would probably be to omit the check and require an explicit --disable-icu (/disableICU?) to build in old Windows environments.

  2. If both iconv and ICU are enabled and found, currently ICU is used. That's certainly debatable. On non-Windows platforms, it seems to make sense because enabling ICU would be an explicit choice. On Windows, one consideration is that an iconv DLL might (sometimes?) end up being installed as a dependency of racket/draw (see racket/libs@8ba7a87) or for some other reason, and having that change the behavior of bytes-open-converter seems worth avoiding.

  3. Currently, this change is entirely contained within the C code of rktio, with no change to the interface exposed to Racket. I could imagine ways of exposing more. The possibility I'd most strongly consider is adding a function to report whether iconv or ICU is being used. If there were a reason to want such a thing, it would be possible to expose the new rktio_iconv_converter_open and rktio_icu_converter_open so they could both be used if both are available. If there were a desire to support dlopening ICU on non-Windows platforms, that might be a reason to drive more from the Racket side.

  4. There might be a better ICU_BUF_SIZE.

  5. If we want to stop distributing libiconv-2.dll for minimal Racket, there would need to be changes to the native library packages, which I have not attempted. That can (and probably should!) be done as a second step, though.

    As a compatibility consideration, I looked into the encoding names recognized by ICU compared to GNU libiconv. (The documentation is already clear that the supported encodings are system-dependent, so this is an extra level of scrutiny.) Both ICU and GNU libiconv recognize multiple aliases for some encodings. For most of the encodings supported by GNU libiconv (with --enable-extra-encodings), at least one alias is also listed by ICU. There are some encodings for which that is not the case, though. I suspect these encodings are also supported by ICU under different names, and a sufficiently motivated person could fill in a translation table. Some of them might even be recognized by the matching functions in the ICU API: I only checked for exact matches.

Footnotes

  1. As explained in a comment in the code, in this PR I'm actually targeting Windows 10 version 1903, which avoids some per-thread initialization needed in earlier versions. The only earlier version of Windows still in mainstream support appears to be version 1809 (Enterprise LTSC 2019), for which mainstream support ends in January.

@otherjoel
Copy link
Sponsor Contributor

  • On Windows, automatically enable ICU if AC_CHECK_HEADER([icu.h]) finds it. To support building in older environments, don't build ICU support if the header is not found. To support running on older Windows versions, dynamically load icu.dll with LoadLibraryW/GetProcAddress. This means no linking flags are needed.

Just to be clear, it sounds from this that the minimum version of Windows required to run the official distributions of Racket available from download.racket-lang.org would not increase? (Currently that minimum version is given as Windows 7, although for 8.8 and 8.9 it was given as Windows 10. There was some discussion in #4542 about what versions of Windows are actually supported.)

@LiberalArtist
Copy link
Contributor Author

  • On Windows, automatically enable ICU if AC_CHECK_HEADER([icu.h]) finds it. To support building in older environments, don't build ICU support if the header is not found. To support running on older Windows versions, dynamically load icu.dll with LoadLibraryW/GetProcAddress. This means no linking flags are needed.

Just to be clear, it sounds from this that the minimum version of Windows required to run the official distributions of Racket available from download.racket-lang.org would not increase? (Currently that minimum version is given as Windows 7, although for 8.8 and 8.9 it was given as Windows 10. There was some discussion in #4542 about what versions of Windows are actually supported.)

That's my intention. We should definitely check that it works in practice, though!

If we went this route with configure:

But native Windows builds do not use Autoconf, and I'm not sure how to do the equivalent of AC_CHECK_HEADER([icu.h]). (I cannot overstate the extent to which I am not a Windows person—perhaps ironically, I have only tested this so far on GNU/Linux.) The simplest thing to do would probably be to omit the check and require an explicit --disable-icu (/disableICU?) to build in old Windows environments.

Then building Racket in old environments would require a configuration change, but running it should not.

@mflatt
Copy link
Member

mflatt commented Dec 7, 2023

I don't think we need to support compile-time configuration of ICU for Windows. It should just be always on in DLL mode and used as a fallback when iconv isn't available. (It looks like things are in place where we could avoid #includeing icu.h on Windows even if it's available, but I'm not sure I'm reading things right.) So, --enable-icu-dll could go away and winfig.bat wouldn't need to change. Maybe one day we'll switch the default from iconv to ICU.

An --enable-icu for other platforms makes sense, and the various configure improvements there look good.

@LiberalArtist
Copy link
Contributor Author

I don't think we need to support compile-time configuration of ICU for Windows. … So, --enable-icu-dll could go away and winfig.bat wouldn't need to change. Maybe one day we'll switch the default from iconv to ICU.

That would certainly make things easier.

(It looks like things are in place where we could avoid #includeing icu.h on Windows even if it's available, but I'm not sure I'm reading things right.)

I initially thought this wouldn't work, but then I ended up getting halfway there anyway, and now I think it probably would work, so I will try filling in the rest and see how it goes.

I think the biggest thing that had seemed problematic to me originally were comments pointing to https://unicode-org.atlassian.net/browse/ICU-12420 like the one deprecating U_STANDARD_ERROR_LIMIT because The numeric value may change over time. Now, though, I think that issue is actually specific to constants that were meant to mark the end of open-ended lists. The numeric values of the actual error codes we need to test, including for U_SUCCESS and U_FAILURE, do seem to be covered by ICU's ABI stability.

I had also seen that there is some complicated logic allowing for alternate definitions of UChar. I haven't fully understood it, but it seems like it is for niche situations that wouldn't apply here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants