Skip to content

Spec for env path rewriting for setenv buildenv

Raja Boujbel edited this page Oct 23, 2023 · 2 revisions

setenv / build-env updates on Windows

x-env-path-rewrite` proposal

For each variable in build-env and setenv, the assignment is trivial if it contains neither interpolated variables nor / nor \.

x-env-path-rewrite is a list contain one entry for each environment variable which includes a non-trivial assignment in setenv/build-env. Each entry is either a 2-element list [NAME false] or a 3-element list [NAME SEPARATOR PATH-FORMAT] where NAME is the name of the environment variable. SEPARATOR is a filtered formula which evaluates to exactly one 1-character string to treat as the separator. PATH-FORMAT is a filtered formula which evaluates to exactly one string from:

  • "host" - use the “host” interpretation of PATHs (= convert via cygpath on Windows). It’s an error if the resulting update includes the separator character.
  • "host-quoted" - use the “host” interpretation of entries and double-quote any entries which include the separator character.
  • "target" - use the “target” interpretation of entries (i.e. no change)
  • "target-quoted" - use the “target” interpretation of entries and double-quote any entries which include the separator character.
# Some (semi)-hypothetical updates
setenv: [
  # Splits msvs-detect:tools-bin
  [ PATH += msvs-detect:tools-bin ]
  # Doesn't split msvs-detect:tools-bin (because it's in a string constant)
 [ PATH += "%{msvs-detect:tools-bin}%" ]
  [ MANPATH += "/home/dra/manual/man" ]
  [ PKG_CONFIG_PATH += "%{lib}%/pkgconfig" ]
  [ OCAMLPATH += "%{lib}%" ]
  [ CI = "/home/ci/config" ]
  [ FOO = "this does not require/allow a rewrite entry" ]
]
# All requiring x-env-path-rewrite
x-env-path-rewrite: [
  [ PATH (":" | ";" {os = "win32"}) ("target" | "target-quoted" {os = "win32"})]
  [ MANPATH ":" "host"]
  [ PKG_CONFIG_PATH ":" "host"]
  [ CAML_LD_LIBRARY_PATH (":" | ";" {os = "win32"}) "target"]
  [ OCAMLPATH (":" | ";" {os = "win32" | os = "cygwin"}) "target"]
  [ CI" false]
]

opam would need default rewrite rules for PATH and MANPATH (because it actually sets them), but it should then be possible to leave all the others using the default rule of FOO (":" {os != "win32"} | ";") "target".

Current uses

ocaml/opam-repository appears to be the only public user of setenv / build-env (also checked coq/opam, coq/platform, xapi-project/xs-opam, gfngfn/satysfi-external-repo).

opam-repository at 8c698f3 contains the following updates (note only = and += used):

ASAN_OPTIONS = "detect_leaks=0,exitcode=0"
BUILD_DIR = "%{lib}%/sibylfs-lem"
CAML_LD_LIBRARY_PATH = ""
CAML_LD_LIBRARY_PATH += "%{lib}%/stublibs"
CAML_LD_LIBRARY_PATH = "%{lib}%/stublibs"
CAML_LD_LIBRARY_PATH = "%{_:stubsdir}%"
CI = ""
EXTISM_TEST_NO_LIB = ""
FCC_TEST = "true"
HOMEBREW_NO_AUTO_UPDATE = "1"
LSAN_OPTIONS = "detect_leaks=0,exitcode=0"
LSBCC_BESTEFFORT = "1"
LSBCC_LSBVERSION = "4.0"
MIRAGE_OS = "unix"
MIRAGE_OS = "xen"
OCAMLPARAM = "p=1,g=1,_"
OCAML_TOPLEVEL_PATH = "%{toplevel}%"
PKG_CONFIG_PATH += "%{lib}%/pkgconfig"
TZ = ""
XDG_CACHE_HOME = "../cache"

opam itself effectively performs (opam uses the more exotic operators =: and =+=):

CDPATH = ""
MAKEFLAGS = ""
MANPATH =: switch-man-dir
OPAMCLI = "2.0"
OPAM_LAST_ENV = last-env-file
OPAM_PACKAGE_NAME = package-name
OPAM_PACKAGE_VERSION = package-version
OPAMROOT = opam-root
OPAM_SWITCH_PREFIX = switch-dir
OPAMSWITCH = switch-name
PATH =+ cygwin-bin-path
PATH += plugin-bin-dir
PATH =+= switch-bin-dir

Interpretation of environment changes

Currently we appear to have the very simple STRING op STRING. However, we already have a lot of special treatment for versions of NAME even before considering Windows (MANPATH, etc.).

The intent of all the operators apart from = is that STRING is really a path (e.g. %{lib}%/stublibs) or even a path list (e.g. %{lib}%:%{lib}%/stublibs).

Windows adds two complexities:

  • Different PATH-separators (; for PATH, etc.). Not universally ; - e.g. OCAMLPATH on Cygwin is ; (ocamlfind legacy error) and PKG_CONFIG_PATH comes from Cygwin so is always :, etc.
  • Different directory formats (not handled yet; underlying problem in #4690). So some updates should interpret directories as Windows paths, others need to convert them back to Cygwin, etc.

At the moment, everything is done by either meta-data in opam (the information about known-environment variables like MANPATH, PKG_CONFIG_PATH, PATH, etc.) or heuristics.

So for NAME op VALUE we need to know:

  • The type of NAME (either string or path)
  • If NAME has type path, the separator of NAME (which is a function based on the platform)
  • The format required for the paths (should they be in “Cygwin” / “MSYS2” format or native)

We also want to have opam files use one notation, for portability.

#2927 proposed instead of %{lib}%/stublibs having %<%{lib}%/stublibs>% to solve the first two of these issues. The idea was that %<..>% explictly denotes type path-list and scopes the conversion of \.

However, it introduces a fundamental problem, which Daniel identified in #2927 (comment) that you have to remember to do that - i.e. opam quietly does the right thing on Unix, but the wrong thing on Windows.

This issue persists with the current proposal in #5636. Where in #2927 (comment) we must remember to put %<...>% everywhere, in #5636 we presently have to remember to put x-allow-env-path-rewrite-on-windows: true everywhere.

Alternative proposal (with “opam 3.0” future)

One simple, statically visible assumption: if an assignment includes an opam “path-like” variable, then the assignment is a path. Thus:

CAML_LD_LIBRARY_PATH = "%{lib}%/stublibs"

expands to C:\Users\DRA\AppData\Local\opam\default\lib\stublibs on Windows (i.e. with \stublibs rather than /stublibs) precisely because %{lib}% was mentioned. Conversely:

XDG_CACHE_HOME = "../cache"

which is probably an error on Windows anyway, is left alone because it has no mention of variables.

There are no Windows users of opam (because it hasn’t existed!), so the fact this conversion only happens on Windows means we should only be worrying about principle, not existing repositories. On Unix, this is never going to trigger.

A future proposal would be to make explicit the way the updates are interpreted, for example:

CAML_LD_LIBRARY_PATH : [":" {os != "win32"} ";" {os = "win32"}] path = "%{stublibs}%/stublibs"

Here, the “type” expression contains a filter which selects the separator and indicates it’s a path. We could document known defaults - e.g. for PATH, MANPATH, PKG_CONFIG_PATH while keeping opam OCaml-agnostic. We could add built-in aliases so you don’t always have to write [":" {os != "win32"} ";" {os = "win32"}]. The conversion from opam-version: "2.0" applies the heuristic, so seeing:

opam-version: "2.0"
setenv: CAML_LD_LIBRARY_PATH = "%{lib}%/stublibs"

it can, in a principled manner, convert to:

opam-version: "3.0"
setenv: CAML_LD_LIBRARY_PATH : colon-semi-separator path = "%{lib}%/stublibs"

Rather than globally disabling conversion, the 2.0 x-field could rather be:

opam-version: "2.0"
setenv: CAML_LD_LIBRARY_PATH = "%{lib}%/no/really/give/me/forward/slashes"
x-disable-env-path-rewrite: ["CAML_LD_LIBRARY_PATH" "NOT_INCLUDED_IN_FIELDS"]

which could give a lint warning because NOT_INCLUDED_IN_FIELDS is not specified in either a setenv or build-env field.

What about PKG_CONFIG_PATH

The type parameters actually want to know the path format as well. This is going to be more like:

opam-version: "3.0"
setenv: [
  PKG_CONFIG_PATH : ":" host-format path += "%{lib}%/something"
  CAML_LD_LIBRARY_PATH : colon-semi-separator target-format path = "%{lib}%/stublibs"
]

Resulting in these two updates:

PKG_CONFIG_PATH = "$PKG_CONFIG_PATH:/cygdrive/c/Users/DRA/AppData/Local/opam/default/lib/something"
CAML_LD_LIBRARY_PATH = "C:\Users\DRA\AppData\Local\opam\default\lib\stublibs"

As ever, naming tbc. host/target or build/native or something… What about portable updates with multiple paths

At the moment, one can write setenv: PATH += "/usr/bin:/bin". That’s clearly not portable, so trying to mess around with converting the : is daft and clearly never going to work. So in an opam file, if you have two paths, this should clearly be written:

setenv: [
  PATH += "/usr/bin"
  PATH += "/bin"
]

and opam will now do something approximately correct (or at least it will use the correct separator!). Where this does relevantly come in is expanding existing variables. For example, the msvs-detect package correctly writes a whole series of updates in package variables such that %{msvs-detect:include}% = "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include;C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include;C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt;C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um;C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared;C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt;C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\cppwinrt;C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um".

When applying:

opam-version: "2.0"
setenv: [
  INCLUDE += "%{msvs-detect:include}%"
]

opam at this point interprets the separator correctly in the variable it’s expanding, or we’ll regress #4861.