Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dune variants - how they are supposed to work and scale? #10460

Closed
hannesm opened this issue Apr 26, 2024 · 13 comments · Fixed by #10581
Closed

dune variants - how they are supposed to work and scale? #10460

hannesm opened this issue Apr 26, 2024 · 13 comments · Fixed by #10581
Labels
Milestone

Comments

@hannesm
Copy link
Member

hannesm commented Apr 26, 2024

First of all, thanks for your work on dune! This is an amazing piece of work.

I recently wanted to remove a lot of functors from MirageOS, since they're not really necessary -- we have at runtime only one thing around anyways (i.e. the network interface for xen will always be one thing -- while the network interface when compiling for unix will be something else). This sounded to me like a perfect use case for your dune variant feature. To a small scale it seemed to work fine, but once I "variantified" the TCP/IP stack, I get not very nice results.

Expected Behavior

I'd expect from a dune file with an executable statement that depends on some variants that those will be used across the compilation.

Actual Behavior

File "duniverse/mirage-net/unix/dune", line 4, characters 52-60:
4 |  (libraries common logs macaddr cstruct cstruct-lwt lwt.unix tuntap)
                                                        ^^^^^^^^
Error: Library "lwt.unix" in _build/solo5/duniverse/lwt/src/unix is hidden
(optional with unavailable dependencies).
-> required by library "mirage-net.unix" in
   _build/solo5/duniverse/mirage-net/unix
-> required by library "tcpip.icmpv4-direct" in
   _build/solo5/duniverse/mirage-tcpip/src/icmp/direct

So, instead of the variant specified in the executable statement, the default variant is used. :(

Reproduction

Unfortunately, this is rather big at the moment, but let me try:

With all of that, I try to compile the "device-usage/network" unikernel from https://github.com/hannesm/mirage-skeleton/tree/variants

So, the step(s) to reproduce is:

  • install a fresh switch (OCaml 4.14.2)
  • opam pin the mirage utility
  • clone the mirage-skeleton repository from my branch above
  • opam pin add -n for the above list
  • cd mirage-skeleton/device-usage/network ; mirage configure -t solo5 ; make lock pull build

Variant specifications

At the moment, I specify the default_implementation to be "mirage-net.unix", but in the generated dune file, there is: (libraries ... mirage-net.solo5 ....

I have no idea, why, as seen above, dune thinks that tcp.icmpv4-direct requires a mirage-net implementation at all, and furthermore how it decides to try the .unix one.

Debug avenues - remove default_implementation

I tried to remove the default_implementation from the above mentioned packages. This results in the following error:

dune build --profile release --root . ./dist
Error: No implementation found for virtual library "mirage-net" in
_build/solo5/duniverse/mirage-net/src.
-> required by library "ethernet" in _build/solo5/duniverse/ethernet/src
-> required by library "tcpip.ipv4" in
   _build/solo5/duniverse/mirage-tcpip/src/ipv4/interface
-> required by library "tcpip.icmpv4-direct" in
   _build/solo5/duniverse/mirage-tcpip/src/icmp/direct
-> required by executable main in dune.build:11
-> required by _build/solo5/main.exe
-> required by _build/solo5/network.hvt
-> required by _build/solo5/dist/network.hvt
-> required by alias dist/all (context solo5)
-> required by alias dist/default (context solo5)

Still, mirage-net.solo5 is a dependency in the dune file.

Move all default_implementations to those I want to use

Yes, this indeed works. But it is pretty unpleasant (would need a shell script for postprocessing), since I had hoped the variant feature would exactly provide me with this:

  • I define virtual libraries and default implementation
  • My library code uses virtual libraries
  • The only thing is at top-level (linking time, where I specify the executable) selecting the concrete implementation.

Quo vadis?

So, any pointers how I can debug this issue further? Or do I hit an intended limit of dune variants that I was not aware of?

Thanks a lot for your time reading this issue. 🐫 🐫

@hannesm

This comment was marked as outdated.

@rgrinberg rgrinberg added the bug label Apr 26, 2024
@rgrinberg
Copy link
Member

I can investigate the issue, but the example has to be simplified further. Try to shave off all the unnecessary dependencies in your repro

@hannesm
Copy link
Member Author

hannesm commented Apr 30, 2024

Thanks Rudi. I've a much better intuition and will shortly follow up with a much smaller case.

@hannesm
Copy link
Member Author

hannesm commented May 5, 2024

So, a smaller case is at: https://github.com/hannesm/dune-variants-test

(1) We have dune variant a and b (in a.opam and b.opam).
(2) The package a has a default implementation a.a depending on a non-existing package magic. The other implementation, a.b, does not have this dependency.
(3) The package b has a default implementation b.a - which depends on a. The implementation b.b does not depend on a.
(4) The executable depends on a.b and b.

In (4), my expectation is that since I chose a.b, this will be used (and not the default implementation (a.a) while building b). Does my expectation make sense?

Here, if you dune build exe/main.exe, I see the following error:

File "a/impl_a/dune", line 4, characters 13-18:
4 |   (libraries magic)
                 ^^^^^
Error: Library "magic" not found.
-> required by library "a.a" in _build/default/a/impl_a
-> required by library "b.a" in _build/default/b/impl_a
-> required by executable main in exe/dune:2
-> required by _build/default/exe/main.exe

I have to admit, at the end of the day, I observed the following intuition:

  • variants in mirage-net exist (and a default implementation)
  • I explicitly selected mirage-net.solo5 in the executable dune file
  • another package, also being a virtual library, depends on mirage-net
  • and here it made a difference whether the interface had the (libraries mirage-net) (virtual_modules a) dependency (which is the case that works);
  • or the implementation (being used) had the (libraries mirage-net) (implements a) dependency. in this case, the default_implementation from mirage-net was used, and not the mirage-net.solo5 as specified in the dune file of the executable.

Unfortunately today I was not able to reconstruct this test case (yet?).

@rgrinberg
Copy link
Member

Managed to confirm your repro. The issue is that since default implementations must live in the same package as the virtual libraries, we need to resolve more eagerly than normal libraries. Thus the error that the magic library is missing is coming from that check. Even though you don't need the default implementation as you're correctly overriding it with something else.

I suppose we could improve this check not to require the dependencies of the default implementation library to be resolved to check what package it belongs to. As you've alluded though, this is unlikely to be related to your original issue.

@hannesm
Copy link
Member Author

hannesm commented May 6, 2024

Dear Rudi, thanks for looking into that. If you could point me to a branch where the "check not to require the dependencies of the default implementation library to be resolved" is done, I'd be happy to try this out -- since to me it sounds it may be the same issue...

@rgrinberg
Copy link
Member

You can try the following patch as a start:

diff --git a/src/dune_rules/lib.ml b/src/dune_rules/lib.ml
index 5d24a3194..1fb1a6beb 100644
--- a/src/dune_rules/lib.ml
+++ b/src/dune_rules/lib.ml
@@ -1813,18 +1813,7 @@ module Compile = struct
     }
 
   let for_lib ~allow_overlaps db (t : lib) =
-    let requires =
-      (* This makes sure that the default implementation belongs to the same
-         package before we build the virtual library *)
-      let* () =
-        match t.default_implementation with
-        | None -> Resolve.Memo.return ()
-        | Some i ->
-          let+ (_ : lib) = Memo.Lazy.force i in
-          ()
-      in
-      Memo.return t.requires
-    in
+    let requires = Memo.return t.requires in
     let requires_link =
       let db = Option.some_if (not allow_overlaps) db in
       Memo.lazy_ (fun () ->

This removes the check altogether.

@hannesm
Copy link
Member Author

hannesm commented May 19, 2024

Back to this issue, I managed to find a smaller reproduction case -- please let me know if this is still too much.

The repository at https://github.com/hannesm/dune-variant reproduces the initial issue reported here. It would be great if you can take a look at that.

What virtual_modules are used?

  • mirage-bootvar has a virtual module, where mirage-bootvar.solo5 (referred to in dune.build) depends on mirage-solo5.
  • mirage-time has a virtual module, where mirage-time.solo5 is the implementation referred to in dune.build
  • mirage-solo5 depends on mirage-time

@rgrinberg
Copy link
Member

I tried understanding the example, but there's still too many dependencies for me to really understand what's happening. I can at least start that the following error:

File "duniverse/mirage-time/unix/dune", line 5, characters 12-20:
5 |  (libraries lwt.unix duration))
                ^^^^^^^^
Error: Library "lwt.unix" not found.

Is caused by the cross compilation setup. The solo5 toolchain defined in this workspace file does not contain the lwt.unix library. I suppose that makes sense because solo5 excludes the unix library? Is that right?

@hannesm
Copy link
Member Author

hannesm commented May 21, 2024

Dear @rgrinberg, thanks for looking into it.

The solo5 toolchain defined in this workspace file does not contain the lwt.unix library. I suppose that makes sense because solo5 excludes the unix library? Is that right?

This is correct.

there's still too many dependencies for me to really understand what's happening

What can we do about it? If it helps, we could have a (video)chat about it?

My intuition (if this helps)

From my perspective, the following should happen:
In dune.build we select the variants mirage-bootvar.solo5 and mirage-time.solo5. My intuition was that this should lead these variants being used for the build of the executable.

And these variants are marked as optional -- but their dependencies are present (and even mentioned in the dune.build again -- there's mirage-solo5). So, I don't quite understand why the issue appears:

File "dune.build", line 13, characters 40-60:
13 |  (libraries duration lwt mirage-bootvar mirage-bootvar.solo5
                                             ^^^^^^^^^^^^^^^^^^^^
Error: Library "mirage-bootvar.solo5" in
_build/solo5/duniverse/mirage-bootvar/solo5 is hidden (optional with
unavailable dependencies).

My intuition about "unavailable dependencies" doesn't seem to be accurate -- otherwise dune wouldn't consider mirage-solo5 to be unavailable. Its build depends on mirage-time (which in dune.build selects mirage-time.solo5 -- but that doesn't seem to be honored).

In any case, how should we proceed? Would a screen shared session to debug it further help? Would a videochat help? I tried the approach to start from 0 (in the earlier repository) trying to reproduce the issue, but wasn't successful with that. I also was not successful to reproduce without the cross-compilation setup (but I can try again). Maybe others (who know more about dune and mirage -- @samoht @TheLortex ?) could try to minimize the repository?

@rgrinberg
Copy link
Member

I think I managed to reproduce the issue here #10580

rgrinberg added a commit that referenced this issue May 25, 2024
When compiling an implementation of a virtual library, there's a check
that makes sure we don't the virtual library doesn't exist in the
closure of the implementation.

This check tried to compute the linking closure of the library to do so.
However, the linking closure might not be complete if the implementation
contains other virtual library.

To fix the issue, we use a "partial" linking closure that tries to
compute the closure as much as possible, but doesn't fail on missing
implementation.

Fix #10460

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>

<!-- ps-id: eed8cded-17bc-425f-95b8-3e0337e2881d -->
@rgrinberg
Copy link
Member

You can try the fix at #10581

@hannesm
Copy link
Member Author

hannesm commented May 25, 2024

Thanks @rgrinberg, I tested your PR locally, and it solves the issue I reported here. It would be amazing to have that merged and released soon (since it will allow me to push forward with some other work) :D 🚀

@rgrinberg rgrinberg added this to the 3.16.0 milestone May 25, 2024
rgrinberg added a commit that referenced this issue May 26, 2024
When compiling an implementation of a virtual library, there's a check
that makes sure we don't the virtual library doesn't exist in the
closure of the implementation.

This check tried to compute the linking closure of the library to do so.
However, the linking closure might not be complete if the implementation
contains other virtual library.

To fix the issue, we use a "partial" linking closure that tries to
compute the closure as much as possible, but doesn't fail on missing
implementation.

Fix #10460

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants