This repository has been archived by the owner on Aug 25, 2022. It is now read-only.

Generate smaller, simpler code #104

Closed
wants to merge 9 commits

Conversation

yallop
Member

yallop commented Feb 15, 2017

This PR makes a few changes to the code generated by functoria to make it smaller and simpler. The aims are readability and simplicity, not performance, but there might be some small performance improvements.

The typical space savings are around 40-45%. For example,

  • the main.ml file for conduit_server drops from 199 lines to 115 lines
  • the main.ml file for console drops from 80 lines to 43 lines

The code is simplified/shrunk in three main ways:

  1. First, there's a distinction between effectful and non-effectful initialization code in the return type of connect. (This is perhaps a first step in the direction of generating code at a slightly higher level than string concatenation.)

    The implementer of connect tags the generated code with Val or Eff according to whether it's a simple value or a possibly-effectful computation, and the caller of connect uses the tag to determine whether the value can be safely inlined. For effectful computations the generated code follows the previous scheme: a top-level binding is emitted

    let name = lazy (connect_code ())

    which is used in connect code for other modules:

    name >>= fun _name ->
    ...
    return (f ... _name ...)

    For effect-free computations the top-level binding is omitted and the value is instead inlined directly:

    ...
    return (f ... connect_value ...)
  2. Second, the code that forces lazy values and the code that executes the resulting Lwt computations are combined, avoiding an intermediate binding. Before:

    let __foo = Lazy.force foo in
    __foo >>= fun _foo ->
    ...

    After:

    Lazy.force foo >>= fun _foo ->
    ...

    (This changes evaluation order slightly, so it would be a good idea for someone to confirm that the change is safe.)

  3. Finally, the layout is now a little more compact. For example, short expressions now occupy a single line. Before:

    let foo = lazy (
      M.connect ()
     )

    After:

    let foo = lazy ( M.connect ())
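The Val/Eff split described in point 1 can be sketched as follows. This is an illustration only: the names (fragment, Val, Eff, splice) are hypothetical and not the PR's actual API.

```ocaml
(* Hypothetical sketch of the tagged connect result from point 1. *)
type fragment =
  | Val of string   (* pure expression: safe to inline at each use site *)
  | Eff of string   (* possibly effectful: needs a top-level lazy binding *)

(* The caller inspects the tag: Eff fragments get a top-level
   [let name = lazy (...)] binding; Val fragments are inlined directly. *)
let splice ~name = function
  | Val expr -> (None, expr)
  | Eff expr -> (Some (Printf.sprintf "let %s = lazy (%s)" name expr), name)

let () =
  assert (splice ~name:"clock1" (Val "Clock.v") = (None, "Clock.v"));
  assert (splice ~name:"console1" (Eff "Console.connect ()")
          = (Some "let console1 = lazy (Console.connect ())", "console1"))
```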

@yallop
Member Author

yallop commented Feb 15, 2017

It may be wise to leave this unmerged until Mirage 3 is safely released.

@Drup
Member

Drup commented Feb 15, 2017

Just so that I can get a quick look before reinstalling half my mirage switch: Could you pastebin before/after for some medium-sized unikernel ?

@yallop
Member Author

yallop commented Feb 15, 2017

Here's a before/after gist for clock.

@Drup
Member

Drup commented Feb 15, 2017

Ok, Change 2 is incorrect because it doesn't launch the sub-connects concurrently; it makes them sequential.

Here is the original version. First, all the tasks are forced, which makes them execute concurrently; then they are lwt-bound.

let f11 = lazy (
  let __time1 = Lazy.force time1 in
  let __pclock1 = Lazy.force pclock1 in
  let __mclock1 = Lazy.force mclock1 in
  __time1 >>= fun _time1 ->
  __pclock1 >>= fun _pclock1 ->
  __mclock1 >>= fun _mclock1 ->
  Unikernel1.start _time1 _pclock1 _mclock1
)

Here is the new version: the tasks are forced and lwt-bound immediately, in sequential order.

let f11 = lazy (
  Lazy.force pclock1 >>= fun _pclock1 ->
  Lazy.force mclock1 >>= fun _mclock1 ->
  Unikernel1.start () _pclock1 _mclock1)
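The ordering difference can be shown with a toy model in which forcing a lazy value "launches" a device and >>= only consumes its result. This is an illustrative stand-in for Lwt, not real Lwt code; the event log records the launch/wait order under the two schemes.

```ocaml
(* Toy model: forcing a lazy "launches" a device (logged as "start");
   [>>=] only consumes a result (logged as "wait"). Illustration only. *)
let events = ref []
let note e = events := e :: !events

type 'a dev = { name : string; value : 'a }

let device name value = lazy (note ("start " ^ name); { name; value })
let ( >>= ) d k = note ("wait " ^ d.name); k d.value

(* New (sequential) scheme: each device is launched only after the
   previous one has been waited on. *)
let () =
  events := [];
  let p = device "pclock" 1 and m = device "mclock" 2 in
  let (_ : int) = Lazy.force p >>= fun x -> Lazy.force m >>= fun y -> x + y in
  assert (List.rev !events
          = ["start pclock"; "wait pclock"; "start mclock"; "wait mclock"])

(* Original scheme: both devices are launched before either is waited on. *)
let () =
  events := [];
  let p = device "pclock" 1 and m = device "mclock" 2 in
  let dp = Lazy.force p in
  let dm = Lazy.force m in
  let (_ : int) = dp >>= fun x -> dm >>= fun y -> x + y in
  assert (List.rev !events
          = ["start pclock"; "start mclock"; "wait pclock"; "wait mclock"])
```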

@Drup
Member

Drup commented Feb 15, 2017

(To see why this is important more concretely: consider the case where you have two main devices that both launch servers listening on different ports. The "main" Mirage device will connect on both. It must do so concurrently)

@yallop
Member Author

yallop commented Feb 15, 2017

Thanks for taking a look at this, @Drup.

Could you be even more concrete? e.g. with a pointer to some code.

The code that functoria currently generates looks quite sequential already, since the Lwt computations are sequenced with >>=. But perhaps there's some additional concurrency introduced when the lazy values are forced.

@Drup
Member

Drup commented Feb 15, 2017

@yallop >>= does not launch the lwt computation on the left-hand side; it only waits for it. The actual promise is launched when we Lazy.force it. I'll see if I can come up with a unikernel that exercises this feature.

@yallop
Member Author

yallop commented Feb 15, 2017

It's ok; I see what you mean. I'll take another look at combining forcing and binding.

@yallop
Member Author

yallop commented Feb 15, 2017

Having looked at this a bit more, I think an applicative approach could work well here.

We could generate something like this

pure Unikernel1.start <$> lazy () <$> pclock1 <$> mclock1

in place of the current

let f11 = lazy (
  let __time1 = Lazy.force time1 in
  let __pclock1 = Lazy.force pclock1 in
  let __mclock1 = Lazy.force mclock1 in
  __time1 >>= fun _time1 ->
  __pclock1 >>= fun _pclock1 ->
  __mclock1 >>= fun _mclock1 ->
  Unikernel1.start _time1 _pclock1 _mclock1
)

With some care it should be possible to arrange to run the applicative so that all the laziness is forced before blocking.

@Drup
Member

Drup commented Feb 15, 2017

@yallop I agree. The code is made to precisely emulate the applicative behavior. :)
The problem is how to make that easy to define at the device level.

Currently, functoria asks the dsl to provide monad-like operators, but that monad doesn't include the laziness aspect. I guess we could touch that a bit?

@avsm
Member

avsm commented Feb 27, 2017

I think an applicative approach could work well here.

That would be lovely alongside other code, like the cmdliner exposure, which has similar idioms.

@yallop
Member Author

yallop commented Mar 7, 2017

@Drup: is there a reason why the "main" code forces values sequentially? Is it important to preserve that behaviour?

@talex5
Contributor

talex5 commented Mar 7, 2017

@yallop I think that is the list of special init tasks. The sequence seems to be:

  • Parse arguments
  • Set up logging (depends on the arguments)
  • Set up everything else (may use logging)

@yallop
Member Author

yallop commented Mar 7, 2017

Thanks, @talex5. Do you think it would be worth making the dependencies explicit in the data flow? i.e. returning a value from each task that can be used as input to subsequent tasks?

@talex5
Contributor

talex5 commented Mar 7, 2017

I originally made a PR in which init tasks were just normal jobs with dependencies, but it was felt that this list-based system was simpler and good enough.

@Drup
Member

Drup commented Mar 7, 2017

The problem with making that part of the data flow explicit is that suddenly everyone must depend on the key device. People must remember to declare a dependency on the key device (and, spoiler, they won't). The graph becomes mostly unreadable, you get some other annoying issues, and we don't even really gain anything.

From an abstract point of view, I would really like it; it would be much more elegant. Currently, there is an ugly hack to find the key device in the graph and do the init properly. It's a very practical approach. If you have a concrete proposal that doesn't make everything else worse, I'm all ears.

@yallop
Member Author

yallop commented Mar 7, 2017

Thanks, @Drup. That all makes sense. Surfacing the dependencies is valuable, but I can see how it could lead to a tangle. I'll give it some thought.

In the meantime, I'll pull some of the less controversial changes here into a separate pull request.

@yallop
Member Author

yallop commented Apr 19, 2017

Here's an outline of a possible approach that leaves the device code ignorant of laziness. The idea is to use an explicit representation of the forced computation to ensure that all forcing of lazy values occurs before all use of Lwt.(>>=).

First, the code for each device is a closed function expression:

(fun x y z ->
 Foo.connect x y z)

A function, lift, turns these expressions into delayed computations (where t is abstract):

val lift : 'a -> 'a t Lazy.t

A second function, <$>, applies the lifted function to delayed Lwt computations:

val ( <$> ) : ('a -> 'b) t Lazy.t -> 'a Lwt.t Lazy.t -> 'b t Lazy.t

Finally, run runs a fully applied computation:

val run : 'a Lwt.t t Lazy.t -> 'a

And here's a simple implementation:

type _ t =
  | V : 'a -> 'a t
  | App : ('a -> 'b) t * 'a Lwt.t -> 'b t

let lift v = lazy (V v)

let rec run' : 'b 'c. 'b t -> ('b -> 'c Lwt.t) -> 'c Lwt.t =
  fun e k -> match e with
    | V v -> k v
    | App (f, v) -> run' f (fun g -> v >>= fun w -> k (g w))

let run v = Lwt_main.run (run' (Lazy.force v) (fun x -> x))

let (<$>) f x = lazy (App (Lazy.force f, Lazy.force x))

You might then write code like this:

let p = lazy (P.connect ())
let m = lazy (M.connect ())

let d = run (lift connect <$> p <$> m <$> m)

The tricky part is ensuring that the thunks are forced before any binding takes place. Representing the computation using t, V and App takes care of that, since that structure and its evaluation function run' know only about Lwt, not about lazy. It's not applicative, exactly, but it's applicative-style.
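As a sanity check of that evaluation order, here is a self-contained version of the sketch with a trivial stub in place of Lwt ('a Lwt.t = 'a, and bind records when it runs); this stub is an assumption for illustration, not real Lwt. The assertions check that every Lazy.force happens before any >>=.

```ocaml
let trace = ref []

(* Trivial stub standing in for Lwt: ['a t = 'a], and bind records when
   it runs so the evaluation order can be checked below. *)
module Lwt = struct
  type 'a t = 'a
  let ( >>= ) x f = trace := `Bind :: !trace; f x
  let main_run x = x
end
open Lwt

type _ t =
  | V : 'a -> 'a t
  | App : ('a -> 'b) t * 'a Lwt.t -> 'b t

let lift v = lazy (V v)
let ( <$> ) f x = lazy (App (Lazy.force f, Lazy.force x))

let rec run' : type b c. b t -> (b -> c Lwt.t) -> c Lwt.t =
  fun e k -> match e with
    | V v -> k v
    | App (f, v) -> run' f (fun g -> v >>= fun w -> k (g w))

let run v = Lwt.main_run (run' (Lazy.force v) (fun x -> x))

(* A device that records when it is forced ("launched"). *)
let dev v = lazy (trace := `Start :: !trace; v)

let () =
  let connect x y = x + y in
  assert (run (lift connect <$> dev 1 <$> dev 2) = 3);
  (* every force ([`Start]) precedes every bind ([`Bind]) *)
  let rec ok bind_seen = function
    | [] -> true
    | `Bind :: r -> ok true r
    | `Start :: r -> (not bind_seen) && ok bind_seen r
  in
  assert (ok false (List.rev !trace))
```

Note that `type b c.` is used for run' here so that GADT matching and the polymorphic recursion both type-check in a standalone file.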

What do you think, @Drup?

@samoht
Member

samoht commented Jul 4, 2017

ping @Drup

@Drup
Member

Drup commented Jul 4, 2017

Sorry for the delay!

@yallop I believe this is a good idea! Your API looks good. I wonder how generic we can make it.
One alternative API would be a multi-argument apply (with a heterogeneous list) and a heterogeneous Lwt.join. This is pretty much the same anyway (the <$> are the conses), but it makes it a bit clearer that we want join semantics.
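The heterogeneous-list reading might look something like this sketch, with 'a Lwt.t again stubbed as 'a; hlist, force_all, and apply are illustrative names, not a proposed API.

```ocaml
type 'a lwt = 'a   (* stand-in for 'a Lwt.t; illustration only *)

let trace = ref []

(* A device that records its name when forced ("launched"). *)
let dev name v = lazy (trace := name :: !trace; v)

(* Heterogeneous list of delayed computations. *)
type _ hlist =
  | Nil  : unit hlist
  | Cons : 'a lwt Lazy.t * 'b hlist -> ('a * 'b) hlist

(* The heterogeneous "join": force (launch) every element, left to
   right, before any result is consumed. *)
let rec force_all : type a. a hlist -> a = function
  | Nil -> ()
  | Cons (x, xs) -> let v = Lazy.force x in (v, force_all xs)

(* Multi-argument apply: a whole <$> chain collapses into one call. *)
let apply f args = f (force_all args)

let () =
  let connect (p, (m, ())) = p + m in
  assert (apply connect (Cons (dev "pclock" 1, Cons (dev "mclock" 2, Nil))) = 3);
  assert (List.rev !trace = ["pclock"; "mclock"])
```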

@avsm
Member

avsm commented Feb 2, 2018

This one seems relevant to resurrect again :-)

@Drup
Member

Drup commented Feb 2, 2018

Indeed!
@yallop Do you want to take a shot at implementing your last proposal? We could also take that occasion to revisit what we need from the underlying datatype. Currently it's a monad, but maybe we don't need that much.

@hannesm
Member

hannesm commented Dec 22, 2018

Sorry for being late to the party, but I'm wondering what the status of this PR is? It looks like this would be a good improvement for functoria.

AFAICT @yallop's most recent suggestion from #104 (comment), to use an explicit representation of the forced computation, is not yet implemented -- @yallop, any chance you have some time to develop this code and rebase this PR on master? I can rebase the accompanying PR in mirage if you like.

@Drup
Member

Drup commented Dec 22, 2018

Actually, the last set of patches implements the idea of separating values and computations. The code looks decent at a glance, but it breaks every single device in existence. :/

@hannesm
Member

hannesm commented Dec 22, 2018

@Drup I searched through GitHub for mentions of base_configurable (which catches most definitions of custom devices), and most were in mirage/mirage and mirage/mirage-skeleton, a few distributed around other example unikernels and your lua. I think it is manageable to break the API and update all clients.

@yallop
Member Author

yallop commented Feb 18, 2019

I've rebased against master, and I plan to revisit this code to bring it into line with the design discussed above.

@hannesm
Member

hannesm commented Feb 28, 2019

Thanks @yallop, if you get around to doing that, that'd be great. As mentioned earlier, I'm happy to rebase your mirage/mirage#790 PR once this is settled :D

@samoht
Member

samoht commented Dec 5, 2019

It would be nice to merge this PR at some point, as simpler code is always nicer :-)

@yallop do you think you could update these patches to bring them in line with the above discussion?

@samoht
Member

samoht commented Oct 7, 2021

The functoria codebase has changed a little bit, but this is still a very much wanted code-generator simplification. @yallop if you are still interested in implementing these changes, I'll be super happy to merge them :-)

@samoht
Member

samoht commented Oct 9, 2021

Actually this repository is not active anymore, so please feel free to move your patch to mirage/mirage !

samoht closed this Oct 9, 2021

6 participants