Idea: Write Lua parser and bytecode compiler in Lua #248

Open
lukego opened this issue Mar 25, 2019 · 19 comments

Comments

@lukego
Contributor

lukego commented Mar 25, 2019

This issue is to "import" an idea that I really like from the LuaJIT repo: LuaJIT/LuaJIT#488

The Lua bytecode compiler is currently written in C but it does strike me that writing this in Lua instead would reduce overall complexity and make the language frontend easier to maintain and evolve. I don't see much of a bootstrapping problem since we could embed the precompiled bytecodes into the VM.

Cool idea @SoniEx2 :-)

@SoniEx2

SoniEx2 commented Mar 25, 2019

I kinda opened that issue so I could ship my LOVE code with a patched LuaJIT parser/compiler. it would only handle LuaJIT bytecode but I'm okay with that tbh.

I don't think anyone builds their LOVE with RaptorJIT instead of LuaJIT. so I wouldn't get much (if any) benefit from such a pure-Lua RaptorJIT parser/compiler.

@lukego
Contributor Author

lukego commented Mar 26, 2019

@SoniEx2 This idea is still very interesting for RaptorJIT since we want to reduce the maintenance burden of the C code and evolve the language over time.

I don't think it hurts to communicate between forks. If we implemented this on RaptorJIT you could probably port it pretty easily to your own LuaJIT fork and deploy on that. This is the way things work in the *BSD universe where each fork develops its own features and the best ones are picked up by the others over time.

@Ravenslofty
Contributor

For what it's worth, I'm torn on this.

"Embedding the binary blob" is not a trivial task here - I think it would involve linker scripts or being dumped into a C array for the preprocessor.

Additionally, internal bytecode is not a stable format, so the front-ends would require updating as the bytecode format evolves.

However, a rewrite to achieve this would majorly decouple the JIT from the front-end, and I think that's a goal worth pursuing in its own right; other languages could emit RJ bytecode to run on the RJ backend.

So I suppose my opinion can be summarised as "I think this is a bad idea but an excellent concept".

@SoniEx2

SoniEx2 commented Apr 3, 2019

What? I'm just suggesting we write lua_load in pure Lua - but you need to bootstrap it, so `lua ./compiler.lua ./compiler.lua | into_c_array > compiler.c` and run it directly on the VM. it'd be part of the binary itself so "the bytecode isn't stable" is fine as you'd need to update the compiler anyway when making bytecode changes.
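A minimal sketch of what the `into_c_array` step could look like, written in Lua. The function name, the array name `compiler_bc`, and the one-line stand-in chunk are all hypothetical; the thread does not show a real implementation, and this version works in-process on a string rather than as a filter in a shell pipeline.

```lua
-- Hypothetical sketch of an "into_c_array" step: turn a bytecode string
-- into C source that declares a byte array. All names are illustrative.
local loadstring = loadstring or load  -- Lua 5.1/LuaJIT vs. Lua 5.2+

local function into_c_array(name, bytes, per_line)
  per_line = per_line or 12
  local out = { ("static const unsigned char %s[%d] = {"):format(name, #bytes) }
  for i = 1, #bytes, per_line do
    local row = {}
    for j = i, math.min(i + per_line - 1, #bytes) do
      row[#row + 1] = ("0x%02x"):format(bytes:byte(j))
    end
    out[#out + 1] = "  " .. table.concat(row, ", ") .. ","
  end
  out[#out + 1] = "};"
  return table.concat(out, "\n")
end

-- Stand-in for the real compiler source: any chunk the host VM can compile.
local chunk = assert(loadstring("return 1 + 1"))
local bytecode = string.dump(chunk)
local c_source = into_c_array("compiler_bc", bytecode)
print(c_source)
```

The bytes produced by `string.dump` are specific to the VM that produced them, which is why the array would have to be regenerated whenever the bytecode format changes.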

@Ravenslofty
Contributor

Ravenslofty commented Apr 3, 2019

> What? I'm just suggesting we write lua_load in pure Lua

So, suppose somebody else comes along and thinks "this is a good runtime, let's write my language for the RaptorJIT VM". Then what?

> but you need to bootstrap it so `lua ./compiler.lua ./compiler.lua | into_c_array > compiler.c` and run it directly on the VM.

So then you need APIs to load this bytecode into the VM. Granted, there is load as part of the existing Lua API, but I still consider that data to be internal to the VM.
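For reference, the existing load path mentioned here can be sketched as a round trip: compile a source string, dump it to a bytecode string, and load that string back. This is only an illustration of the standard `string.dump` and `load`/`loadstring` functions, not of how an embedded compiler would actually be wired into the VM.

```lua
-- Round trip: compile source, dump it to a bytecode string, load it back.
local loadstring = loadstring or load  -- Lua 5.1/LuaJIT vs. Lua 5.2+

local source = "local a, b = ... return a + b"
local compiled = assert(loadstring(source))
local bytecode = string.dump(compiled)

-- The same entry point accepts a binary chunk as well as source text.
local reloaded = assert(loadstring(bytecode))
local sum = reloaded(2, 3)
print(sum)
```

The Lua reference manual warns that binary chunks are not checked for consistency when loaded, which is the trust question raised in this comment.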

> it'd be part of the binary itself so "the bytecode isn't stable" is fine as you'd need to update the compiler anyway when making bytecode changes.

Does arbitrary bytecode somehow become more trustworthy when embedded into the executable?

@SoniEx2

SoniEx2 commented Apr 3, 2019

it's no more arbitrary than the machine code around it.

what's your C compiler written in?

the lua_load bytecode would be internal to the VM. luajit already does this with a handful of builtin functions (they're interpreted and JITted like any lua function, there's little to no C around it) altho I don't fully understand how it works.

@Ravenslofty
Contributor

> it's no more arbitrary than the machine code around it.

And no more trustworthy, honestly.

> what's your C compiler written in?

C++, because C is a bad language to write C compilers in.

> the lua_load bytecode would be internal to the VM. luajit already does this with a handful of builtin functions (they're interpreted and JITted like any lua function, there's little to no C around it) altho I don't fully understand how it works.

This is provably false.

for i=0,100 do
    local f = assert(load[[print("Hello, World!")]])
    f()
end
---- TRACE 1 start foo.lua:1
0005  GGET     4   0      ; "assert"
0006  GGET     5   1      ; "load"
0007  KSTR     6   2      ; "print("Hello, World!")"
0008  CALL     5   0   2
0000  . FUNCC               ; load
---- TRACE 1 IR
0001    int SLOAD  #1    CI
0002    fun SLOAD  #0    R
0003    tab FLOAD  0002  func.env
0004    int FLOAD  0003  tab.hmask
0005 >  int EQ     0004  +63
0006    p32 FLOAD  0003  tab.node
0007 >  p32 HREFK  0006  "assert" @3
0008 >  fun HLOAD  0007
0009 >  p32 HREFK  0006  "load" @55
0010 >  fun HLOAD  0009
0011 >  fun EQ     0010  load
0012    num CONV   0001  num.int

The FUNCC opcode indicates it's a C function.

I haven't looked around, but I would imagine it translates to a call to lua_load.

@SoniEx2

SoniEx2 commented Apr 3, 2019

try math.max math.rad.

take a look here: https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/host/genlibbc.lua
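To repeat the experiment with the functions suggested here, a loop of the same shape as the `load` test above could be run under the same kind of trace dump. This is only a sketch of the test input; it makes no claim about what the resulting trace will show.

```lua
-- Same shape as the load() loop above, but calling the library functions
-- suggested in this comment. Run it under the VM's trace dump to compare.
local total = 0
for i = 1, 100 do
  total = total + math.rad(i) + math.max(i, 0)
end
print(total)
```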

@lukego
Contributor Author

lukego commented Apr 4, 2019

I'd like to understand how the current bootstrap uses embedded bytecodes (if at all.)

I was going to insist that we already do generate and embed bytecodes during bootstrap. There is a bootstrap module genlibbc.lua mentioned by @SoniEx2 that byte-compiles Lua code embedded in the C sources like this:

LJLIB_LUA(table_foreach) /*
  function(t, f)
    CHECK_tab(t)
    CHECK_func(f)
    for k, v in PAIRS(t) do
      local r = f(k, v)
      if r ~= nil then return r end
    end
  end
*/

... but where do those bytecodes actually end up? I had expected to find them in src/reusevm/ where we have an in-tree copy of all the generated bootstrap artefacts but I don't immediately find them there. Could it be that this Lua-in-C embedding is dead code e.g. that was only used for minilua? If so that sounds like a potential target for the @ZirconiumX dead-code-destruction spree 😀
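As a rough illustration of the first step such a bootstrap tool has to perform, the sketch below pulls the Lua text out of a C comment that follows an `LJLIB_LUA(name)` marker. The function `extract_embedded_lua` and its pattern are hypothetical and are not taken from genlibbc.lua.

```lua
-- Hypothetical sketch: find Lua snippets embedded in C comments after an
-- LJLIB_LUA(name) marker. This is NOT the real genlibbc.lua logic.
local c_source = [[
LJLIB_LUA(table_foreach) /*
  function(t, f)
    CHECK_tab(t)
    CHECK_func(f)
    for k, v in PAIRS(t) do
      local r = f(k, v)
      if r ~= nil then return r end
    end
  end
*/
]]

local function extract_embedded_lua(text)
  local found = {}
  -- Capture the name in parentheses and the comment body (non-greedy).
  for name, body in text:gmatch("LJLIB_LUA%(([%w_]+)%)%s*/%*(.-)%*/") do
    found[#found + 1] = { name = name, body = body }
  end
  return found
end

local snippets = extract_embedded_lua(c_source)
for _, s in ipairs(snippets) do
  print(s.name, #s.body)
end
```

The embedded snippet uses placeholders such as `CHECK_tab` and `PAIRS` that are not ordinary Lua globals, so the extracted text presumably needs further rewriting before it can be compiled; the sketch stops at extraction.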

@SoniEx2

SoniEx2 commented Apr 4, 2019

@lukego
Contributor Author

lukego commented Apr 4, 2019

@SoniEx2 That's one breadcrumb but buildvm_libbc.h is only directly included by other bootstrapping code. How do those library function bytecodes end up in the executable?

@SoniEx2

SoniEx2 commented Apr 4, 2019

the makefile generates an lj_bcdef.h

@lukego
Contributor Author

lukego commented Apr 4, 2019

@SoniEx2 So is the Lua code embedded in C comments first bytecode-compiled by genlibbc.lua into buildvm_libbc.h, which is then used to generate lj_bcdef.h containing these same bytecodes, which is then included in the VM?

I ask this question both sincerely (I don't know the answer and would prefer to thrash it out collaboratively rather than silently work it out myself) and also rhetorically (if it's really this complex what comment could we put into the code to make it clearer to the next saps who want to understand it?)

@SoniEx2

SoniEx2 commented Apr 4, 2019

I think so. I don't exactly think that should be in the code, tho. instead it should be in a separate file.

@lukego
Contributor Author

lukego commented Apr 4, 2019

I created an issue at #250 so that hopefully the next time somebody goes down this build-and-bootstrap rabbit hole they could capture that into a readme of some kind.

@hippi777

hippi777 commented Apr 4, 2019

hi all! :)

edit: i didnt even see ur previous message, cuz i opened this page much earlier...

i think such descriptions would be best in a form like a recipe, in order: this generates that, and that will be used by whatever for whatever purpose, and so on... so one can grasp the whole pic from a bird's-eye view, see the "wires", and know how things are assembled/working and where to go for a given purpose. otherwise i hate it when the source is hard to follow, but some hints/pointers to follow are enough for demystification in general. ((this latter applies to oop so badly, its a horror in general to track down the flow, even if basically it looks like a pretty library with every book in its well-defined place til its not about runtime, cuz its horribly fragmented.)) so the tldr: its all about showing the flow and giving the pointers whenever they are necessary to see the path.

„Everything should be made as simple as possible, but not simpler.” - A. E. :) so the rest is about explanation. :D

all the bests for all of u! :)

@lukego
Contributor Author

lukego commented Apr 7, 2019

Great tip from @CapsAdmin: There is already an implementation of the LuaJIT parser and bytecode compiler in Lua! Could we use this to retire some C code in RaptorJIT? https://github.com/franko/luajit-lang-toolkit

@exikyut

exikyut commented Nov 27, 2019

Hey,

I stumbled on this issue and it got me thinking, it could go in some interesting directions. I wrote a bit over at LuaJIT/LuaJIT#488.

@lukego you mentioned you were really interested in the idea, and I'm not sure if GH creates pings when issues mention other issues - so, just making absolutely sure. :)

I don't mind which thread the replies/discussion end up in. RJ and LJ have complementary objectives.

@bubach

bubach commented Jul 11, 2020

Sorry if I missed something, didn't read the full thread to be honest.. These projects might be of interest for redoing the bootstrapping part, might save you the trouble of reinventing a wheel or two:

https://github.com/leegao/LuaInLua
https://github.com/davidm/lua2c
