
Mist memory model for Zag #2
Open · Shaping opened this issue Jul 8, 2022 · 12 comments
@Shaping commented Jul 8, 2022

Re: https://www.youtube.com/watch?v=iw4FPqe8KdY&t=1093s 22:05 to 36:05

The above section of Martin McClure's Mist video describes a very simple, fast memory model that allocates objects from bins of exponentially (power-of-2) sized blocks. At this early stage we need not lift this GC mechanism into the Smalltalk itself, as he proposes doing for Mist. We could instead implement the design as C primitives on the VM side. Objects on the heap would then never move, which by itself should increase execution speed and make latencies more deterministic.
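A minimal sketch of the bin idea as I understand it (my own illustration in C, not Mist's actual code): one free list per power-of-2 size class, so allocating and freeing are O(1) list pops and pushes, and blocks never move.

```c
#include <stddef.h>

#define NUM_BINS 16                    /* e.g. classes from 8 B to 256 KiB */

typedef struct Block { struct Block *next; } Block;
static Block *bins[NUM_BINS];          /* one free list per size class */

static int bin_for(size_t size) {      /* smallest power of 2 >= size */
    int b = 0;                         /* assumes size fits the largest class */
    for (size_t cap = 8; cap < size; cap <<= 1) b++;
    return b;
}

static void *bin_alloc(size_t size) {
    int b = bin_for(size);
    Block *blk = bins[b];
    if (!blk) return NULL;             /* bin empty: carve a fresh chunk */
    bins[b] = blk->next;               /* pop head: O(1), nothing moves */
    return blk;
}

static void bin_free(void *p, size_t size) {
    Block *blk = p;
    int b = bin_for(size);
    blk->next = bins[b];               /* push back: O(1) */
    bins[b] = blk;
}
```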

Thoughts, anyone?

@dvmason (Owner) commented Jul 9, 2022

Thanks for your thoughts! This is very much a work in progress, but the documentation has some pieces on this (although not well organized).
The current plan is:

  1. Each execution thread has its own generational, copying collector with a couple of arenas.
  2. There is a global space which is non-moving, organized as a Fibonacci allocation, similar to Martin's binary allocator but with less internal fragmentation (see the sketch below). A separate collector thread manages this space, interacting with the execution threads to find roots.

I did like Martin's idea of having large objects mmap/munmap-ed; I need to think on this a while.
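To illustrate the Fibonacci size classes, a hedged C sketch (the class count and limits here are invented, not Zag's actual values): consecutive classes follow the Fibonacci sequence, so each class is only ~1.618x the previous one instead of 2x, and a freed block of size F(n) splits exactly into blocks of the two preceding sizes.

```c
#include <stddef.h>

#define NUM_CLASSES 24                      /* illustrative limit */

static size_t class_size[NUM_CLASSES];      /* sizes in words */

static void init_classes(void) {
    class_size[0] = 1;
    class_size[1] = 2;                      /* 1, 2, 3, 5, 8, 13, ... */
    for (int i = 2; i < NUM_CLASSES; i++)
        class_size[i] = class_size[i - 1] + class_size[i - 2];
}

/* Smallest Fibonacci class that fits a request of `words` words.
   Adjacent classes differ by ~1.618x, so worst-case internal
   fragmentation is ~38% versus ~50% for powers of 2, and the
   average is correspondingly lower. */
static int class_for(size_t words) {
    for (int i = 0; i < NUM_CLASSES; i++)
        if (class_size[i] >= words) return i;
    return -1;                              /* too big: treat as a large object */
}
```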

@Shaping (Author) commented Jul 10, 2022

(I just realized that I should have said "Zig" rather than "C" primitives above.)

Yes, agreed regarding handling of large objects via mmap. He also seemed to say that the binary allocation strategy may eventually need to be replaced by reference counting, and I'm not sure I follow that. He seems to be saying that merely flipping the marked/unmarked bit for each fixed-size block of bytes in each bin won't scale. I'm not sure; it looks pretty fast even when the bins are large.

I've read through most of the online doc for Zag. I must have missed the part about memory management.

I've been waiting 27 years now for a performant Smalltalk. It may actually happen.

Are you doing this solo? Are your grad students helping?

Unison (https://www.unison-lang.org/) looks like the future of programming languages, at least in terms of how code is stored and versioned (the AST is hashed; no source is stored). Function names are attached to hashes, so renaming is easy and nothing breaks. The codebase always works, and experiments/forks are very easy. Zag is already very AST-centric. We could steer Zag in this direction. Have you considered this?

I like Unison so much that I'm willing to tolerate its much less interesting, non-Smalltalk, non-keyword, non-anglogrammatical syntax, long enough perhaps to change the Unison parser (written in Haskell) to support, or at least fake out, a message-sending-like syntax. I prefer a functional/declarative domain-level grammar. So I can either morph Unison to be more like Smalltalk, or morph Smalltalk to use hashing broadly. Zag looks like a great place to do that. I'm trying to get rid of the VM (as Martin is suggesting) and make memory management very simple and fast (I don't mind some wasted memory).

Can I currently play with a rough version of Zag running on some version of Pharo? Are you hosting Zag development in Pharo, or just trying to parallel Pharo's class structure? Is Zag just the VM, implemented in Zig and compatible with Pharo images, or will it have its own basic Smalltalk class hierarchy replacing the similar counterparts in Pharo?

I'm currently stuck in VW 7.10/8.3. Zag seems like a good reason to port to Pharo.

@dvmason (Owner) commented Jul 11, 2022

Unfortunately, I am doing this mostly solo at the moment. I have a grad student starting in September, but it will be a while before they are up to speed.

See https://github.com/dvmason/Zag-Smalltalk/blob/main/Documentation/Garbage_Collection.md - there are a few typos and unclear spots, but the gist is there.

Unison does look interesting. I wish them well.

I'm supposed to give a talk at the UK Smalltalk Meetup (hopefully in September). I'm hoping to be able to export code to run on Zag by then.

@dvmason (Owner) commented Jul 11, 2022

Also, note that Mist is very single-threaded... there aren't even any processes.

Whereas Zag is intended to be very multi-threaded... using OS threads, and it will do blocking I/O in I/O threads.

@Shaping (Author) commented Jul 11, 2022

Yes, we want parallelizing of algorithms, which is very easy to do in a purely functional language (namely, Unison). Unison easily parallelizes across cores/OS processes and machine nodes. The big idea here is that it can economically move the code (always via hashed AST) to the data sitting on another node, instead of moving potentially very large data to where the code runs. This code syncing need happen only once; after that it's cached on the other node(s).

They've designed the so-called Unison abilities to handle algebraic effects. The separation of the functional chain from effects (side effects) is clean, but the syntax could be somewhat more approachable. I don't want to trade accurate, low-effort readability for transparent, easy parallelizing. We should be able to declare a parallelization of behavior across any scope (core, CPU, machine node) in a similarly compact, clear, and graceful way in Zag. The machine-node phase of that parallelizing will need the code-hashing scheme to be efficient.

So Zag will be able to create more than one thread per OS process. I hope Zag's syntax for parallelizing stays simple and just makes it work in the most efficient way possible.

@Shaping (Author) commented Jul 12, 2022

Why do most GC designs include copy and move operations by default? The power-of-2-sized-bin strategy involves no copies or moves. The memory blocks in each bin are quickly reached through a linked list, and a single bit is flipped to indicate whether a block is allocated to an object or free.

If wasted memory is a concern, recursively apply the power-of-2 bin structure to the leftover portion of each memory block, taking care to collect into the secondary (or n-ary) bin's linked list only those blocks whose leftovers are similar in size to within some predefined percentage. One or two nestings of this pattern will leave very little bin-block memory unused or inaccessible. I don't see a good reason for the complexity and slowness of popular GC algorithms. Someone please explain how the power-of-2 bin strategy fails to be fast and scalable.
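For concreteness, here is the bit-flip bookkeeping I mean, sketched in C (names and sizes are mine, purely illustrative): one bit per block in a bin, so sweeping a bin is a linear scan over a small bitmap.

```c
#include <stdint.h>

#define BLOCKS_PER_BIN 4096

static uint64_t used[BLOCKS_PER_BIN / 64];   /* 1 bit per block in the bin */

static void mark_used(int i) { used[i >> 6] |=  1ULL << (i & 63); }
static void mark_free(int i) { used[i >> 6] &= ~(1ULL << (i & 63)); }
static int  is_used(int i)   { return (used[i >> 6] >> (i & 63)) & 1; }
```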

@dvmason (Owner) commented Jul 12, 2022

Copying collectors are much faster if you mostly have garbage. Every BlockClosure or temporary array you create becomes garbage almost instantly, so a typical minor collect might find only 10-20% live data. Even more importantly, allocations are practically free: just bump a pointer (see the sketch below). They are also very cache-friendly. This makes them ideal for a per-thread arena, where no locks are required, which is why Zag uses one per execution thread.
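To show why that allocation path is practically free, a minimal C sketch of a bump-pointer arena (field names are mine, not Zag's):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *next;    /* next free byte in the arena */
    uint8_t *limit;   /* end of the arena */
} Arena;

static void *arena_alloc(Arena *a, size_t bytes) {
    bytes = (bytes + 7) & ~(size_t)7;          /* keep 8-byte alignment */
    if (a->limit - a->next < (ptrdiff_t)bytes)
        return NULL;                           /* arena full: minor collect */
    void *obj = a->next;
    a->next += bytes;                          /* the entire fast path */
    return obj;
}
```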
If an arena is accessible to multiple threads, moving becomes a big deal: you'd have to stop all threads to move anything, and you couldn't collect in parallel. So here Zag uses a mark-and-sweep collector that never moves anything once allocated. Allocation in this space requires a lock but is otherwise very fast. Zag uses an allocation scheme similar to Mist's, just with Fibonacci numbers instead of powers of 2. This wastes almost no space, whereas with powers of 2 (as in Mist) on average a quarter of memory is wasted. Free space is easily coalesced in the sweep phase.
I'm going to add some of this rationale to the Garbage-Collection pages in the documentation.

@Shaping (Author) commented Jul 12, 2022

Yes, the Fibonacci size progression makes more sense than doubling.

I'd like to see more detail on this, and on how the arenas coordinate.

@Shaping (Author) commented Jul 29, 2022

Dave, do you need any help with the Zig code? Are you satisfied with Zig's simplicity and its superiority to plain old C? Zig looks like a big improvement on C in that respect, and it also seems to be somewhat faster.

Have you timed the GC copies you mentioned? I would like any new Smalltalk VM to be temporally introspective by design, easing measurement of run times for any evaluable/method/block or actor network performing a task. Are you building such an ability into the VM, or at least considering it?

I'm trying to include time measurements in all my SUnit tests, which is an unreliable exercise even in the best case, given how most GCs work: the measured times vary too much. This is why a spread of Fibonacci-sized bins with known sizes is interesting. As much as we all want fast allocating and freeing, I also seek more deterministic, accurately measurable run times with smaller variances. This helps when we program and test microcontrollers/IoT devices, where precise timing can be critical. Such timings are good to know for any app, even if you don't need the precision.

@Shaping (Author) commented Jul 29, 2022

Is Zag intended to be compatible with the existing Pharo image format?

@dvmason (Owner) commented Aug 8, 2022

I've been updating some of the documentation, so looking at recent changes may be helpful. I've particularly updated some of the MemoryManagement pages and have integrated Mist-like large-object allocation, so that anything larger than about 4K words (32KiB) now gets its own pages allocated for the content (great for reading in a whole file - it can simply be mmapped).
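For illustration only (not Zag's actual code), large-object allocation along these lines might look like the POSIX sketch below: each large object gets its own pages, never moves, and hands its pages straight back to the OS when collected.

```c
#include <stddef.h>
#include <sys/mman.h>

static void *alloc_large(size_t bytes) {
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;    /* fresh, zeroed, page-aligned */
}

static void free_large(void *p, size_t bytes) {
    munmap(p, bytes);                     /* pages return to the OS at once */
}
```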

Zag Smalltalk is also being extended to a Zag Dynamic Runtime, as I've realized that I can easily support other dynamically-typed languages like Ruby and Python, but also JavaScript, Self, and APL with a bit of work, and Scheme with a bit more work. This would support a lot more experimentation with ideas about dynamic programming languages. It will provide:

  • immediate 64-bit tagged values supporting 51-bit integers, double-precision floating point, symbols, Unicode characters, booleans, and nil/null, with room for expansion (see the sketch after this list);
  • full OS-thread support to exploit multi-core/many-core architectures;
  • a production-quality parallel/concurrent garbage collector and high-performance memory allocation;
  • efficient support for the "challenging bits" of the languages above, including full closures, call-with-current-continuation, become:, allInstances, and n-dimensional arrays;
  • efficient virtual method dispatch;
  • a threaded, fully live-debuggable execution model;
  • multi-language debugging (at least within a paradigm);
  • type inference;
  • JIT code generation, probably based on LLVM;
  • exporting of executable programs to a range of platforms;
  • leveraging GPU execution engines.

The goal is that a new language implementation would require only a parser, and possibly tweaks to the code generation to capture idiosyncratic execution models. Much of the design for this model is done, and implementation has started on the basics. The full vision will take several years.
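For the tagged-value bullet above, one plausible shape is NaN-boxing. This C sketch is purely illustrative (the tag layout is invented, not Zag's actual encoding): doubles are stored as themselves, and everything else lives in the 51 payload bits of the quiet-NaN space, which is where the 51-bit integer limit comes from.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;

#define QNAN    0x7ff8000000000000ULL   /* quiet-NaN prefix bits */
#define PAYLOAD 0x0007ffffffffffffULL   /* 51 bits left for immediates */

static Value from_double(double d) {    /* doubles are stored unchanged */
    Value v;
    memcpy(&v, &d, sizeof v);
    return v;
}

static int is_double(Value v) {         /* any non-quiet-NaN pattern */
    return (v & QNAN) != QNAN;
}

/* Caller ensures i fits in 51 bits; a real encoding must also reserve
   one pattern for the canonical NaN itself. */
static Value from_int(int64_t i) {
    return QNAN | ((uint64_t)i & PAYLOAD);
}

static int64_t to_int(Value v) {        /* sign-extend the 51-bit payload */
    return (int64_t)((v & PAYLOAD) << 13) >> 13;
}
```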

As for help, I'd love help! Right now I need to get a few more details nailed down, but implementation of primitives is a large but highly granular task.

No, Zag will not be compatible with the existing Pharo image format, but Pharo is now built from pieces, and it might be possible to build a Zag image from many of the same pieces plus a few Zag-specific ones. Alternatively, a converter could take an OpenSmalltalk image and produce a Zag image.

@Shaping (Author) commented Aug 23, 2022

It's a splendid feature list. I'm re-reading the docs.

Are we committed to being temporally accountable? There will be no robot control or IoT without it. Can we build accurate run-time measurement into the VM, so that it's easy and compelling to use without cluttering the code? Perhaps have a global switch for turning it on and off.

Strictly determined memory allocation/free times are needed; how narrow the variances can be is the issue. Accurately calculating memory-management latencies will be hard or impossible if the VM is copying variable-sized blocks of memory. Bins of Fibonacci-sized blocks will have a range of sizes and therefore a range of linked-list traversal times (larger bins have fewer blocks; smaller bins have more), but latency variance can be narrowed by measuring block-size and block-count spreads for a given program across a range of conditions.
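As a crude example of the kind of measurement I want the VM to make cheap and built-in, in C (with malloc standing in for a VM allocation):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static long long now_ns(void) {          /* monotonic: immune to clock steps */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void) {
    long long t0 = now_ns();
    void *p = malloc(64);                /* stand-in for a VM allocation */
    long long t1 = now_ns();
    printf("alloc took %lld ns\n", t1 - t0);
    free(p);
    return 0;
}
```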

Why not have a separate heap bin for each class?
