Eliot Miranda edited this page Nov 3, 2020 · 11 revisions

Welcome to the opensmalltalk-vm wiki!

The opensmalltalk-vm is a VM for Smalltalk and related languages. It is developed in Smalltalk. This repository holds the source generated from the Smalltalk development environment, the platform support sources, the build directories, and the source of the CI infrastructure; together these flesh out the pure Smalltalk development environment to produce a production-quality, real-world Smalltalk VM. You can read more about the full ecosystem at www.squeak.org/research (see the papers https://onlinelibrary.wiley.com/doi/abs/10.1002/spe.2841, https://dl.acm.org/doi/10.1145/3281287.3281295, https://dl.acm.org/doi/10.1145/3132190.3132201, https://dl.acm.org/doi/10.1145/3139903.3139911, https://dl.acm.org/doi/10.1145/2887746.2754186). The Smalltalk source lives at http://source.squeak.org/@Lek8fo6SQy1Y4Mhh/iVRcogFI in package VMMaker-oscog. Scripts in the image directory of this repository build a Smalltalk development environment for the tip of VMMaker-oscog.

These are the most important medium-term ideas for what we should be working on.

  • productizing Sista; Sista is a Smalltalk take on speculative-inlining/adaptive optimizing, keeping the optimizer above the VM in line with Smalltalk tradition. The prototype has been evaluated and written up, and is definitely viable. Limited resources have prevented full productization. This system should offer a performance increase on the order of 3x to 4x, with more to come when extended. This is a really challenging project.
  • productizing the Threaded FFI; the Threaded FFI is an implementation of David Simmons' lock-free VM sharing architecture, where any thread can run the VM but only one thread can do so at any one time (multi-threaded but not in parallel). The prototype was functional in 2010 but, again, limited resources have prevented its productization. Since 2010 the arrival of Spur has provided facilities suited to the Threaded FFI, notably pinning.
  • extending Spur with an incremental mark-sweep collector and an incremental compactor; Spur is the most advanced memory manager/object representation for Smalltalk ever. It provides fast become that scales to gigabyte heaps while retaining direct pointers (no object table indirection per object; object pointers point directly to objects, which have a conventional compact contiguous header-followed-by-slots layout). This is done by reusing the dynamic message send lookup machinery to identify forwarding pointers, and the context-to-stack-mapping machinery's stack zone to ensure that within an activation no read barriers are required to access an object. Objects are becomed by turning them into forwarding pointers to copies, and only the very small stack zone is scanned to eliminate forwarding pointers. This scheme has been extended to do incremental compaction in multi-gigabyte heaps. The incremental scan-mark collector to accompany the incremental compactor still needs implementing (design sketches exist). So again the project involves productizing a reasonably well-understood architecture.
  • augmenting the simulator framework that is used to develop the VM into a system capable of interacting with real VM debuggers so that the simulator may be used to debug the real VM. Boris Shingarov is leading this effort.
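
The forwarding-pointer scheme described for Spur above can be illustrated with a toy model. This is only a sketch in Python, not Spur's implementation: the names Obj, FORWARDER_CLASS, METHODS, become, send and unforward_frames are all hypothetical, and the "stack zone" is modelled as a plain list of frames. It tries to show the two ideas from the text: a send to a forwarder fails its normal lookup, which is where the forwarder is detected and followed, and only the small set of stack frames is scanned to remove forwarders after a become.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

FORWARDER_CLASS = "Forwarder"  # hypothetical marker class that has no methods


@dataclass
class Obj:
    cls: str
    slots: List[Any] = field(default_factory=list)
    target: Optional["Obj"] = None  # only meaningful when cls == FORWARDER_CLASS


# Hypothetical method dictionaries, keyed by class name and then selector.
METHODS: Dict[str, Dict[str, Callable[[Obj], Any]]] = {
    "Point": {"x": lambda self: self.slots[0]},
    "Pair": {"first": lambda self: self.slots[0]},
}


def follow(obj: Obj) -> Obj:
    """Chase forwarding pointers until a real object is reached."""
    while obj.cls == FORWARDER_CLASS:
        obj = obj.target
    return obj


def forward(obj: Obj, target: Obj) -> None:
    """Turn obj, in place, into a forwarder to target."""
    obj.cls = FORWARDER_CLASS
    obj.slots = []
    obj.target = target


def become(a: Obj, b: Obj) -> None:
    """Two-way become: each original becomes a forwarder to a copy of the other."""
    copy_a = Obj(a.cls, list(a.slots))
    copy_b = Obj(b.cls, list(b.slots))
    forward(a, copy_b)
    forward(b, copy_a)


def send(receiver: Obj, selector: str) -> Any:
    """Look up and run a method; a forwarder is only noticed when lookup fails."""
    method = METHODS.get(receiver.cls, {}).get(selector)
    if method is None and receiver.cls == FORWARDER_CLASS:
        receiver = follow(receiver)  # slow path: unforward, then retry the lookup
        method = METHODS.get(receiver.cls, {}).get(selector)
    if method is None:
        raise AttributeError(f"{receiver.cls} does not understand {selector}")
    return method(receiver)


def unforward_frames(frames: List[List[Obj]]) -> None:
    """Scan only the (small) list of frames, replacing forwarders by their targets."""
    for frame in frames:
        for i, ref in enumerate(frame):
            frame[i] = follow(ref)
```

In this model the common case (a send to a non-forwarded receiver) pays nothing extra, because the forwarder check sits on the lookup-failure path; that is the property the text attributes to reusing the send machinery.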

Here are some short-term projects that would be great to have and might be a good way of getting familiar with the code base without having to take on too big a project.

  • The simulator has a growing set of inspectors that display the state of simulation and allow click-on-field access to other inspectors; so far we have
    -- a stack frame inspector, displaying receiver, method and local variable oops and classes
    -- a processor inspector, displaying register contents and abstract names and the current instruction
    -- a bytecode method inspector showing bytecodes
    It would be great to have a JITted method inspector. Even better would be to extend the framework to handle modified fields generically. Currently the processor inspector highlights the registers that changed value since the last update (either as the result of a single step, a run of code, or an explicit assignment to the processor). This idea of updating fields when they change could also apply to the frame inspector and a JITted code inspector (since code is modified in maintaining in-line caches, in updating object pointers embedded in machine code during garbage collection, etc). So a general inspector framework that monitors the underlying data structure (be it memory, a processor simulator, etc) to automate highlighting on change would make adding new inspectors easier and the inspectors themselves richer.
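
One possible shape for such generic change highlighting is sketched below in Python rather than in the simulator's own Smalltalk. Everything here is an assumption made for illustration: the class name ChangeTrackingInspector, the idea of handing the inspector a function that reads the current fields, and the "*" marker for a changed field. The point is only that the inspector needs no knowledge of what it is inspecting; it snapshots the fields and diffs them on each refresh.

```python
from typing import Any, Callable, Dict, Set


class ChangeTrackingInspector:
    """Hypothetical generic inspector: snapshot fields, diff them on refresh."""

    def __init__(self, read_fields: Callable[[], Dict[str, Any]]):
        # read_fields returns the current name -> value mapping of whatever is
        # being inspected (registers, frame slots, machine-code bytes, ...).
        self._read_fields = read_fields
        self._last: Dict[str, Any] = dict(read_fields())
        self.changed: Set[str] = set()

    def refresh(self) -> Set[str]:
        """Re-read the fields and record which ones differ from the last snapshot."""
        current = dict(self._read_fields())
        self.changed = {
            name
            for name, value in current.items()
            if name not in self._last or self._last[name] != value
        }
        self._last = current
        return self.changed

    def render(self) -> str:
        """Return one line per field, marking changed fields with '*'."""
        lines = []
        for name, value in self._last.items():
            marker = "*" if name in self.changed else " "
            lines.append(f"{marker} {name} = {value!r}")
        return "\n".join(lines)
```

A processor inspector would then be the generic inspector plus a function that reads the register file, and a frame inspector the same class with a function that reads the frame's slots.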

  • Cogit, the JIT, has substantial "disassembly decoration" facilities that make it much easier to read the generated machine code than in usual development environments. To display JITted code, machine code is disassembled into assembler and then the assembly is parsed to extract addresses, register names, frame offsets and the like, which get added to the assembler as inserted names. The parsing of assembler is completely ad hoc and increasingly messy. It would be lovely to have a general tokenizing parser that dealt with all assembler syntaxes and radically simplified the decoration code, which is a very useful but sprawling mess.
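
As a rough illustration of the tokenizing approach, here is a minimal Python sketch. It is not Cogit's decoration code and makes several assumptions: the input is a single line of AT&T-style assembler (registers written as %eax), addresses are hexadecimal literals, and the names Token, TOKEN_RE, tokenize and decorate, along with the `=<name>` annotation format, are invented for the example. A real tokenizer would need one token table per assembler syntax.

```python
import re
from typing import Dict, Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Order matters: hex addresses must be tried before plain numbers.
TOKEN_RE = re.compile(
    r"""
    (?P<address>0x[0-9a-fA-F]+)
  | (?P<register>%[a-z][a-z0-9]*)
  | (?P<number>-?\d+)
  | (?P<ident>[A-Za-z_.][A-Za-z0-9_.]*)
  | (?P<punct>[(),:$+*\[\]-])
  | (?P<space>\s+)
    """,
    re.VERBOSE,
)


def tokenize(line: str) -> Iterator[Token]:
    """Split one line of assembler into (kind, text) tokens, keeping whitespace."""
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character {line[pos]!r} at column {pos}")
        pos = match.end()
        yield Token(match.lastgroup, match.group())


def decorate(line: str, symbols: Dict[int, str]) -> str:
    """Append a symbolic name after every address that appears in `symbols`."""
    out = []
    for tok in tokenize(line):
        out.append(tok.text)
        if tok.kind == "address":
            name = symbols.get(int(tok.text, 16))
            if name is not None:
                out.append(f"=<{name}>")
    return "".join(out)
```

Because the decoration step sees typed tokens rather than raw text, adding a new kind of annotation (for example, naming frame offsets) becomes a matter of handling another token kind instead of writing another ad hoc string match.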

If you are interested in any of these projects, please drop into the OpenSmalltalk-VM mailing list (http://lists.squeakfoundation.org/mailman/listinfo/vm-dev) and we can help you get started.
