Improved unboxing #99

melsman · 2022-01-15T14:46:49Z

The MLKit native backend uses a uniform representation of values, which is important for compiling generic code (e.g., functions) separately from the code that uses it. Still, under these constraints there are many possibilities for securing a compact data representation. For many years, the MLKit has used tagged regions for some types of values (instead of tagging the values themselves) and datatypes with a single unary constructor that takes boxed arguments have been implemented using the lower-bit tags in pointers to discriminate between the constructors. There are more opportunities:

Use the 16 most significant bits of pointers on 64-bit architectures for tagging. When we have a datatype $t$ with $u$ unary constructors and $n$ nullary constructors, if $0 ≤ u < 2^{16}$ and if (X) all arguments to the unary constructors are known to be boxed (or not using the high bits), we can represent the type $t$ unboxed, using the high bits for tagging. If $u=0$, we call the type $t$ an enum-unboxed type. We call the type $t$ a low-unboxed type if $u=1$ and the argument to the unary constructor is boxed, and a high-unboxed type if $u > 1$ and if (X) holds. In the worst case, we use a boxed representation of the datatype $t$. When $t$ is a high-unboxed type, we use the 16 most significant bits to discriminate between the unary and nullary constructors (for GC safety, we also set the least significant bit for the nullary constructors). We can use an iterative algorithm do determine the boxity for mutually recursive datatypes, which also include datatypes with a single unary constructor. (For region inference, the spreading algorithm assumes that for mutually recursive datatypes, either all the declared types are unboxed or they are all boxed; we may later relax this assumption.) PR Unboxing of data types using high-bit pointer tagging #149.
Use an analysis to separate simultaneously declared datatypes into strongly connected mutually recursive components. This optimisation is of particular importance when a datatype declaration declares both an enumeration type and a recursive datatype that cannot be boxed. The assumptions made by region inference leave the enumeration datatype boxed in this case.
Improvement of low-bit unboxing. Types with up to four unary constructors and any number of nullary constructors may be considered low-unboxed, using the low bit pattens 000, 010, 100, and 110 to distinguish unary constructors and values with the least-significant bit set to distinguish nullary constructors. Again, we assume here that arguments to unary constructors are guaranteed to be boxed (see above).
Some types (e.g., char) are converted into the type int before the boxity analysis, which means that char, for instance, is not considered a low-unboxed type. We should not make this conversion until after boxity analysis.

Another opportunity would be to always unbox singleton record types, when they appear in datatype declarations. However, singleton record types may be a good way for a programmer to secure that a datatype can be represented unboxed using the higher-bits of pointers (see above), thus, we will refrain from this optimisation.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved unboxing #99

Improved unboxing #99

melsman commented Jan 15, 2022 •

edited

Improved unboxing #99

Improved unboxing #99

Comments

melsman commented Jan 15, 2022 • edited

melsman commented Jan 15, 2022 •

edited