Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved unboxing #99

Open
1 of 4 tasks
melsman opened this issue Jan 15, 2022 · 0 comments
Open
1 of 4 tasks

Improved unboxing #99

melsman opened this issue Jan 15, 2022 · 0 comments
Assignees

Comments

@melsman
Copy link
Owner

melsman commented Jan 15, 2022

The MLKit native backend uses a uniform representation of values, which is important for compiling generic code (e.g., functions) separately from the code that uses it. Still, under these constraints there are many possibilities for securing a compact data representation. For many years, the MLKit has used tagged regions for some types of values (instead of tagging the values themselves) and datatypes with a single unary constructor that takes boxed arguments have been implemented using the lower-bit tags in pointers to discriminate between the constructors. There are more opportunities:

  • Use the 16 most significant bits of pointers on 64-bit architectures for tagging. When we have a datatype $t$ with $u$ unary constructors and $n$ nullary constructors, if $0 ≤ u < 2^{16}$ and if (X) all arguments to the unary constructors are known to be boxed (or not using the high bits), we can represent the type $t$ unboxed, using the high bits for tagging. If $u=0$, we call the type $t$ an enum-unboxed type. We call the type $t$ a low-unboxed type if $u=1$ and the argument to the unary constructor is boxed, and a high-unboxed type if $u > 1$ and if (X) holds. In the worst case, we use a boxed representation of the datatype $t$. When $t$ is a high-unboxed type, we use the 16 most significant bits to discriminate between the unary and nullary constructors (for GC safety, we also set the least significant bit for the nullary constructors). We can use an iterative algorithm do determine the boxity for mutually recursive datatypes, which also include datatypes with a single unary constructor. (For region inference, the spreading algorithm assumes that for mutually recursive datatypes, either all the declared types are unboxed or they are all boxed; we may later relax this assumption.) PR Unboxing of data types using high-bit pointer tagging #149.

  • Use an analysis to separate simultaneously declared datatypes into strongly connected mutually recursive components. This optimisation is of particular importance when a datatype declaration declares both an enumeration type and a recursive datatype that cannot be boxed. The assumptions made by region inference leave the enumeration datatype boxed in this case.

  • Improvement of low-bit unboxing. Types with up to four unary constructors and any number of nullary constructors may be considered low-unboxed, using the low bit pattens 000, 010, 100, and 110 to distinguish unary constructors and values with the least-significant bit set to distinguish nullary constructors. Again, we assume here that arguments to unary constructors are guaranteed to be boxed (see above).

  • Some types (e.g., char) are converted into the type int before the boxity analysis, which means that char, for instance, is not considered a low-unboxed type. We should not make this conversion until after boxity analysis.

Another opportunity would be to always unbox singleton record types, when they appear in datatype declarations. However, singleton record types may be a good way for a programmer to secure that a datatype can be represented unboxed using the higher-bits of pointers (see above), thus, we will refrain from this optimisation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant