Virtualized Scala Reference

Adriaan Moors edited this page Oct 5, 2011 · 27 revisions

The embedding of domain-specific languages in Scala is based on a simple principle: the domain program looks like it's written in its own language with its own syntax, but really the program is just a bunch of method calls.

For example, in the parser combinator library, you'd write a | b to express the alternative between the productions a and b, just like in BNF, but this is just syntactic sugar for the method call a.|(b), where | is a method defined in the Parser class. The | method call does not do any parsing; it simply builds the description of a parser, which can later be applied to input.

We say that the domain program (a | b), which looks like it's written in a domain-specific language (BNF) with its own syntax, is represented by method calls in the host language, and these method calls are "intercepted" by the DSL to construct its internal representation of the domain program, or possibly to execute it directly, depending on the chosen implementation strategy.

More generally, the idea is that the DSL implementer has control over the methods that make up the DSL's syntax and semantics. Defining them so as to construct a representation of the domain program can be thought of as modifying the Scala parser to accept the syntax of the DSL and to generate the DSL's AST. This modification then serves to intercept the normal Scala semantics, and customize them to the DSL's. Nevertheless, no magic is required: we are only "intercepting" method calls, which is to say, defining methods to meet our own needs, as in any other application or library.
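To make the "description, not execution" point concrete, here is a minimal sketch of a grammar DSL in plain Scala. The names GrammarSketch, Lit, Alt and accepts are hypothetical and are not the API of the real parser combinator library; the sketch only illustrates that a | b is an ordinary method call that builds a data structure, which a separate step interprets later.

```scala
// Hypothetical mini-DSL; this is NOT scala.util.parsing.combinator.
object GrammarSketch {
  // The DSL's internal representation of a grammar.
  sealed trait Parser {
    // `a | b` is ordinary Scala for `a.|(b)`; it only builds a node.
    def |(that: Parser): Parser = Alt(this, that)
  }
  case class Lit(text: String) extends Parser
  case class Alt(left: Parser, right: Parser) extends Parser

  // Interpreting the description is a separate, later step.
  def accepts(p: Parser, input: String): Boolean = p match {
    case Lit(t)    => input == t
    case Alt(l, r) => accepts(l, input) || accepts(r, input)
  }

  val a: Parser = Lit("a")
  val b: Parser = Lit("b")
  val aOrB: Parser = a | b // same as a.|(b); nothing is parsed here
}
```

Evaluating GrammarSketch.aOrB yields the value Alt(Lit("a"), Lit("b")); input is only consulted once accepts is called on it.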

This strategy can be applied to a wide range of DSLs, even though it doesn't give the DSL designer complete freedom: the domain programs have to be valid Scala programs. In this sense, the DSL is just a library. Thus, this approach to embedding DSLs can be implemented in any programming language, called the "host" of the embedded DSL. The result will vary strongly, however, depending on the flexibility of the host language's syntax and type system.

The virtualized version of the Scala compiler takes this approach further: it allows the DSL designer to use more than just method calls to build the representation of the domain program. For example, an if may be used in the domain program with the meaning the DSL assigns to if, overriding Scala's interpretation of if, and similarly for most of Scala's other control structures.

This is accomplished by rewriting these expressions (such as if(c) a else b) into method calls (such as __ifThenElse(c, a, b)), so the same principles apply: the DSL may simply override these methods to suit its own purposes. If the method call corresponding to the "virtualised expression" has not been overridden, the compiler gives the expression its traditional meaning (such as an actual if-then-else).

Almost all of these methods can be given precise signatures in Scala's type system, so that they can be defined explicitly (in EmbeddedControls). Since Predef inherits EmbeddedControls, these methods are visible everywhere. You can either shadow them by defining a synonymous method, or override them by inheriting EmbeddedControls.

Applying the former approach in the REPL looks like:

scala> def __ifThenElse[T](cond: => Boolean, thenp: => T, elsep: => T): T = {println("if: "+cond); thenp}
__ifThenElse: [T](cond: => Boolean, thenp: => T, elsep: => T)T

scala> if(false) 1 else 2  // virtualized to `__ifThenElse(false, 1, 2)`
if: false
res0: Int = 1
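A DSL would typically give __ifThenElse a meaning other than printing. The following sketch, under the assumption of a hypothetical expression type Rep with nodes Const and IfThenElse, records both branches in an AST instead of picking one. Standard scalac does not perform the rewrite, so the call is spelled out by hand; with the virtualized compiler the if expression in the comment would be expected to reach the same method.

```scala
object StagedIfSketch {
  // Hypothetical IR for a tiny expression DSL (an assumption of this sketch).
  sealed trait Rep[+T]
  case class Const[T](value: T) extends Rep[T]
  case class IfThenElse[T](cond: Rep[Boolean], thenp: Rep[T], elsep: Rep[T]) extends Rep[T]

  // DSL-specific meaning of `if`: keep both branches as data.
  def __ifThenElse[T](cond: Rep[Boolean], thenp: => Rep[T], elsep: => Rep[T]): Rep[T] =
    IfThenElse(cond, thenp, elsep)

  val c: Rep[Boolean] = Const(true)

  // Virtualized source form (assumed): if (c) Const(1) else Const(2)
  // Written out explicitly so that it compiles with a standard compiler:
  val program: Rep[Int] = __ifThenElse(c, Const(1), Const(2))
}
```

Because the condition has type Rep[Boolean] rather than Boolean, the DSL cannot decide the branch at this point, which is why both arms are kept in the node.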

Here's an overview of the rewrites:

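====================  ==========================
Expression            Rewritten to
====================  ==========================
if (c) a else b       __ifThenElse(c, a, b)
while(c) b            __whileDo(c, b)
do b while(c)         __doWhile(b, c)
var x = i             val x = __newVar(i)
x = a                 __assign(x, a)
return a              __return(a)
a == b                __equal(a, b)
a == (b_1,..., b_n)   __equal(a, b_1, ..., b_n)
====================  ==========================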

The corresponding definitions in EmbeddedControls are:

def __ifThenElse[T](cond: => Boolean, thenp: => T, elsep: => T): T
def __whileDo(cond: Boolean, body: Unit): Unit
def __doWhile(body: Unit, cond: Boolean): Unit
def __newVar[T](init: T): T
def __assign[T](lhs: T, rhs: T): Unit
def __return(expr: Any): Nothing
def __equal(expr1: Any, expr2: Any): Boolean
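As an illustration of how a DSL might shadow two of these methods, the sketch below logs variable definitions and assignments instead of performing them. The Var type, the log buffer and the naming scheme are assumptions made for this example and do not come from EmbeddedControls; the calls are written in their already-rewritten form so the code compiles with a standard compiler.

```scala
object StagedVarSketch {
  import scala.collection.mutable.ListBuffer

  // Hypothetical handle for a DSL-level variable.
  final case class Var[T](name: String)

  // Statements of the domain program, recorded in order.
  val log: ListBuffer[String] = ListBuffer.empty[String]
  private var counter = 0

  // Shadows the default __newVar: returns a handle and records the definition.
  def __newVar[T](init: T): Var[T] = {
    counter += 1
    val v = Var[T]("x" + counter)
    log += s"var ${v.name} = $init"
    v
  }

  // Shadows the default __assign: records the assignment.
  def __assign[T](lhs: Var[T], rhs: T): Unit = {
    log += s"${lhs.name} = $rhs"
    ()
  }

  // Rewritten form of:  var x = 1; x = 2
  val x: Var[Int] = __newVar(1)
  __assign(x, 2)
}
```

After initialization, StagedVarSketch.log holds the two recorded statements, "var x1 = 1" followed by "x1 = 2".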

So far these rewrites were pretty straightforward and purely syntactic. Here's a more involved one that is type-directed:

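================================  ============================================================================================
Expression                        Rewritten to
================================  ============================================================================================
new C { (val x_i: T_i = v_i)* }   __new(("x_i", (self_i: Rep[C{ (val x_i: T_i')* }]) => v_i')*) : Rep[C{ (val x_i: T_i')* }]
================================  ============================================================================================

There is no signature for __new in EmbeddedControls, as its signature would be too unwieldy.
Virtualization is not performed unless there exists a type constructor Rep such that C is a subtype of Row[Rep], where the marker trait Row is defined in EmbeddedControls.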

If there's a type constructor Rep (of kind * -> *) so that C <: Row[Rep],
the expression ``new C { (val x_i: T_i = v_i)* }`` is turned into
the call ``__new(("x_i", (self_i: Rep[C { (val x_i: T_i')* }]) => v_i')*)``,
which is typed with expected type ``Rep[C{ (val x_i: T_i')* }]``.

This assumes there is a method in scope similar to: def __new[T](args: (String, Rep[T] => Rep[_])*): Rep[T].

Furthermore, for all i,

  • there must be some T_i' so that T_i = Rep[T_i'] -- or, if that previous equality is not unifiable, T_i = T_i'

  • v_i' results from retyping ``v_i`` with expected type ``Rep[T_i']``, after replacing this by a fresh variable self_i (with type Rep[C{ (val x_i: T_i')* }])
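The shape of the rewritten call can be tried out by hand in plain Scala. The sketch below assumes a hypothetical IR (Rep, Const, Record) and a stand-in trait Person for the refined type C; the Row[Rep] check and the retyping of the right-hand sides are done by the virtualized compiler and are not modeled here. Only the signature of __new follows the one suggested above.

```scala
object NewSketch {
  // Hypothetical IR; these node types are assumptions of this sketch.
  sealed trait Rep[+T]
  case class Const[T](value: T) extends Rep[T]

  // A record node: each right-hand side receives the record itself,
  // playing the role of the fresh variable self_i that replaces `this`.
  final class Record[T](args: Seq[(String, Rep[T] => Rep[_])]) extends Rep[T] {
    lazy val fields: Map[String, Rep[Any]] =
      args.map { case (name, rhs) => (name, rhs(this): Rep[Any]) }.toMap
  }

  // Same shape as the method the rewrite expects to find in scope.
  def __new[T](args: (String, Rep[T] => Rep[_])*): Rep[T] = new Record[T](args)

  // Helper for inspecting the result (hypothetical, for the example only).
  def fieldsOf(r: Rep[_]): Map[String, Rep[Any]] = r match {
    case rec: Record[_] => rec.fields
    case _              => Map.empty
  }

  trait Person // stands in for C { val name: String; val age: Int }

  // Hand-written form of: new Person { val name = "Ada"; val age = 36 }
  val p: Rep[Person] = __new[Person](
    ("name", (self: Rep[Person]) => Const("Ada")),
    ("age", (self: Rep[Person]) => Const(36))
  )
}
```

Passing each initializer as a function of self is what lets one field refer to another without the record existing as a real Scala object.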

Finally, when a selection e.x_i does not type check according to the normal typing rules,
and e has type ``Rep[C{ (val x_i: T_i')* }]`` (for some ``Rep`` and where ``C`` meets the criteria outlined above),
e.x_i is turned into ``e.selectDynamic[T_i]("x_i")``.
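A DSL can pick up such rewritten selections by giving its representation type a selectDynamic method. The following sketch assumes hypothetical node types Sym and Select and a stand-in trait Person; the call is written in its rewritten form, since a standard compiler would not turn e.age into it for this type.

```scala
object SelectSketch {
  sealed trait Rep[+T] {
    // Target of the rewrite  e.x_i  ==>  e.selectDynamic[T_i]("x_i").
    def selectDynamic[F](field: String): Rep[F] = Select[F](this, field)
  }
  // Hypothetical nodes: a named symbol and a field selection.
  case class Sym[T](name: String) extends Rep[T]
  case class Select[F](target: Rep[Any], field: String) extends Rep[F]

  trait Person // stands in for C { val age: Int }

  val e: Rep[Person] = Sym[Person]("p")

  // Rewritten form of the selection e.age:
  val age: Rep[Int] = e.selectDynamic[Int]("age")
}
```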

Another type-directed rewrite provides pimp-my-library functionality without the overhead of creating wrapper objects.

=====================  ===================================
Expression             Rewritten to
=====================  ===================================
x.foo(v_1, ..., v_n)   __forward(x, "foo", v_1, ..., v_n)
=====================  ===================================

type TransparentProxy[+T]
def __forward[A,B,C](self: TransparentProxy[A], method: String, x: TransparentProxy[B]*): TransparentProxy[C]
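One possible reading of these declarations is sketched below: TransparentProxy is instantiated with a hypothetical expression type Exp, and __forward records the call as a MethodCall node rather than allocating a wrapper object. Exp, Const and MethodCall are assumptions of this example; the call is shown in its rewritten form.

```scala
object ForwardSketch {
  // Hypothetical IR standing in for the abstract TransparentProxy type.
  sealed trait Exp[+T]
  case class Const[T](value: T) extends Exp[T]
  case class MethodCall[T](receiver: Exp[Any], method: String, args: Seq[Exp[Any]]) extends Exp[T]

  type TransparentProxy[+T] = Exp[T]

  // Records the forwarded call; no wrapper object is created per receiver.
  def __forward[A, B, C](self: TransparentProxy[A], method: String, x: TransparentProxy[B]*): TransparentProxy[C] =
    MethodCall[C](self, method, x)

  val s: TransparentProxy[String] = Const("foo")
  val t: TransparentProxy[String] = Const("bar")

  // Rewritten form of: s.concat(t)
  val call: TransparentProxy[String] = __forward[String, String, String](s, "concat", t)
}
```

The method name travels as a string, so any checking that "concat" exists on the underlying type is left to whatever later interprets the node.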