Skip to content

Transpiler

Daryl Tan edited this page May 15, 2021 · 7 revisions

The below is a first-person account of how the transpiler got to be, written by @openorclose. It is not meant to be a one-to-one representation of how the transpiler works, but rather the motivation and explanation behind each transformation taken.

A formal code documention rather than a first-person thought process might be in the works.


Foreword

Read this to find out what worked and what failed during the process of development. Of course, stuff that failed might just mean I didn't put more thought into how to get it to work, so don't let it deter you from exploring paths that didn't work for me. Similarly, you can try figuring out how to break stuff with the methods I say do work, and the maintainers of this repository will hopefully fix it.

I hope this page helps anyone who reads my code to understand what I'm trying to accomplish more clearly.


Development of the transpiler

Problems with JavaScript

Bad operators

Most will agree that that the == and != operators are harmful, leading to hard-to-catch bugs. These operators are banned in Source.

Other operators, such as *, like to play nice and force both their operands into numbers if possible. This is not desired by Source, so something must be done to enforce the correct types for operands.

Majority rules

While proper tail calls is a requirement since the ES2015 specifications, most browsers have come to an agreement to not support it. (Since then, major browser support is a requirement for a feature to be included in that year's specs, so such a problem is unlikely to occur again.) Source requires this feature for the concept of iterative and recursive processes, however.

Solutions:

1st try: Interpret

(Supposedly there's a 0th try, a virtual machine, but I don't know too much about that.) The current Source is interpreted on an interpreter written in Javascript. An interpreter that runs on a (mostly) interpreted language. While for most code this is reasonably fast, it slows down to a crawl for more intensive tasks. Some code that runs in a negligble amount of time in Chrome's console takes up to a minute to run in the interpreter. Can we do better?

2nd try: Instrument

Since running it in the console is so fast, why not just eval it? This is the main motive to deviate from the interpreter idea and to switch to code instrumentation instead, to take advantage of the fast JavaScript implementation in current browsers. The rest of this article shall focus on the implementation of this idea.

Just a note, the disallowed constructs such as ==, classes syntax etc are already being caught in the parser used by the current interpreter. It will be reused because it works just fine.

Instrumentation

Liberal instrumentation of code is needed to nudge JavaScript to interpret Source code the right way. These are somewhat simplified versions of what is actually done, such as omitting the ugly passing of line numbers around in code to throw nice error messages for certain detectable errors.

Line numbers of error messages

As you will see in later instrumented code, the line number shown should an error occur would be different from the original one. Fortunately, this is a solved problem, as minified code poses the same issue as well. Source Map s is a way that maps your transformed code back into the original one (and vice versa). Some simple parsing of the error stack is done, and then more simple detection of what the error is and an appropriate message is returned. This does not catch all errors though, so if they can't be parsed properly the original error message gets shown.

Variable storage

see the previous method used here

we expose a evaller function that gets replaced each time to close around the values

(evaller is simply eval the first time a program is run) first run:

let a = 1;
{
 const b = 1; // not stored
}

transpile->

evaller = nextProgram => eval(nextProgram);
let a = 1;
{
 const b = 1; // not stored
}

second run:

a = 2;

transpile->

evaller = nextProgram => eval(nextProgram);
a = 2;

note that on the second run, this program is run in the scope of the previous program using the replaced evaller, so no complicated code is needed to transform top level vars into object accesses like in the older version

Builtins

This is basically saying there are predefined constants, so the same method as above works. Store all builtins in a global constant named BUILTINS. e.g.

const BUILTINS = {
  is_null: testee => testee === null,
  is_string: testee => typeof testee === string,
  //...
};
// Student's code
is_null(null);
// Transpiled code
const is_null = BUILTINS["is_null"];
const is_string = BUILTINS["is_string"];
//...
// Student's code
is_null(null);

Nothing much to say here... If there's a problem with this then the above way of storing globals wouldn't work as well.

Strict types

In JavaScript, most operators can be used with all types, sometimes producing garbage results. In Source, strict types for operators are enforced. All operators will be transpiled into functions. There will be a global constant named OPERATORS, like:

const OPERATORS = {
  "+": (left, right) => {
    if (validOperands) {
      return left + right;
    } else {
      throw new Error();
    }
  },
  callIfFunctionElseError(f, ...args) {
    if (typeof f === 'function') {
      return f(...args);
    } else {
       throw new Error();
    }
  },
  itselfIfBooleanElseError(candidate) {
      if (typeof candidate === 'boolean') {
         return candidate;
      } else {
         throw new Error();
      }
};

Then student's code will be transpiled from

1 + 2;
1 + "string";
1(123); //not a functoin
'string' ? 1: 2; // conditional test must be boolean

into

OPERATORS["+"](1, 2);
OPERATORS["+"](1, "string");
callIfFunctionElseError(1, 123);
itselfIfBooleanElseError('string') ? 1 : 2;

Proper tail calls

As said earlier, Source requires tail-recursive functions to give rise to an iterative process. This does not happen natively in most browsers. As such, some magic needs to happen. Click here if you would like to see the tried and failedattempts

Implementation

We transform return values into objects, and transform the call function to actually be able to execute it. We also the original function into another one to call it properly.

So the above example

const toZero = n => {
  return n === 0 ? {isTail: false, value: 0} : {isTail:true, f:toZero, args:[n - 1]};
};

gets transformed into

const toZero = wrap(n => {
  return n === 0 ? {isTail: false, value: 0} : {isTail:true, f:toZero, args:[n - 1]};
});

where wrap is

const wrap = f => {
  const wrapped = (...args) => call(f, ...args)
  wrapped.transformedFunction = f
  return wrapped
}

and with a small modification to call:

function call(f, args) {
  let result;
  while (true) {
    if (f.transformedFunction !== undefined) { // retrieve the real function that returns { isTail: etc }
      f = f.transformedFunction;
    } // else it's a builtin function like alert, prompt and we call it normally
    result = f(...args);
    if (result.isTail === true) {
      f = result.f;
      args = result.args;
    } else if (result.isTail === false) {
      return result.value;
    } else {
      return result // isTail does not exist on result, it must be from a builtin and so we just return it.
    }
  }
}

This allows them to be called as normal functions, and they can also play nice with builtins.

Small almost negligible issue

Source does not allow objects ({}), so the above works. If objects do get introduced we would have to make the return value an instance of a class and then check for it. But the essence of this solution still works, so that bridge will be crossed if that bridge needs to be crossed.

Infinite loops

while (true) {
  
}

gives rise to an infinite loop. eval this and your browser hangs. This is even more of a problem with a new feature coming up in the Source Academy, where code is automatically run if it is syntactically valid.

let i = 0;
while (i < 10) {
  // i = i + 1; not yet typed
}

is definitely not an uncommon way to start writing a loop. This hanging the browser is not acceptable. Therefore, a simple timeout is used.

let i = 0;
const startTime = Date.now();
while (i < 10) {
  if (Date.now() - startTime > 1000) {
    throw new Error("Infinite recursion");
  }
  // i = i + 1; not yet typed
}

We augment the code with a check, and if the code has been running for >1s we throw an error:

Line x: Potential infinite loop detected.
If you are certain your code is correct, press run again without editing your code.
The time limit will be increased from 1 to 10 seconds.
This page may be unresponsive for up to 10 seconds if you do so.

If the exact same string of code is executed again, the time limit will be increased by a factor of 10 to 10s. This should be enough for most, if not all correct code to run while protecting against infinite loops that crash the browser. 1s is still a long time to be unresponsive when code is being typed though. Maybe a starting time limit of 100ms is better, but more tests need to be done to determine if 100ms will create too many false positives.

Sandboxing

While not entirely needed, if done right this would solve any potential questions of code that manages to run but are not valid Source, such as eval, Function constructor etc.

The first issue is that eval has scope. This means:

const a = 123;
///... other code
return eval(transpiled_code);
// assuming code is 'a;' without having a being declared.

allows access of these variables in its surrounding scope. Ignoring any security risks if they exist (since autograder actually executes these codes there is a chance there may be), this might lead to lots of confusion. Javascript code gets minified, and so there'll be many one-letter variable names throughout the code. Accidentally referring to such variables in Source without having first been declared might lead to the values from the outside leaking, and not throwing an undefined variable error as expected.

Thankfully, as long as eval is used in an indirect scope (such as window.eval) and not those 4 leters proceeded by a left parentheses, it will be executed in the global scope.

Which just leads to the hiding of global variables. For this, we just loop through all the keys in window and filter them to see if they're valid identifiers:

function isValidIdentifier(candidate) {
  try {
    eval(`"use strict";{const ${candidate} = 1;}`)
    return true
  } catch {
    return false
  }
}
const globals = Object.getOwnPropertyNames(window).filter(isValidIdentifier);

And then we wrap the transpiled code with an immediately invoked arrow function expression, with these globals as parameters and setting them to undefined by not passing in arguments:

((eval, window, Number, parseInt, isNaN, ... et) => {
  transpied code...
})();

This also lets Source be able to use all these previously unsuable names such as Number as variable names.

Nice function strings

Small note, since not getting into the implementation details.

Obviously, we can't output the transformed body of the function when turning them into a string. So, we store the original stringified function somewhere else first, and then change the toString method of all the functions to use the original stringified function.