
Dust Inner Workings

Steven edited this page Jun 13, 2013 · 11 revisions


Let's go through how Dust templates work.

Three Easy Steps

Going from Dust templates to executable JavaScript functions takes just a few easy steps:

  1. Parse
  2. Compile
  3. Render (execute)

Code Example

Parser: template (bunch of text) to AST (meaningful information)

/* the template nom.tl */

{! this is where i list my noms !}
  <ul id="list-of-noms">
    {#noms}
      <li{@eq key=$idx value=1} class="first"{/eq}>
        Yo Dawg, I heard you like {level} {name}.
      </li>{~n}
    {/noms}
  </ul>
/* ... becomes an AST... something like */
body:
  comment: 'this is where i list my noms'
  format: \n white spaces
  buffer: '<ul id=list-of-noms>'
  format: \n white spaces
  section #:
    key: 'noms'
    bodies:
      body:
        format: \n white spaces    
        buffer: <li
        Section @:
          key: 'eq'
          param: key=$idx
          param: value=1
          bodies:
            body: 
              buffer: class=first
        buffer: >
        format: \n white spaces    
        buffer: Yo Dawg, I heard you like 
        reference:
           key: 'level'
           filters:
        buffer: white space
        reference: 
           key: 'name'
           filters:
        buffer: </li>
        special: n
    buffer: </ul>
    

Compiler: AST to Filtered AST to JS Function (super powers)

/* The Dust compiler has optimizers which give you a smaller AST, e.g.
 *   comments and the newline whitespace (format) are gone
 *   the {~n} special character got converted to \n here!
 *
 * This is great since it's usually more performant to have work done at compile time
 * instead of at runtime.
 */
 
["body", ["buffer", "<ul id=list-of-noms>"],
    ["#", ["key", "noms"],
        ["context"],
        ["params"],
        ["bodies", ["param", ["literal", "block"],
                ["body", ["buffer", "<li"],
                    ["@", ["key", "eq"],
                        ["context"],
                        ["params", ["param", ["literal", "key"],
                                ["key", "$idx"]
                            ],
                            ["param", ["literal", "value"],
                                ["literal", "1"]
                            ]
                        ],
                        ["bodies", ["param", ["literal", "block"],
                                ["body", ["buffer", " class=first"]]
                            ]]
                    ],
                    ["buffer", ">Yo Dawg, I heard you like "],
                    ["reference", ["key", "level"],
                        ["filters"]
                    ],
                    ["buffer", " "],
                    ["reference", ["key", "name"],
                        ["filters"]
                    ],
                    ["buffer", ".</li>\n"]
                ]
            ]]
    ],
    ["buffer", "</ul> "]
]
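
The optimizer pass described in the comment above can be sketched in a few lines. This is a simplified illustration, not the code in Dust's compiler: the function name optimizeBody and the flat node list are assumptions made for the example. It drops comment and format nodes, turns the n special into a newline, and merges adjacent buffers.

```javascript
// Hypothetical, simplified optimizer: NOT the code in Dust's compiler.
// Input and output are ["body", node, node, ...] arrays like the ASTs above.
function optimizeBody(body) {
  const out = ["body"];
  for (const node of body.slice(1)) {
    const type = node[0];
    // Comments and newline whitespace never reach the output.
    if (type === "comment" || type === "format") continue;
    // Specials such as {~n} become plain text at compile time.
    let current = node;
    if (type === "special" && node[1] === "n") current = ["buffer", "\n"];
    const last = out[out.length - 1];
    if (current[0] === "buffer" && Array.isArray(last) && last[0] === "buffer") {
      // Merge neighbouring buffers into a single write.
      last[1] += current[1];
    } else {
      out.push(current.slice());
    }
  }
  return out;
}

const optimized = optimizeBody([
  "body",
  ["comment", "this is where i list my noms"],
  ["format", "\n", "  "],
  ["buffer", "</li>"],
  ["special", "n"],
  ["buffer", "</ul>"]
]);
console.log(JSON.stringify(optimized));
// ["body",["buffer","</li>\n</ul>"]]
```

Five nodes collapse into one buffer, which is why the compiled template ends up with so few chk.write calls.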
/* The compiled template looks like this */

/* Look how much more useful it is than the AST */
(function(){

/* Every template file registers itself into a cache as its first step */
dust.register("nom.tl",body_0);

/*The template name is a pointer to a function (body_0 in this example)*/
function body_0(chk,ctx){return chk.write("<ul id=list-of-noms>").section(ctx.get("noms"),ctx,{"block":body_1},null).write("</ul> ");}
 
/* Some parts of the templates can be reused (e.g. in sections, blocks).
 * So the compiler knows to break these into separate functions.
 */
function body_1(chk,ctx){return chk.write("<li").helper("eq",ctx,{"block":body_2},{"key":ctx.get("$idx"),"value":"1"}).write(">Yo Dawg, I heard you like ").reference(ctx.get("level"),ctx,"h").write(" ").reference(ctx.get("name"),ctx,"h").write(".</li>\n");}


/* Notice how everything chains off of chunk, which is nice but
 * it does _all_ the heavy lifting so it can be confusing
 * This is also why you always return a chunk.
 */
function body_2(chk,ctx){return chk.write(" class=first");}return body_0;})();

/* Some items are easy to see in the compiled code. Some are a 
 * bit obfuscated.
 *      Buffers: <li                 chk.write("<li")
 *      Section: @eq                 chk.helper("eq", ctx, ...)
 */ 	

Render: JS function + JSON to output string

/* Execution */

 dust.render = function(name, context, callback) {
    /* name is 'nom.tl' in this example */
    /* create a chunk */
    /* call dust.load with that chunk */
    /* when all the template functions are done, call chunk.end() */
 }
 dust.load = function(name, chunk, context) {
   /* 'nom.tl' in cache will return the function body_0*/
   var tmpl = dust.cache[name];
   if (tmpl) {
     /* execute the body_0 with context and chunk*/
     return tmpl(chunk, context);
   }
     
 }
/* output string */
dust.render...

  <ul id="list-of-noms"><li class="first">Yo Dawg, I heard you like thing 1.</li>
  <li>Yo Dawg, I heard you like thing 2.</li>
  </ul>

Parser

What makes var foo = 'bar'; special in a *.js file?

What makes import and def special in a *.py file?

In all languages, there are symbols and words that provide order and special meaning to the language itself. For languages like Dust, these keywords and syntaxes are referred to, as a whole, as the grammar.

The Grammar defines what is meaningful

A long, long time ago, @akdubya said that in Dust, {#section}{/section} and {@helper}{/helper} would be considered special because they start with a curly brace followed by # or @ and some key.

The syntax {#section} and other Dust-isms are described using a grammar. There are several tools that turn a grammar into a JavaScript parser, including Jison and PEG.js. Dust uses PEG to describe itself. The grammar is located at /src/dust.pegjs.

Dust.pegjs is a bunch of Regexes that output an array

If you love regular expressions, you'll love grammars. If you hate regular expressions, look at railroad diagrams.

/* excerpt from /src/dust.pegjs */
...

section "section"
  = t:sec_tag_start ws* rd b:body e:bodies n:end_tag? &{if( (!n) || (t[1].text !== n.text) ) { throw new Error("Expected end tag for "+t[1].text+" but it was not found. At line : "+line+", column : " + column)} return true;}
    { e.push(["param", ["literal", "block"], b]); t.push(e); return t }
  / t:sec_tag_start ws* "/" rd
  { t.push(["bodies"]); return t }


sec_tag_start
  = ld t:[#?^<+@%] ws* n:identifier c:context p:params
  { return [t, n, c, p] }
  
...

Railroad diagrams are awesome

[Images: railroad diagrams for the section and sec_tag_start rules]

As the grammar above shows, a line of text is broken into significant parts by a bunch of rules. The excerpt shows the rules that determine what is considered a section in Dust.

In the first chunk of regex (or first set of railroad diagrams), you see an = sign and a / (or a split at the left side of the railroad tracks). This indicates that a section can be one of two types based on these two rules. Both start with a sec_tag_start, but the first one contains a body and the second one does not: the first is a section with a body and the second is a self-closing section.

Be the Parser, my friend

Look at the text below and here is what you (the parser) should see:

  • a comment with the text hello
  • a self-closing section with the key foo
{!hello!}
{#foo/}

That's a comment because you saw {!, some text, followed by !}. Simple, right?

The next one, {#foo/}, is a bit trickier. It's a self-closing section because you saw ld (the left delimiter, {), followed by #, followed by an identifier foo, followed by /, followed by rd (the right delimiter, }). If you are following along with the regex or the railroad diagrams you'll notice that we seem to be missing ws (whitespace), context and params. Well, whitespace is optional. Context and params are actually there; it's just that empty strings are valid contexts and params.

But, at the end of the day we get the AST (which stands for Fancy Syntax Thingie):

   /* {#foo/} becomes...*/
   
   ["body",["#",["key","foo"],["context"],["params"],["bodies"]]]
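
To make "being the parser" concrete, here is a minimal hand-written sketch that recognizes only the two constructs from this example and produces the same array shapes. It is an illustration under assumptions, not how Dust parses: Dust generates its parser from the PEG grammar, the name parseTiny is made up, and everything other than comments and self-closing sections is simply ignored.

```javascript
// Hypothetical mini-parser: handles only {! comment !} and {#key/}.
// Dust itself generates its parser from the PEG grammar instead.
function parseTiny(source) {
  const body = ["body"];
  // Alternation order matters, as with "/" in the grammar: try comment first.
  const rule = /\{!([\s\S]*?)!\}|\{([#?^<+@%])\s*([A-Za-z_$][\w$]*)\s*\/\}/g;
  let match;
  while ((match = rule.exec(source)) !== null) {
    if (match[1] !== undefined) {
      body.push(["comment", match[1]]);
    } else {
      // Same shape as [t, n, c, p] plus the pushed ["bodies"].
      body.push([match[2], ["key", match[3]], ["context"], ["params"], ["bodies"]]);
    }
  }
  return body;
}

const tinyAst = parseTiny("{!hello!}\n{#foo/}");
console.log(JSON.stringify(tinyAst));
// ["body",["comment","hello"],["#",["key","foo"],["context"],["params"],["bodies"]]]
```

The comment node still shows up here because this sketch does no optimizing; in Dust the compiler strips comments later.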

bodies in a self-closing section?

Whoa there ... what's that ["bodies"]? That wasn't in the definition!?

If you look at the grammar you'll see t.push(["bodies"]); return t:

section "section"
   ...
  / t:sec_tag_start ws* "/" rd
  { t.push(["bodies"]); return t }
  
sec_tag_start
  = ld t:[#?^<+@%] ws* n:identifier c:context p:params
  { return [t, n, c, p] }

t is the array returned by sec_tag_start, that is, [t, n, c, p]. In other words, a self-closing section tag returns an AST of:

[t, n, c, p].push(["bodies"])

[ symbolAfterThe{ , identifier, context, params, ["bodies"] ]

/*...which in our case is */
['#', ["key", "foo"], ["context"], ["params"], ["bodies"]]

Now you know how it got there. Why it's there is also important.

  • ['#', ["key", "foo"], ["context"], ["params"]] is a section with no bodies
  • whereas ['#', ["key", "foo"], ["context"], ["params"], ["bodies"]] is a section with an empty bodies array

Things only work if they are defined in the grammar

Can I have helpers within params? references within references?

[Images: Dust params diagram, Dust reference diagram]

/*params and references are limited to what they accept*/
/* no */
{@foo key="{@bar/}"/} // parse error
{baz|{biz}} // parse error

/* yes */
{@foo key=bar/}
{@foo key="{bar}"/}

{baz|j}
{"string interpolate {baz}"|u}

Compiler

You learned how to read (parse); now learn how to compile.

The syntax tree (AST) gives you useful information from a file of text. You now have meaningful information, and with knowledge comes power. What kind of powers? Turing powers... ehh... you can compute stuff.

/* Let's parse the thingie below */
{ref}

/* Ohhhh... "{",  followed by something, followed by "}" are *references* */
["reference",["key","ref"],["filters"]]

ASTs -- They don't do anything!!

An AST by itself is not very useful. The compiler takes what you are trying to say ("put the value of ref here") and translates it into usable, runnable code. The compiler gives a reference the power to look up its value in the JSON context, HTML-escape the result, and replace itself with it.

    /* Enter: dumb Array of stuff */
    ["reference",["key","ref"],["filters"]]
    /* Exit: super awesome*/
    chk.reference(ctx.get("ref"),ctx,"h");

lib/compiler.js from {ref} to chk.reference()

That's smart. What magic does that?

The compiler is just a JavaScript file that, given the AST array, outputs the JavaScript functions. It lives in lib/compiler.js.

Let's look in detail:

    /* dust.compile takes the template source and parses it first, which gives
     * the AST ["body",["reference",["key","ref"],["filters"]]] */
    dust.compile("{ref}", 'ref_tl');
  
    ...
    (function(){
       dust.register("ref_tl",body_0);
       function body_0(chk,ctx){return chk.reference(ctx.get("ref"),ctx,"h");}return body_0;
    })();

From the above, look at how these get translated:

  • ["body"] -> function body_0(..)
  • ["reference",["key","ref"],["filters"]] -> chk.reference(ctx.get("ref"),ctx,"h")
  • ["key", "ref"] -> ...(ctx.get("ref"),...)
  • ["filters"] -> (..., "h")

The compiler outputs a string of JavaScript

In Dust, the compiler outputs a string. You can save the output to a file and include it in your page through a script tag, and so on.

/* excerpts from lib/compiler.js */
...
    function compile(ast, name) {
    ...

      return "(function(){dust.register("
        + (name ? '"' + name + '"' : "null") + ","
        + dust.compileNode(context, ast)
        + ");"
        + compileBlocks(context)
        + compileBodies(context)
        + "return body_0;"
        + "})();";
    };

...
  body: function(context, node) {
    var id = context.index++, name = "body_" + id;
    context.bodies[id] = compileParts(context, node);
    return name;
  }
...
 reference: function(context, node) {
    return ".reference(" + dust.compileNode(context, node[1])
      + ",ctx," + dust.compileNode(context, node[2]) + ")";
  }
...
  key: function(context, node) {
    return "ctx.get('" + node[1] + "')";
  }
...
  filters: function(context, node) {
    var list = [];
    for (var i=1, len=node.length; i<len; i++) {
      var filter = node[i];
      list.push('"' + filter + '"');
    }
    return '"' + context.auto + '"'
      + (list.length ? ",[" + list.join(',') + "]" : '');
  }

There's plenty more defined in the AST and the compiler, such as partials, params and blocks, but you'll have to look for yourself.
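
To see the dispatch-by-node-type idea from the excerpts working end to end, here is a stripped-down, self-contained sketch. It is not lib/compiler.js: the names nodeCompilers, compileTinyNode and compileTiny are made up for this example, only body, buffer, reference, key and filters nodes are handled, and explicit filters are ignored.

```javascript
// Hypothetical mini-compiler handling only body, buffer, reference, key and
// filters nodes. Names mirror the excerpts above, but this is a sketch.
const nodeCompilers = {
  body(context, node) {
    const name = "body_" + context.index++;
    const parts = node.slice(1).map((child) => compileTinyNode(context, child)).join("");
    context.bodies.push("function " + name + "(chk,ctx){return chk" + parts + ";}");
    return name;
  },
  buffer(context, node) {
    return ".write(" + JSON.stringify(node[1]) + ")";
  },
  reference(context, node) {
    return ".reference(" + compileTinyNode(context, node[1]) +
      ",ctx," + compileTinyNode(context, node[2]) + ")";
  },
  key(context, node) {
    return "ctx.get(" + JSON.stringify(node[1]) + ")";
  },
  filters(context, node) {
    // Simplification: only the automatic filter ("h", HTML escape) is emitted.
    return JSON.stringify(context.auto);
  }
};

// Look up the compiler for a node by its type tag (the first array element).
function compileTinyNode(context, node) {
  return nodeCompilers[node[0]](context, node);
}

function compileTiny(ast, name) {
  const context = { index: 0, bodies: [], auto: "h" };
  const entry = compileTinyNode(context, ast);
  return "(function(){dust.register(" + JSON.stringify(name) + "," + entry + ");" +
    context.bodies.join("") + "return " + entry + ";})();";
}

const compiledRef = compileTiny(["body", ["reference", ["key", "ref"], ["filters"]]], "ref_tl");
console.log(compiledRef);
```

The printed string has the same shape as the ref_tl output shown earlier: a wrapper that registers body_0, followed by the body_0 function itself.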

Dust Render

I have a function that does a bunch of chk, ctx, body_0 stuff, but none of that's defined. What now?

The last step is actually rendering. The compiler output a fancy function with a bunch of usages of Chunk (chk) and Context (ctx), but where are those defined?

Are we using Dust yet?

The functionality of Dust is defined in lib/dust.js. That file defines several objects:

  • dust - the namespace where all the business happens. The namespace includes popular functions such as dust.compile and dust.render. Currently, a lot of other stuff is also thrown into this namespace.
  • Context - a container of all the data available to the template. It contains a Stack (a filtered view of the JSON context), the globals, and any data made available via blocks.
  • Chunk - a representation of the piece of template we are working on. It's chainable and it does everything. Some popular methods include chunk.write, so you can write directly to the output, and chunk.map, so you can render templates asynchronously.

Always return a Chunk

For every section, helper and body, notice that everything chains off of chunk. Returning a chunk is required by everything. If you don't, you break the chain and the next function call fails.
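
A tiny stand-in shows what "breaking the chain" looks like in practice. TinyChunk below is a hypothetical class written for this example, not Dust's Chunk; it only collects strings, and its helper method just passes along whatever the helper function returns.

```javascript
// Hypothetical stand-in for Chunk: it only collects strings.
class TinyChunk {
  constructor() { this.parts = []; }
  write(text) { this.parts.push(text); return this; }  // returns the chunk
  helper(fn) { return fn(this); }                      // passes the helper's return value on
}

const goodHelper = (chk) => chk.write("good");         // returns the chunk
const badHelper = (chk) => { chk.write("bad"); };      // forgets to return it

const chained = new TinyChunk().helper(goodHelper).write("!");
console.log(chained.parts.join(""));                   // good!

let chainError = null;
try {
  // badHelper returns undefined, so the next .write has nothing to chain off.
  new TinyChunk().helper(badHelper).write("!");
} catch (err) {
  chainError = err;
}
console.log(chainError instanceof TypeError);          // true
```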

Context rebase Dust makebase

There are a few methods that let you change the context. dust.makebase, context.rebase ... (TBD)

What happens when I render?

You saw earlier that calling dust.compile returns a string (which evaluates to a function) that includes a call to dust.register. Once registered, templates can be fetched via dust.cache.

When you render a few things happen:

  • create a chunk (chk)
  • put the JSON into the context (ctx)
  • find the template in the cache
  • execute it with chk, ctx
    • this runs down the chk.section().reference().. chain
    • until we run out of stuff in which case it returns a final chk
  • call chk.end() which flushes the output and calls the callback
dust.render = function(name, context, callback) {
  var chunk = new Stub(callback).head;
  dust.load(name, chunk, Context.wrap(context, name)).end();
};

dust.load = function(name, chunk, context) {
  var tmpl = dust.cache[name];
  if (tmpl) {
    return tmpl(chunk, context);
  } else {
    ...
  }
};
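
The register, look up, execute, end sequence can be tried out with a toy runtime. Everything here is an assumption made for illustration: tinyDust is not Dust, its chunk does no escaping, its context does a flat lookup only, and the error passed for a missing template is this sketch's own choice (the real else branch is elided above).

```javascript
// Hypothetical mini-runtime: names echo Dust's, behaviour is heavily simplified.
const tinyDust = {
  cache: {},
  register(name, fn) { this.cache[name] = fn; },
  render(name, data, callback) {
    const chunk = {
      out: "",
      write(text) { this.out += text; return this; },
      reference(value) { return this.write(String(value)); },  // no escaping here
      end() { callback(null, this.out); return this; }
    };
    const ctx = { get: (key) => data[key] };                    // flat lookup only
    const tmpl = this.cache[name];
    // Assumption of this sketch: report a missing template through the callback.
    if (!tmpl) return callback(new Error("Template not found: " + name));
    tmpl(chunk, ctx).end();
  }
};

// What a compiled template does when its script runs: register itself.
tinyDust.register("hello_tl", (chk, ctx) =>
  chk.write("Hello ").reference(ctx.get("name")).write("!"));

let rendered = null;
tinyDust.render("hello_tl", { name: "Dust" }, (err, out) => { rendered = out; });
console.log(rendered);   // Hello Dust!
```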

The End


Thanks


Super confusing stuff I'm not quite sure about

What's a Stub? What's a Stream?

As we said earlier, when we call dust.render we get a Chunk chk that gets passed around. Eventually it is returned and chunk.end is called, which triggers chunk.root.flush(), which triggers the callback.

A Stub appears to be an internal container for the callback and the opening Chunk. It puts itself into chunk.root so that it defines what .flush means. It's used by dust.render.

A Stream appears to be an async version of Stub. It has no callback, but it still contains the opening Chunk and puts itself into chunk.root so that it defines what .flush means: in this case, firing a bunch of events. It's used by the async rendering in dust.stream.

There's a lot of circular references with these.

/* for dust.render */
stub.head === chunk, chunk.root === stub; 

chunk.end calls chunk.root.flush === stub.flush which uses stub.head aka chunk


/* for dust.stream */
stream.head === chunk, chunk.root === stream

chunk.end calls chunk.root.flush === stream.flush which uses stream.head aka chunk
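
The circular references are easier to follow in a reduced model. MiniStub and MiniChunk below are hypothetical classes for this example only; they keep just the head/root links and the end-to-flush-to-callback path described above, and leave out everything else the real objects do.

```javascript
// Hypothetical reduction of the Stub/Chunk relationship described above.
class MiniChunk {
  constructor(root) { this.root = root; this.data = ""; }
  write(text) { this.data += text; return this; }
  end() { this.root.flush(); return this; }          // chunk.end calls chunk.root.flush
}

class MiniStub {
  constructor(callback) {
    this.head = new MiniChunk(this);                 // stub.head === chunk, chunk.root === stub
    this.callback = callback;
  }
  flush() { this.callback(null, this.head.data); }   // flush reads stub.head
}

let flushed = null;
const stub = new MiniStub((err, out) => { flushed = out; });
const firstChunk = stub.head;
firstChunk.write("done").end();
console.log(firstChunk.root === stub, flushed);      // true done
```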

What's the difference between the JSON, Stack and Context

The JSON data that you, the layman, call the context is the head of a Stack, and that Stack is the stack of the Context.

myJSON === stack.head === ctx.stack.head

The Context ctx is the thing that gets passed around. It includes the stack, the globals and the blocks.

What's Stack.tail?

The Stack includes the JSON in the head plus a shadow Stack in the tail. (stop now if you still have brains)

The head is the JSON at this point in the template. When we move into a context using the {#section} syntax we put the current JSON context into head and the entire previous Stack into the tail. This shadow Stack is used when we walk up the JSON to find reference values and for parameter values.

{
  foo: {
    bar: {
      baz: 1
    }
  }
}

{#foo}
  inside foo the head is {bar: {baz: 1}} while the tail is the previous stack
     stack.head === {bar: {baz: 1}}
     stack.tail.head === {foo: {bar: {baz: 1}}}
  {#bar}
    inside bar the head is {baz: 1} while the tail is the previous stack
      stack.head === {baz: 1}
      stack.tail is the previous stack
         stack.tail.head === {bar: {baz: 1}}
         stack.tail.tail.head === {foo: {bar: {baz: 1}}}
  {/bar}
{/foo}
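
The head/tail idea can be modelled with a linked list. This is a sketch under assumptions, not Dust's Context: MiniStack and MiniContext are made-up names, get only handles simple keys, and the title property is added to the JSON purely to show a lookup that has to walk up through the tails.

```javascript
// Hypothetical mini Stack/Context: only push and simple key lookup.
class MiniStack {
  constructor(head, tail) { this.head = head; this.tail = tail; }
}

class MiniContext {
  constructor(stack) { this.stack = stack; }
  // Entering a section: new head, and the whole previous stack becomes the tail.
  push(head) { return new MiniContext(new MiniStack(head, this.stack)); }
  get(key) {
    // Walk from the current head up through the tails.
    for (let s = this.stack; s; s = s.tail) {
      if (s.head && s.head[key] !== undefined) return s.head[key];
    }
    return undefined;
  }
}

const json = { foo: { bar: { baz: 1 } }, title: "top" };
const rootCtx = new MiniContext(new MiniStack(json, undefined));
const fooCtx = rootCtx.push(rootCtx.get("foo"));     // like entering {#foo}
const barCtx = fooCtx.push(fooCtx.get("bar"));       // like entering {#bar}

console.log(barCtx.stack.head);                      // { baz: 1 }
console.log(barCtx.stack.tail.head === json.foo);    // true
console.log(barCtx.get("title"));                    // top (found two tails up)
```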

Joy Joy Joy

Params push and why context is lost

{
  outer: {
    value: 1
  },
  foo: {
    bar: 2
  }
}
/* no */
{#foo alias=outer}{alias.value}{/foo}

/* yes */
{#foo alias=outer}{#alias}{value}{/alias}{/foo}

What are Taps?

Per @akdubya's documentation, chunk.tap(callback) and chunk.untap() are "convenience methods for applying filters to a stream", which is sort of incorrect: a tap affects everything that goes through chunk.write.

chunk.tap(callback) pushes an arbitrary function onto a stack; every function on the stack is run against the argument of chunk.write, most recently added first (LIFO).
chunk.untap() pops the stack.

For example, to uppercase the data written while a body renders:

filter: function(chunk, context, bodies) {
    return chunk.tap(function(data) {
      return data.toUpperCase();
    }).render(bodies.block, context).untap();
  }
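
To watch a tap in action without Dust loaded, the helper above can be run against a toy chunk. TapChunk is a hypothetical class for this example: it implements only tap, untap, write and render, and its render simply calls the body function.

```javascript
// Hypothetical mini chunk with tap/untap; only write() is affected by taps.
class TapChunk {
  constructor() { this.out = ""; this.taps = null; }
  tap(fn) { this.taps = { head: fn, tail: this.taps }; return this; }
  untap() { this.taps = this.taps.tail; return this; }
  write(data) {
    // Pass the data through every active tap before storing it.
    for (let t = this.taps; t; t = t.tail) data = t.head(data);
    this.out += data;
    return this;
  }
  render(body, context) { return body(this, context); }
}

// Same shape as the helper above: tap, render the body, untap.
const shout = (chunk, context, bodies) =>
  chunk.tap((data) => data.toUpperCase())
    .render(bodies.block, context)
    .untap();

const tapped = shout(new TapChunk(), {}, { block: (chk) => chk.write("quiet") })
  .write(" after");
console.log(tapped.out);   // QUIET after
```

Only the text written while the tap was active is uppercased; the write after untap goes through untouched.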

What is dust.helpers.tap?

Extra confusing

https://github.com/linkedin/dustjs-helpers/blob/master/lib/dust-helpers.js

dust.helpers.tap is a utility that helps resolve Dust references in the given chunk. It returns an empty string if things are falsy. Internally it uses chunk.render and chunk.tap/untap.