Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mustache parser #11

Closed
wants to merge 17 commits into from
Closed

Mustache parser #11

wants to merge 17 commits into from

Conversation

Odie
Copy link
Contributor

@Odie Odie commented Mar 31, 2019

This is just to get things started.

  • extract tags in text
  • deal with delimiter changes
  • deal with blocks and nested blocks
  • translate into data template for execution
    • skip section when referenced data is empty (false values or empty lists)
    • loop over nested data lists (non-empty lists)
    • call user supplied functions (lambdas)
    • single data item (non-false values)
    • inverted sections
    • partials
  • template syntax error checking
  • cljs compatibility

Hopefully, we'll have something to look at before too long.
Closes: #9

@Odie
Copy link
Contributor Author

Odie commented Apr 2, 2019

We're now able to render nested items:
As an example, evaluating the following snippet:

   (let [data-template (parse
"# Movies list
{{#movies}}
## {{name}}
### Actors
{{#cast}}
* {{name}}
{{/cast}}
{{/movies}}
")]
     (->> (into [] (hopen.renderer.xf/renderer data-template)
                [[{:movies [{:name "Tropic Thunder"
                             :cast [{:name "Ben Stiller"}
                                    {:name "RDJ"}
                                    {:name "Jack Black"}]}
                            {:name "The secret life of Walter Mitty"
                             :cast [{:name "Ben Stiller"}
                                    {:name "Kristen Wiig"}]}]}]])
          (apply str)
          (println)))

Will yield:

# Movies list

## Tropic Thunder
### Actors

* Ben Stiller

* RDJ

* Jack Black


## The secret life of Walter Mitty
### Actors

* Ben Stiller

* Kristen Wiig

@Odie
Copy link
Contributor Author

Odie commented Apr 2, 2019

@Odie I don't understand what ctx-lookup is doing and how it used in the parser. Could you explain me, please?

From the mustache man page:

The most basic tag type is the variable. A {{name}} tag in a basic template will try to find the name key in the current context. If there is no name key, the parent contexts will be checked recursively. If the top context is reached and the name key is still not found, nothing will be rendered.

Example

I'll try to explain this in detail. There may be a bit of text, but it will hopefully avoid any miscommunication.

Let's take this piece of data:

{:movies [{:movie-name "The Great Gatsby"
           :cast [{:name "Leo"}]]}

For a template like this:

{{#movies}}                                                  [1]
{{#cast}}                                                    [2]
* {{name}} was in {{movie-name}}                             [3]
{{/cast}}
{{/movies}}

The expected output would be:

* Leo was in The Great Gasby

Note that at [3], we're accessing cast.name and movie.movie-name (from a parent context). To accomplish this, the mustache parser tries to use 'hopen/ctx as a stack. As each section/block/loop is encountered, the item/data we're iterating on is is pushed into the stack a child context. When we need to look up a field, we don't just look up the field in the last child context, we look through all open contexts recorded on the stack, until we either locate the field somewhere along the chain, or come up with nil.

Applied to this example...
The stack starts life like [root], where root is the data given in the example.

Then at [1], we do a ctx-lookup on :movies and finds a collection we can iterate over in root. We iterate over that collection (using b/for). Then, we push each item data, as we visit them, into the context stack (using b/let & conj). As we iterate through each item in :movies, we push each onto the ctx stack. So now the stack looks like

[root    {:movie-name "The Great Gatsby", :cast ...}] 
   |       |
   v       v
 ctx-a   ctx-b

At [2], we do another ctx-lookup for :cast, and find the field on ctx-b. We do the same thing we did before. The stack now looks like:

[root    {:movie-name "The Great Gatsby", :cast ...} {:name "Leo"} ] 
   |       |                                              |
   v       v                                              v
 ctx-a   ctx-b                                          ctx-c

At [3], when we look up :name, ctx-lookup can search backwards up the stack, finding the :name field on ctx-c. When we lookup :movie-name, ctx-lookup will then find the field on ctx-b. In either case, the requested field can be retrieved without issue.

That was a bit long winded, but I hope it was clear. =)

@green-coder
Copy link
Collaborator

green-coder commented Apr 2, 2019

@Odie Thanks for the very long explanation. I did not know that Mustache was behaving like that, and I am totally surprised and scared 😨 to see it as it seems to me that it is a very poor way to reference data, causing bugs when more fields are added to maps which are in the most inner contexts. That's the opposite of the Clojure's culture which makes sure that there is zero misunderstanding by giving namespaces to variables.

It makes me wonder if we should import this behavior in hopen or deviate from Mustache's spec and fix it.

Technically, I am thinking about another way to implement Mustache's variable resolution without a stack, that would be to merge the context's maps together as you enter the inner blocks. You would then only need 1 lookup to resolve a variable using a normal (hopen/ctx (keyword var-name).

(defn- ctx-lookup
"Tries to look up a field name in every context starting from most nested context first."
[env field-name]
(some #(get % (keyword field-name)) (reverse (get-in env [:bindings 'hopen/ctx] []))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function may only be used by the Mustache syntax. I would prefer it to live in one of the mustache syntax's namespace and not been added to the default-env of this namepace.

It means that you would need to define a default-env in the mustache syntax's namespace as well, which you could express as an update of the one which is here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: the code being refactored, I mean based on hopen.renderer.env.

@Odie
Copy link
Contributor Author

Odie commented Apr 2, 2019

@Odie Thanks for the very long explanation. I did not know that Mustache was behaving like that, and I am totally surprised and scared 😨 to see it as it seems to me that it is a very poor way to reference data, causing bugs when more fields are added to maps which are in the most inner contexts. That's the opposite of the Clojure's culture which makes sure that there is zero misunderstanding by giving namespaces to variables.

It makes me wonder if we should import this behavior in hopen or deviate from Mustache's spec and fix it.

I think mustache “has” to have this sort of variable resolution because it’s syntax gives no way today which scope/ctx you want a variable from.

In clojure (or any reasonable programming language), the bindings have unique names so it’s perfectly clear where you’re getting data from.

(for [movie (‘hopen/ctx :movies)]
   (for [actor (:cast movie)]
      (str (:name actor) “ was in “ (:name movie))))

But... Mustache is “logic-less” and doesn’t even provide a way to define a binding. I’m pretty sure we don’t want to import or duplicate this kind of behavior elsewhere. In my long winded exmaple, the shape of my data can’t even have movie.name and cast.name at the same time. Otherwise, movie.name becomes inaccessible. It's not great when the template language starts to dictate what shape your data can be in.

That said, I'm not sure if deviating from the "mustache" spec is useful. Because, then, it wouldn't be mustache, and you lose any benefits from maybe being able to use existing mustache templates or whatnot.

Technically, I am thinking about another way to implement Mustache's variable resolution without a stack, that would be to merge the context's maps together as you enter the inner blocks. You would then only need 1 lookup to resolve a variable using a normal (hopen/ctx (keyword var-name).

I don't have particular preferences here. If you want to keep ctx to a single map instead of a stack of maps, that's okay too. Merging maps is just as easy as pushing something onto a stack.

@Odie Odie closed this Apr 2, 2019
@Odie Odie reopened this Apr 2, 2019
@green-coder
Copy link
Collaborator

green-coder commented Apr 2, 2019

I am proposing this deviation from the official spec:

We could extend the syntax a bit to have a way to refer which context should be used to resolve a variable. The idea is a little like using ../ in a file system's path. We could have ~ to represent 1 context above, and the user could use multiple ~ in front of the variable's name.

Example:

{{#movies}}
{{#cast}}
* {{name}} was in {{~movie-name}}
{{/cast}}
{{/movies}}

I let you decide if you want to use it or not.

@Odie
Copy link
Contributor Author

Odie commented Apr 2, 2019

We could extend the syntax a bit to have a way to refer which context should be used to resolve a variable. The idea is a little like using ../ in a file system's path. We could have ~ to represent 1 context above, and the user could use multiple ~ in front of the variable's name.

That’s certainly possible, but then we wouldn’t be able to merge the ctx maps, which would remove the contex level info. Anyway, I’ll keep this in mind.

Let’s revisit this when I get through with some of the other stuff.

@green-coder
Copy link
Collaborator

@Odie I pushed 76a56a9 on the master branch which solves the Cons problem.

@green-coder
Copy link
Collaborator

We could extend the syntax a bit to have a way to refer which context should be used to resolve a variable. The idea is a little like using ../ in a file system's path. We could have ~ to represent 1 context above, and the user could use multiple ~ in front of the variable's name.

That’s certainly possible, but then we wouldn’t be able to merge the ctx maps, which would remove the contex level info. Anyway, I’ll keep this in mind.

We would not need to merge anything anymore.

The data-template would become something like:

'[(b/for [mustache/symb_0 (hopen/ctx :movies)]
    [(b/for [mustache/symb_1 (mustache/symb_0 :cast)]
       [(mustache/symb_1 :name) " was in " (mustache/symb_0 :movie-name)])])]

@Odie
Copy link
Contributor Author

Odie commented Apr 2, 2019

'[(b/for [mustache/symb_0 (hopen/ctx :movies)]
    [(b/for [mustache/symb_1 (mustache/symb_0 :cast)]
       [(mustache/symb_1 :name) " was in " (mustache/symb_0 :movie-name)])])]

Hmm... what’s the goal of changing what the parser outputs?

If we aim for this kind of parser output, the parser will have to become “smarter” (aka more complicated) to keep track of embed/nested levels and assign them a unique symbol on bind and on lookup. But, we still have distinct, multiple open contexts. They are just now have individual bindings instead of being in a stack, making recursive lookups harder. Relative context fetches isn’t easier either. With a stack, it’s possible to grab (nth stack (- (count stack) n)) to access the nth parent. With named bindings, again, the exact nested level needs to be known.

The kind of transformation we’re performing on the tree is really straightforward, and free of requiring any information regarding where the node is in the overall program. It should be more than capable of implementing what the spec asks for.

@green-coder
Copy link
Collaborator

'[(b/for [mustache/symb_0 (hopen/ctx :movies)]
    [(b/for [mustache/symb_1 (mustache/symb_0 :cast)]
       [(mustache/symb_1 :name) " was in " (mustache/symb_0 :movie-name)])])]

Hmm... what’s the goal of changing what the parser outputs?

  • To remove Mustache's ambiguity on the referencing of the data.
  • To move the context resolution from the rendering phase back to the parsing phase.

If we aim for this kind of parser output, the parser will have to become “smarter” (aka more complicated) to keep track of embed/nested levels and assign them a unique symbol on bind and on lookup. But, we still have distinct, multiple open contexts. They are just now have individual bindings instead of being in a stack, making recursive lookups harder. Relative context fetches isn’t easier either.

I think that it is not as hard as you imagine. The only thing the parser needs to track is the current level of nesting of the context, use it as a base to count backward when the parser meet the ~ prefix, then output some data-template without doing any lookups.

The kind of transformation we’re performing on the tree is really straightforward, and free of requiring any information regarding where the node is in the overall program. It should be more than capable of implementing what the spec asks for.

@green-coder
Copy link
Collaborator

FYI, I found a discussion on this topic janl/mustache.js#399

@Odie
Copy link
Contributor Author

Odie commented Apr 3, 2019

I think that it is not as hard as you imagine. The only thing the parser needs to track is the current level of nesting of the context, use it as a base to count backward when the parser meet the ~ prefix, then output some data-template without doing any lookups.

The thing I take issue with isn’t about “how hard” it might be. Certainly, we can count those nested levels, and we can generate those symbols, and we can calculate the nested level difference to fetch the right context. But what is already in place is perfectly able to address parent contexts unambiguously if that’s what we want. It can be made to deal with recursive lookups and it can also be made to require explicit references to parent context. A stack is perfect for this.

So, to me, adding additional housekeeping to generate this specific shape of hopen data, introduces additional complexity but does not give us any additional features in return. Thus, my question still is, why are we after this specific output shape?

@green-coder
Copy link
Collaborator

green-coder commented Apr 3, 2019

[...] But what is already in place is perfectly able to address parent contexts unambiguously if that’s what we want.

Your implementation is not ambiguous. My concern is the Mustache's specs itself, I think that the spec is ambiguous and that's a big issue to me. I regret having referred to Mustache when I created the issue, have I had done more research I would have chosen the spec of handlebar instead.

So, to me, adding additional housekeeping to generate this specific shape of hopen data, introduces additional complexity but does not give us any additional features in return. Thus, my question still is, why are we after this specific output shape?

It was to deviate from the spec of Mustache and fix its ambiguous behavior.

@Odie
Copy link
Contributor Author

Odie commented Apr 3, 2019

Okay. It's fine if you want to extend or deviate from the mustache spec.

What I've been trying to say is, using a stack to track contexts will work perfectly well for that.

Given a stack of open contexts, we can unambiguously locate parent contexts:

[ ctx-a ctx-b ctx-c ctx-d]
                 |     |
                 |     v
                 |     last element is current context
                 ->  (length - 1) element is parent context ("~" or "../" or whatnot)

We don't perform any additional housekeeping (nested level count, var symbol generation, etc). With a stack, we can already unambiguously access the exact context the user might ask for. Very little needs to change in the parser to support this.

Odie added 4 commits April 4, 2019 12:32
- Partials and Lambdas have been left unimplemented
- Moves mustache specific code out of xf.cljc
- Adds to test to verify feature functionality
green-coder and others added 6 commits April 4, 2019 15:26
Tag contents are now parsed and tagged inside `assoc-tag-type` only. It also
takes care of dealing with unexpected whitespace properly. The rest of the code
no longer assumes the exact format of the tag string, but relies on the cleaned
info to be on the tag itself.
The following was done to get this to happen:
- Move backtick into cljc
- Reimplement `re-seq-pos` for cljs
- Use `ex-info` when throwing errors
- Moves `render-` into test file
- Adds parse handling of escape tags
- Adds humane-test-output for readable test output
@Odie Odie changed the title Mustache parser [not ready for merge yet] Mustache parser Apr 10, 2019
Copy link
Collaborator

@green-coder green-coder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Globally, I found the source code very hard to read. The functions are too big also, it makes them harder to reason with and test.

I recommend spending a lot of time refactoring it.

true
(reduced false))
args))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the and and or is a good idea. However, doing it in a unrelated branch is not a good idea.
If you need something to be added in the renderer, I would prefer that you let me know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by unrelated branch. and and or were added in the course in implementing the parser. They are referenced here, here, and here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the wrote wording. They are related, but adding new functions to the default environment is not the subject of your PR.

PR are either accepted or rejected (and maybe updated and reviewed again). There is no in-between like taking some of the parts of the commits. For this reason, I advise opening a separate PR for modifying stuffs which are not the main purpose of the main PR. It would be easier to discuss, possibly amend and been merged.

The idea is similar to the single responsibility objects or function: it is better not to have more than 1 reason for the object / function to be obsolete and be refactored - for the PR it means to only do 1 subject at a time. The more subjects in a PR, the higher the probability of having 1 subject on which people would disagree, preventing the merge for any of the code addressing any of the subjects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this parser, having and and or is a required part of making it actually work so it is well within the "subject" of this branch.

This parser is the first concrete parser to actually make use of and fully exercise the hopen renderer. It's only natural to find that there are many missing pieces in the default environment that are useful or required to support the parsers.

Following what you're describing, it would take time and effort a to create a new PR, fully explain why it's needed, wait for code review, and approval, just so it can eventually be merged back into this branch so the work can continue. Why would I do that? How much additional time do you want me to put into stuff like this?

I'm the only one working on this particular parser, in my free-time, for the (hopefully) fun of being able to work with some people in clojure locally. I don't understand your insistence that I jump through multiple hoops to satisfy some kind of strange PR structure that isn't explained anywhere and is outside of common github practices.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this parser, having and and or is a required part of making it actually work so it is well within the "subject" of this branch.

I understand that.

This parser is the first concrete parser to actually make use of and fully exercise the hopen renderer. It's only natural to find that there are many missing pieces in the default environment that are useful or required to support the parsers.

I agree that.

Following what you're describing, it would take time and effort a to create a new PR, fully explain why it's needed, wait for code review, and approval, just so it can eventually be merged back into this branch so the work can continue. Why would I do that? How much additional time do you want me to put into stuff like this?

That's why I was suggesting that you let me know, so I can add it faster.

When I add things in the renderer's default environment, I am considering all the parsers at the same time, and need to carefully choose what should be in the defaults and what should not. That's why I prefer it to be done independently from the parsers, in a separate branch.

I'm the only one working on this particular parser, in my free-time, for the (hopefully) fun of being able to work with some people in clojure locally. I don't understand your insistence that I jump through multiple hoops to satisfy some kind of strange PR structure that isn't explained anywhere and is outside of common github practices.

If it not fun anymore, I suggest you to stop working on it. I know that my standard for quality is way higher than the average and it can be very frustrating for you to have some code that starts to work but that don't get merged in the project yet.

I would have been more willing to merge this PR if it was contained in its own namespace and not affecting the rest of the code. At the moment, this PR:

  • Added the source code of backtick in its root.
  • Added a test framework that modifies the test output's format.
  • Added some code to the renderer.
  • Added some variable to the default environment.

A PR being merged or not merge (with nothing in between), I don't want to merge it as it is now.

.. and I am not sure if addressing all those listed issues would solve the problem. There are other problems with the source code, I think it should be simpler.

In the Gitter chat room, I was proposing you to use the "Segment" type from the instaparse library which let you have substrings which keep references to the original string with an index and size information. Using it would make a big chunk of your source code simpler.

I did not fully understood how you use the line numbers in your parser, but if it is just for error reporting, you may as well calculate them once you need to output an error, it may result in a more decoupled source code.

My last feedback would be more an advice, it's to give instaparse a try. It has the potential to make a big part of your code simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s been a strange short journey for sure. The 4 points that you cited as the reason for not accepting the PR are all quite superficial and extremely minor problems.

It really is no big deal if you don’t accept the PR. I wouldn’t mind in the least even if we eventually decide to throw out this implementation because things could be done much more easily and cleanly in instaparse. Its the learning, the discussion, and the journey that makes it fun.

What I do mind, is that although the original invitation was to work on something “together” so the community has something to discuss over meetups, you’ve now assigned yourself the boss. Your expect your opinions to be unquestioningly followed for some reason.

Since you want to make this whole process intentionally difficult and this has definitely ceased being “fun”, I’m closing the PR.

@@ -26,4 +28,5 @@
:optimizations :none
:pretty-print false}}]}
:doo {:alias {:browsers [:chrome :firefox]}
:paths {:karma "./node_modules/karma/bin/karma"}})
:paths {:karma "./node_modules/karma/bin/karma"}}
:profiles {:dev {:dependencies [[pjstadig/humane-test-output "0.8.2"]]}})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in your local ~/.lein/profiles.clj instead of in the project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looked like I had to activate it in runner.cljs like this. I don't think it would compile if I leave out the dependency.

This appears to be the way to add dependencies that should only be included for dev or test time. link


I think having a default test reporter for both clj and cljs would be useful to anyone who might want to participate in working on this project.

The main problem is that the default test reporter show something like this for a failing case:
Screen Shot 2019-04-11 at 10 01 29 AM

It's quite difficult to figure out what's actually different here. I've been using eftest with clj so far (added via profile.clj). It prints the error this way.

Screen Shot 2019-04-11 at 10 03 36 AM

That's much more readable! The last diff section also makes it clear what is actually different in these two pieces of data. Kaocha actually produces even better looking output (and it's faster to run for some magical reason).

Both eftest and Kaocha seem to be for clj only though. For cljs, "humane-test-output" seems to be the only one that's easy to add. It's prints errors like this:

Screen Shot 2019-04-11 at 10 09 59 AM

It's not colorized, but still makes the test output much easier to read and understand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readme file of contains information on how to avoid changing the source code.
https://github.com/pjstadig/humane-test-output#clojure

I am using IDEA and I don't need that library. What I need is the default output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already tried the instructions, but they do not work for cljs.

'escape-html util/escape-html

'empty? empty?
'not not}})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark, I would prefer that you let me know that you need some functionalities instead of added them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I don't quite understand how you propose to carry on with development in this project.

In the course of developing the parser, I noticed that a function is needed, so I added it in this isolated branch. No one else is impacted by the change.

These are functions that many parser are likely to need, so it makes sense to add it to the renderer itself, making it available to all parsers.

The nature of a pull request is to propose and request code changes. So this is a valid channel for making requests known. An implementation is also already provided.

By adding the functions that I need, the moment they are needed, I could continue to carry on with development without having to wait for other people. If any changes were found to be objectionable, they could be rolled back or changed. The benefit here is that the rest of the work that depends on that change will have been fleshed out and working, instead of being stopped.

To me, all of the above sounds reasonable. What's the objection here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing wrong about adding some code in our your branch to make it work.

The problem arises when you ask to merge it. If you want your code to be easy to be discussed, maybe amended and merged, a separated PR would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already discussing the code here? If you want something amended, you can just leave a comment as usual?

src/hopen/syntax/mustache.cljc Outdated Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Outdated Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Outdated Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Outdated Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Outdated Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Outdated Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Outdated Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Outdated Show resolved Hide resolved
src/hopen/syntax/mustache.cljc Outdated Show resolved Hide resolved
@Odie Odie closed this Apr 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants