Consider switching to iterative parser #263

timrwood · 2012-04-06T16:15:16Z

Currently, moment parses by splitting both the input string and the format string into arrays using regexes.

"2001-10-5 4:50 AM" = [2001, 10, 5, 4, 50, "AM"]
"YYYY-MM-DD HH:mm a" = ["YYYY", "MM", "DD", "HH", "mm", "a"]

Then, the parser loops through the items in the array and converts each one to the matching element of the array that is used to build the date.
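For illustration, the array-based approach could look roughly like the sketch below. This is not moment's actual source: `parseByArrays`, `FIELD_NAMES`, and both regexes are hypothetical names and patterns chosen only to show the idea of pairing input pieces with format pieces by index.

```javascript
// Hypothetical sketch of array-based parsing; not moment's actual source.
// Maps the format tokens used in this thread to field names (an assumption).
var FIELD_NAMES = { YYYY: "year", MM: "month", DD: "day", HH: "hour", mm: "minute" };

function parseByArrays(input, format) {
  // Split both strings into runs of letters and digits.
  var inputParts = input.match(/[0-9A-Za-z]+/g) || [];
  var formatParts = format.match(/[A-Za-z]+/g) || [];
  var result = {};

  // Pair the pieces up by index.
  for (var i = 0; i < formatParts.length; i++) {
    var token = formatParts[i];
    var value = inputParts[i];
    if (value === undefined) break;
    if (token === "a") {
      result.meridiem = value.toUpperCase();
    } else if (FIELD_NAMES[token]) {
      result[FIELD_NAMES[token]] = parseInt(value, 10);
    }
    // Unknown tokens (for example "YYYYMMDD" as one run) are skipped.
  }
  return result;
}

console.log(parseByArrays("2001-10-5 4:50 AM", "YYYY-MM-DD HH:mm a"));
console.log(parseByArrays("20011005", "YYYYMMDD"));
```

In this sketch the second call returns an empty object, because a format with no separators is split into a single unrecognized piece.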

Instead, perhaps the parser should only chunk up the format tokens, then loop through them, tearing chunks off the input string one token at a time.

"2001-10-5 4:50 AM"
"YYYY-MM-DD HH:mm a" = ["YYYY", "MM", "DD", "HH", "mm", "a"]
getNextPart("YYYY", "2001-10-5 4:50 AM");
getNextPart("MM", "-10-5 4:50 AM");
getNextPart("DD", "-5 4:50 AM");
getNextPart("HH", " 4:50 AM");
getNextPart("mm", ":50 AM");
getNextPart("a", " AM");

This could solve the ISO8601 "T" problem, the CJK number/month name problem, and the "YYYYMMDD" problem as well.

"20011005"
"YYYYMMDD" = ["YYYY", "MM", "DD"]
getNextPart("YYYY", "20011005");
getNextPart("MM", "1005");
getNextPart("DD", "05");
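A minimal sketch of how this loop might work, assuming each token maps to a regex describing what it may consume. `TOKEN_PATTERNS`, `parseIteratively`, and the return shape of `getNextPart` are illustrative assumptions, not code from the feature branch.

```javascript
// Hypothetical token table: each format token maps to a regex for the text
// it may consume. These patterns are assumptions for illustration only.
var TOKEN_PATTERNS = {
  YYYY: /\d{4}/,
  MM: /\d{1,2}/,
  DD: /\d{1,2}/,
  HH: /\d{1,2}/,
  mm: /\d{1,2}/,
  a: /[ap]m/i
};

// Find the first match for `token` in `rest` and return the matched text
// plus whatever follows it. Returns null when nothing matches.
function getNextPart(token, rest) {
  var match = TOKEN_PATTERNS[token].exec(rest);
  if (!match) return null;
  return {
    value: match[0],
    rest: rest.slice(match.index + match[0].length)
  };
}

// Walk the tokens in order, consuming the input as we go.
function parseIteratively(input, tokens) {
  var rest = input;
  var values = {};
  for (var i = 0; i < tokens.length; i++) {
    var part = getNextPart(tokens[i], rest);
    if (!part) return null;
    values[tokens[i]] = part.value;
    rest = part.rest;
  }
  return values;
}

console.log(parseIteratively("2001-10-5 4:50 AM", ["YYYY", "MM", "DD", "HH", "mm", "a"]));
console.log(parseIteratively("20011005", ["YYYY", "MM", "DD"]));
```

Because each token only looks at the remaining input, separators are skipped naturally and a format without separators needs no special case.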

rockymeza · 2012-04-07T12:45:11Z

While we are discussing a new parser, I feel that I should point out Jison. It's a parser generator for JavaScript that powers CoffeeScript and some other pretty big JavaScript libraries. With Jison you can generate the parser and the generated code has no dependencies.

It might not be the right thing for our use case, and it might not support switching out the tokens because of internationalization, but I feel that it would be wrong not to bring it up at least.

rockymeza · 2012-04-07T12:59:32Z

Now that I've at least brought Jison up, I can actually comment on this parser idea.

I think that it provides a couple of advantages that we cannot get from the regex parsing method:

It can throw very specific errors, telling you exactly where the parse error is and what the input should have looked like. This addresses #235 (Moment.js overflows in parsing).
It can handle the YYYYMMDD use case, which addresses #245 (Moment 1.5 does not parse "YYYYMMDD" formatted dates).
It can handle the CJK short month issue without a special case.
We can also address #259 (Consider adding support for all formatting tokens as parsing tokens).
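To make the first point concrete, here is a rough sketch of position-aware error reporting in a token-by-token parser. `parseOrThrow`, the `patterns` argument, and the message wording are all hypothetical; nothing here is an existing moment API.

```javascript
// Hypothetical sketch: consume the input one token at a time and report
// which token failed and where in the input the failure happened.
function parseOrThrow(input, tokens, patterns) {
  var position = 0; // index in `input` where the unparsed remainder starts
  var values = {};
  tokens.forEach(function (token) {
    var rest = input.slice(position);
    var match = patterns[token].exec(rest);
    if (!match) {
      throw new Error(
        "Expected " + token + " at position " + position +
        ' but found "' + rest + '"'
      );
    }
    values[token] = match[0];
    position += match.index + match[0].length;
  });
  return values;
}

// Example patterns (assumptions for illustration only).
var examplePatterns = { YYYY: /\d{4}/, MM: /\d{1,2}/ };

try {
  parseOrThrow("2001-xx", ["YYYY", "MM"], examplePatterns);
} catch (e) {
  console.log(e.message);
}
```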

A couple of questions:

How will it know how to chunk up the tokens?
This does not seem like it could possibly be as fast as the regex parsing. Are we willing to sacrifice some performance for a more robust parser?

timrwood · 2012-04-07T18:52:21Z

I'll look into Jison more in depth, thanks for mentioning it.

re:questions

We can use the regex chunker we currently use for this. That regex, used with string.match, outputs an array of all the matches, e.g. ["YYYY", "MM", "DD", "HH", "mm", "a"].
We should build it in a separate branch and then run jsperf tests on it to see how much slower it will be. I'm guessing it will be a little slower, but I think the added features will outweigh the speed decrease.
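As a small illustration of the chunking step, the sketch below uses `String.prototype.match` with a global regex. `FORMAT_TOKENS` and `chunkFormat` are hypothetical names, and the regex covers only the tokens used in this thread, not moment's full token set.

```javascript
// Hypothetical chunker regex, limited to the tokens used in this thread.
var FORMAT_TOKENS = /YYYY|MM|DD|HH|mm|a/g;

// With the global flag, match() returns every match as an array,
// or null when there are none.
function chunkFormat(format) {
  return format.match(FORMAT_TOKENS) || [];
}

console.log(chunkFormat("YYYY-MM-DD HH:mm a"));
console.log(chunkFormat("YYYYMMDD"));
```

Note that the chunker splits "YYYYMMDD" into three tokens even though the format has no separators, which is what lets the iterative loop handle that case.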

timrwood · 2012-04-08T19:49:01Z

I've started working on this in the feature/parser branch.

https://github.com/timrwood/moment/tree/feature/parser

timrwood · 2012-04-09T20:39:06Z

Well, I've got all the old unit tests up and passing for this, plus some new tests added here.

Now I'll make a jsperf test and see how it compares...

timrwood · 2012-04-09T20:52:15Z

Results are in: http://jsperf.com/moment-iterative-parser

~25% slower on Chrome, ~50% slower on Firefox.

chrome (ops/second)
==================== 1.5.0
===============      new

firefox (ops/second)
==================== 1.5.0
==========           new

It's not that bad, considering other parsers like DateJS are 95% slower than moment. http://jsperf.com/underscore-date-vs-datejs/2

chrome  (ops/second)
==================== moment
=                    DateJS

timrwood · 2012-04-23T18:05:57Z

This has been merged into the develop branch and will go out in the 1.6.0 release.

rockymeza mentioned this issue Apr 8, 2012

Release 1.6.0 discussion #268

Closed

ghost assigned timrwood Apr 9, 2012

timrwood closed this as completed Apr 23, 2012

timrwood mentioned this issue Jan 29, 2013

Confusion when validating using String + Format #601

Closed
