Partial match #645

Seb35 · 2020-04-02T18:43:05Z

Issue type

Bug Report: no
Feature Request: yes
Question: no
Not an issue: no

Prerequisites

Can you reproduce the issue?: yes
Did you search the repository issues?: yes
Did you check the forums?: no
Did you perform a web search (google, yahoo, etc)?: yes

Description

Context: My use case is a have a PEG.js grammar to recognise small expressions in a large text, and I want to pick them out of the whole text without surrounding text. Specifically I know quite well how the expression starts, so I launch the grammar on these “candidate” expressions, but I don’t know where the expressions end: it ends when the grammar cannot recognise the next character.

To fix the ideas, my PEG.js grammar deals with French laws texts and recognises potential links to other articles: expressions like "article 3 du code civil" or "au I du troisième alinéa de l’article 7 bis A de la loi n° 78-17 du 6 janvier 1978" and these expressions could be anywhere in the texts, but they start always more or less with the word "article" or some other word in a whitelist (details).

Polyfill 1: It is possible and it works to write the entry rule as entryPartial = e:myRealEntryRule .* { return e; } but this is highly inefficient for long texts. This is what I implemented (details) with a speed up to capture no more than the current line (details).

Polyfill 2: It is also possible to launch myGrammar.parse( text ) in a try, then catch the location of the error e.location.start.offset and relaunch myGrammar.parse( text.substr( 0, e.location.start.offset ) ) and accept the result if the second launch succeed.

Proposition: Add an option partialMatch in order not to fail after the entry rule is recognised. This means that, on this line, the second condition peg$currPos === input.length is removed when options.partialMatch is active. A variant would be to accept even if the entry rule fails, but the syntax writer could also add a "?" at the end of the entry rule if s/he wants, so I’m not sure it is a use-case for this issue.

We can choose if a partial match is activated: 1. either as an option during grammar generation (so that the JS code is fixed with or without the feature), or 2. either as an option during grammar execution.

This feature request is linked to #607, but the other one is about debugging grammars, here it is about accepting smaller answers than the entire text.

On another PEG parser

Before porting it to PEG.js I wrote my PEG in Python with Parsimonious, and Parsimonious has two different methods:

parse to parse the entire string and return an exception if there are remaining characters, similar to the current behaviour of PEG.js (details)
match to parse the string and fail only if the entry rule is not recognised, similar to the proposed behaviour of partial match (details)

Steps to Reproduce

Example code:

const pegjs = require( "pegjs" );
const grammar = pegjs.generate( 'article = "article" " "+ [0-9-]+' );
match = grammar.parse( "article 7 du code civil", { partialMatch: true } )

Expected behavior:

[ 'article', [ ' ' ], [ '2', '2', '5', '-', '1' ] ]

Actual behavior:

SyntaxError: Expected [0-9,\\-] or end of input but " " found.

Performance comparison

When I try the simple grammar above on a long text like this one (warning: 10.51MB) on a candidate link at index 1618 (which is "article 225-1") and it remains 10,642,912 characters after that, I obtain the following performance results on 20 runs for each category.

Polyfill 1: mean = 2354.872 ms ; stddev = 87.320 ms
Polyfill 2: mean = 0.307 ms ; stddev = 0.125 ms
Proposed feature: mean = 0.102 ms ; stddev = 0.019 ms

Software

PEG.js: 0.10.0 or master (future 0.11.0)

The text was updated successfully, but these errors were encountered:

Seb35 · 2020-04-02T19:13:21Z

The linked PR is what I used for the performance comparison (although based on an older master).

The PR fixes the behaviour at generation-time. Finally I find it would be better to allow changing it at execution-time: if this behaviour is shared by others, the PR could be adapted.

Seb35 mentioned this issue Apr 2, 2020

Partial match feature #646

Open

Seb35 mentioned this issue Apr 3, 2020

NodeJS streams #221

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Partial match #645

Partial match #645

Seb35 commented Apr 2, 2020 •

edited

Seb35 commented Apr 2, 2020 •

edited

Partial match #645

Partial match #645

Comments

Seb35 commented Apr 2, 2020 • edited

Issue type

Prerequisites

Description

Steps to Reproduce

Software

Seb35 commented Apr 2, 2020 • edited

Seb35 commented Apr 2, 2020 •

edited

Seb35 commented Apr 2, 2020 •

edited