Skip to content

Non-Greedy Parsers #12

@mattbierner

Description

@mattbierner

Probably not something for core library but defining parsers that are locally non-greedy but consume as much input as required for a successful parsing would be interesting.

One example are the quantifiers trailed by ? in regular expressions:

 /a+?b/.exec('aab');

A naive non greedy strategy is to just consume the minimum number of character until the individual parser succeeds like:

sequence(
     atLeast(1, character('a')),
     character('b'));

But for the input aab, that stops at ab after consuming only one a and then fails.

If we know the parser in advance we can also write:

sequence(
     character('a'),
     manyTil(character('b'), character('a')));

But it is not possible to write a parser that exhibits this behavior without knowing what the next parser, in this case character('b').

parse-re

parse-re does this by attempting completions with the continuation and feeding more data on failure until either the completions succeeds or the parser fails:

atMost = \max, p ->
    (max === 0 ? always(stream.end) :
        \state, m, cok, cerr, eok, eerr -> 
            let r = parse.trampoline <| eok(stream.end, state, m) in
                (r ? eok(stream.end, state, m) :
                    parse.cons(p, atMost(max, p))(state, m, cok, cerr, eok, eerr)));

But this only works because failure always returns null. If failure does something like throw an error, this would crash.

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions