Remove leading and trailing automatic space capture #332

@ishax-kos

Description

Problem

A <- rule that contains rules defined with < loses its control over spacing, because the inner < rules still consume whitespace at their edges. For example, suppose you want a grammar like the following:

    File < Statement_list eof
    Statement_list <- Statement Inline_spacing (Separator Inline_spacing Statement)*
    Separator <- ',' / eol+
    Statement < '1' '+' '2'
    Inline_spacing <- :(' ')*

Valid input that yields three statements might look like this:

    1 + 2, 1 + 2
    1
    +2

The intention of this grammar is that, without a comma, only one statement can occur per line. Statements can be separated either by one or more newlines or by a single comma, but not both, while the contents of a statement can be spread across multiple lines.
This more intuitive setup cannot express these rules, because < captures leading and trailing whitespace. The trailing spacing of a Statement swallows the newline that Separator needs, and the leading spacing of a Statement accepts a newline right after a comma. There is a workaround, but it requires defining a custom spacing rule without newlines and then manually specifying every place where a newline can occur in the middle of a statement. For instance, here's an excerpt from my own code:

    br <- :(eol?)

    Expression < Ex5
    Ex5 < Type? br Ex4
    Ex4 < Ex3 (br '+' br Ex3)*
    Ex3 < Ex2 (br '*' br Ex2)*
    Ex2 < ('-' / '/' br)? Ex1
    Ex1 < '(' br Expression br ')' / Ex0
    Ex0 < lit_number / name_value

Proposal

There is an elegant solution that should save some spacing checks and make the grammar at the top work as intended: Spacing is only inserted between the elements of a rule, never at the start or end of it. The one exception is the entrypoint rule. With this change, the input would have to start with a non-space character, because nothing would consume leading whitespace. This could be solved either by making an exception so that < on the entrypoint rule still captures leading whitespace, or by requiring the user to handle it explicitly with "" or by using Space directly.
This shouldn't break most existing code. Consider:
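The proposed rewriting can be sketched as a function over the element list of a < rule. This is a hypothetical Python model, not the parser generator's actual code: the names expand_current and expand_proposed are illustrative, and expand_current simply follows this issue's description of today's behaviour (Spacing at the start, between elements, and at the end).

```python
from typing import List

SPACING = "Spacing"  # name of the spacing rule that '<' inserts


def expand_current(elements: List[str]) -> List[str]:
    """Model of today's '<' as described in this issue:
    Spacing before every element and after the last one."""
    out: List[str] = []
    for element in elements:
        out += [SPACING, element]
    out.append(SPACING)
    return out


def expand_proposed(elements: List[str], is_entry: bool = False) -> List[str]:
    """Proposed '<': Spacing only between elements.

    If is_entry is True, leading Spacing is kept as well, which is one of
    the two options suggested for the entrypoint rule."""
    out: List[str] = []
    for i, element in enumerate(elements):
        if i > 0:
            out.append(SPACING)  # separator between adjacent elements
        out.append(element)
    if is_entry and elements:
        out.insert(0, SPACING)  # entrypoint still skips leading whitespace
    return out


if __name__ == "__main__":
    # The rules from the example: A < B C, B < '1' '2', C < '3' '4'
    rules = [("A", ["B", "C"]), ("B", ["'1'", "'2'"]), ("C", ["'3'", "'4'"])]
    for name, body in rules:
        print(name, "<-", " ".join(expand_proposed(body)))
```

Running it prints the same three expansions as the example (A <- B Spacing C, and so on), while expand_current on the same input would surround every element with Spacing.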

    A < B C
    B < '1' '2'
    C < '3' '4'

Under this proposal, that expands to:

    A <- B Spacing C
    B <- '1' Spacing '2'
    C <- '3' Spacing '4'

The only case where spacing is lost is when A itself uses <-, which forces '2' and '3' to be adjacent or to be separated only by whatever the user explicitly allows.
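To check that the proposal makes the grammar at the top behave as intended, here is a small hand-written Python sketch of that grammar. It is a model, not generated parser code: the Parser class and parse function are hypothetical, and it assumes the default Spacing consumes spaces, tabs and newlines. Mode "current" follows this issue's description of < (spacing at the start and end of each rule as well as between elements), while "proposed" inserts spacing only between elements.

```python
import re
from typing import Optional, Tuple

# Assumption: the default Spacing consumes spaces, tabs and newlines.
SPACING = re.compile(r"[ \t\r\n]*")
INLINE = re.compile(r" *")            # Inline_spacing <- :(' ')*
EOL = re.compile(r"(?:\r\n|\n|\r)+")  # eol+


class Parser:
    """mode="current":  '<' adds Spacing at the edges and between elements.
    mode="proposed": '<' adds Spacing only between elements."""

    def __init__(self, text: str, mode: str) -> None:
        self.text = text
        self.mode = mode

    def skip(self, pattern: "re.Pattern[str]", pos: int) -> int:
        # Every skip pattern uses '*', so match() always succeeds.
        return pattern.match(self.text, pos).end()

    def literal(self, lit: str, pos: int) -> Optional[int]:
        return pos + len(lit) if self.text.startswith(lit, pos) else None

    def statement(self, pos: int) -> Optional[int]:
        # Statement < '1' '+' '2'
        if self.mode == "current":
            pos = self.skip(SPACING, pos)  # leading spacing
        for i, lit in enumerate(("1", "+", "2")):
            if i > 0:
                pos = self.skip(SPACING, pos)  # spacing between elements
            next_pos = self.literal(lit, pos)
            if next_pos is None:
                return None
            pos = next_pos
        if self.mode == "current":
            pos = self.skip(SPACING, pos)  # trailing spacing
        return pos

    def separator(self, pos: int) -> Optional[int]:
        # Separator <- ',' / eol+
        after_comma = self.literal(",", pos)
        if after_comma is not None:
            return after_comma
        m = EOL.match(self.text, pos)
        return m.end() if m else None

    def statement_list(self, pos: int) -> Tuple[Optional[int], int]:
        # Statement_list <- Statement Inline_spacing (Separator Inline_spacing Statement)*
        first = self.statement(pos)
        if first is None:
            return None, 0
        pos, count = self.skip(INLINE, first), 1
        while True:
            p = self.separator(pos)
            if p is None:
                break
            p = self.statement(self.skip(INLINE, p))
            if p is None:
                break  # a failed repetition consumes nothing, as with PEG '*'
            pos, count = p, count + 1
        return pos, count

    def file(self) -> Optional[int]:
        # File < Statement_list eof
        pos = self.skip(SPACING, 0) if self.mode == "current" else 0
        end, count = self.statement_list(pos)
        if end is None:
            return None
        end = self.skip(SPACING, end)  # spacing between Statement_list and eof
        return count if end == len(self.text) else None


def parse(text: str, mode: str) -> Optional[int]:
    """Number of statements parsed, or None if the input is rejected."""
    return Parser(text, mode).file()


if __name__ == "__main__":
    sample = "1 + 2, 1 + 2\n1\n+2"
    print(parse(sample, "current"))   # None: a Statement swallows the newline
    print(parse(sample, "proposed"))  # 3
```

With the sample input, the current mode fails because the second Statement's trailing spacing eats the newline that Separator needs, while the proposed mode finds three statements. Conversely, parse("1 + 2,\n1 + 2", "current") returns 2, letting a comma and a newline both separate the same pair of statements, whereas the proposed mode rejects that input.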

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
