Recursion
Often we want to parse arbitrarily nested structures. Arrays, JSON, XML are such example. To do that, we need to be able to pass the parser to itself. Because of a limitation in PHP, we cannot pass a value around before it is created. The solution is to split this in two steps: create a placeholder for a recursive parser, and then define the parser in terms of itself.
Example
We need to parse nested pairs such as [1,[2,[3,4]]]
. The structure repeats itself, every item in the pair can be either a digit or another pair.
We cannot write this:
The above results in "Undefined variable: pair" because we're trying to use $pair
before it's defined. Instead, we need to mark the parser as recursive
in a first step, and then define how the parser should recurse
:
It's possible to nest multiple recursive parsers. Simply initialise them all first using recursive()
and then define them in terms of each other:
Note that when you initialize a parser with recursive()
, it is in fact mutable, and the recurse()
method mutates it. All parsers are immutable, and this is the only exception. After calling recurse()
, the parser is immutable again and behaves just like any other parser.
Using recusion to avoid loops
Let's say we want to parse the character 'a'
at least one time, so that "aaab"
outputs "aaa"
, but "bbb"
fails. Imperatively, you could solve this by running the char('a')
parser in a while loop, and stop on the first failure. We can express it more concisely with recursion though:
- Start by parsing
char('a')
. - Append another
char('a')
, but this second one isoptional()
. - Append another
optional(char('a'))
- Notice the similarity between the first two steps. This suggest an opportunity for recursion.
- Wrap our
char('a')->append(optional(char('a')))
in arecurse()
parser. - Replace the second
char('a')
by the recursive parser.
The end result looks like this:
In fact the code above is how the atLeastOne()
combinator works, so you can simplify that code by writing this: