Updated on 2019-12-23
Easily implement efficient backtracking capabilities over any enumeration
Complex parsers often need to support backtracking, which is a way to revisit items you've already encountered. The trick with this is twofold, doing it efficiently, and doing it transparently. The LookAheadEnumerator
Update: Bug fix, more robust, doc comments, and 90% slangified the code (removed switch/nameof and the like) so that Slang can cook it and I can use it in my generated parser code.
Update: LookAheadEnumerator is now used extensively in my Parsley parser generator which can parse C# (an example of parsing a large subset is at the link)
Parsley parser generator which can parse C#
It's often desirable in parser code to use some sort of "streaming" interface for its input like a TextReader class or an class implementing IEnumerator
Normally with an enumerator, all you can do is MoveNext() and sometimes Reset() if you're lucky. There is no way to seek back to a particular previous position, and even if there was, it probably wouldn't work of a true streaming source, like an HTTP response, or console input.
A backtracking parser on the other hand, needs to "bookmark" its current position, before trying several alternatives until it finds one that parses. That means revisiting the same sequence of text several times.
Backtracking parsers are inherently less efficient but far more flexible than non-backtracking parsers. I've made a fair effort to optimize this class to make it as efficient as possible for this purpose.
I've embedded an array backed queue into this class, which it uses to back the the lookahead buffer. The queue starts with 16 elements and grows as needed (almost doubling in size each time to avoid too many reallocations - heap in .NET is cheaper than CPU) depending on how much lookahead is needed. When a LookAheadEnumeratorEnumerator
You use the code like a standard IEnumerator
var text = "fubarfoobaz";
var la = new LookAheadEnumerator<char>(text.GetEnumerator());
la.MoveNext(); // can't look ahead until we're already over the position
// we want to start look at.
foreach (var ch in la.LookAhead)
Console.Write(ch);
Console.WriteLine();
while (la.MoveNext())
{
foreach (var ch in la.LookAhead)
Console.Write(ch);
Console.WriteLine();
}
This would print the following to the console:
fubarfoobaz
ubarfoobaz
barfoobaz
arfoobaz
rfoobaz
foobaz
oobaz
obaz
baz
az
z
As you can see, we're incrementing the primary cursor by one in each iteration, and then we're enumerating over LookAhead from there. Enumerating over LookAhead does not affect the primary cursor*.
Typically in a parser, you'll use it over tokens, like LookAheadEnumerator
That's about all. This is an extremely specialized class, but when you need it, you really need it.