项目作者: atifaziz

项目描述 :
A Generic Vaughn Pratt's top-down operator precedence parser for .NET Standard
高级语言: C#
项目地址: git://github.com/atifaziz/Gratt.git
创建时间: 2019-10-18T16:31:16Z
项目社区:https://github.com/atifaziz/Gratt

开源协议:Apache License 2.0

下载


Gratt

Gratt is a generic implementation of a Pratt parser in C# for .NET
Standard.

A Pratt parser is the affectionate name for a simple parsing
technique presented by Vaughn Pratt in his 1973 paper “Top
down operator precedence
” (TDOP).

Gratt was inspired by the design of the Java implementation presented by Bob
Nystrom
in his journal entry “Pratt Parsers: Expression Parsing Made
Easy
”. The actual inspiration was the simplification and
separation of interfaces in terms of prefix (nud) and infix (led)
parselets
(rather than the full Java implementation):

  1. interface PrefixParselet {
  2. Expression parse(Parser parser, Token token);
  3. }
  4. interface InfixParselet {
  5. Expression parse(Parser parser, Expression left, Token token);
  6. }

The unit tests for Gratt re-create and test the parser for the toy language
Bantam discussed in Bob’s journal entry.

The unit tests also contain a complete implementation for parsing and
evaluating C# pre-processing expressions used in conditional
directives #if and #elif.

Unlike Bob’s Java implementation, Gratt’s parser is completely generic:

  1. class Parser<TState, TKind, TToken, TPrecedence, TResult>
  2. {
  3. // ...
  4. }

It can therefore work with any model of tokens (TToken), token types
(TKind), precedence (TPrecedence) and result (TResult). It can also hold
any user-defined state (TState) that may be needed during parsing although it
is not required. An overload without any user-state simply assumes a
unit.

Unlike Bob’s Java implementation, Gratt does not define any interfaces to
represent (prefix or infix) parselets. Instead, it relies on them being simple
functions:

  1. // version 2.0+
  2. public static TResult
  3. Parse<TState, TKind, TToken, TPrecedence, TResult>(
  4. TState state,
  5. TPrecedence initialPrecedence, IComparer<TPrecedence> precedenceComparer,
  6. IEqualityComparer<TKind> kindEqualityComparer,
  7. TKind eoi, Func<TToken, Exception> eoiErrorSelector,
  8. Func<TKind, TToken, TState, Func<TToken, Parser<TState, TKind, TToken, TPrecedence, TResult>, TResult>> prefixFunction,
  9. Func<TKind, TToken, TState, (TPrecedence, Func<TToken, TResult, Parser<TState, TKind, TToken, TPrecedence, TResult>, TResult>)?> infixFunction,
  10. IEnumerable<(TKind, TToken)> lexer)

The above might read a little dense so below is a brief explanation of each
parameter in order:

  • a user-defined state that is associated with the parser object during parsing
  • the initial precedence (this is usually 0 if TPrecedence is an int)
  • a precedence comparer
  • an equality comparer that can compare two token kinds
  • a token kind marking end of input (EOI)
  • a function used to project an Exception when the EOI token is not the
    the last token of the tokens sequence
  • a prefix function
  • an infix function
  • a sequence of token kind and token pairs (2-tuple) yielded by a lexer
    implementation

A prefix function receives a token kind, a token and a user-defined state as
input and it must return a parsing function if the token kind has prefix
parsing semantics. If the token kind is invalid as a prefix, then the token and
user-defined state may be used for the purpose of providing details in a thrown
exception (e.g., while TKind may be a simple enum type, TToken will
usually be a rich object containing the position of the token in the source
text among other data). The parsing function returned by the prefix function
then receives the token and the parser as arguments and produces some result
(TResult).

An infix function receives a token kind, a token and a user-defined state as
input as arguments and optionally returns a precedence (left binding power)
and parsing function pair. The parsing function then receives the token, the
left result and the parser as arguments and produces some result (TResult).