PEG(1)

PEG(1)                                                                  PEG(1)



NAME
       peg, leg - parser generators

SYNOPSIS
       peg [-hvV -ooutput] [filename ...]
       leg [-hvV -ooutput] [filename ...]

DESCRIPTION
       peg  and  leg  are tools for generating recursive-descent parsers: pro-
       grams that perform pattern matching on text.  They  process  a  Parsing
       Expression  Grammar  (PEG) [Ford 2004] to produce a program that recog-
       nises legal sentences of that  grammar.   peg  processes  PEGs  written
       using the original syntax described by Ford; leg processes PEGs written
       using slightly different syntax and conventions that  are  intended  to
       make  it  an  attractive  replacement for parsers built with lex(1) and
       yacc(1).  Unlike lex and yacc, peg and leg support unlimited backtrack-
       ing, provide ordered choice as a means for disambiguation, and can com-
       bine scanning (lexical analysis) and parsing (syntactic analysis)  into
       a single activity.

       peg  reads  the  specified filenames, or standard input if no filenames
       are given, for a grammar describing the parser to generate.   peg  then
       generates  a  C  source file that defines a function yyparse().  This C
       source file can be included in, or compiled and  then  linked  with,  a
       client  program.   Each  time  the  client  program calls yyparse() the
       parser consumes input text according to  the  parsing  rules,  starting
       from  the first rule in the grammar.  yyparse() returns non-zero if the
       input could be parsed according to the grammar; it returns zero if  the
       input could not be parsed.

       The  prefix 'yy' or 'YY' is prepended to all externally-visible symbols
       in the generated parser.  This is intended to reduce the risk of names-
       pace  pollution in client programs.  (The choice of 'yy' is historical;
       see lex(1) and yacc(1), for example.)

OPTIONS
       peg and leg provide the following options:

       -h     prints a summary of available options and then exits.

       -ooutput
              writes the generated parser to the file output  instead  of  the
              standard output.

       -P     suppresses #line directives in the output.

       -v     writes verbose information to standard error while working.

       -V     writes version information to standard error then exits.

A SIMPLE EXAMPLE
       The  following peg input specifies a grammar with a single rule (called
       'start') that is satisfied when the input contains  the  string  "user-
       name".

           start <- "username"

       (The  quotation  marks  are not part of the matched text; they serve to
       indicate a literal string to be matched.)  In other words, yyparse() in
       the  generated  C  source  will  return non-zero only if the next eight
       characters read from the input spell the word "username".  If the input
       contains  anything  else, yyparse() returns zero and no input will have
       been consumed.  (Subsequent calls to yyparse() will also  return  zero,
       since  the  parser is effectively blocked looking for the string "user-
       name".)  To ensure progress we can add an  alternative  clause  to  the
       'start'  rule that will match any single character if "username" is not
       found.

           start <- "username"
                  / .

       yyparse() now always returns non-zero (except at the very  end  of  the
       input).  To do something useful we can add actions to the rules.  These
       actions are performed after a complete match is  found  (starting  from
       the  first  rule)  and are chosen according to the 'path' taken through
       the grammar to match the input.  (Linguists  would  call  this  path  a
       'phrase marker'.)

           start <- "username"    { printf("%s\n", getlogin()); }
                  / < . >         { putchar(yytext[0]); }

       The  first  line  instructs  the  parser to print the user's login name
       whenever it sees "username" in the input.  If  that  match  fails,  the
       second  line  tells  the parser to echo the next character on the input
       the standard output.  Our parser is now performing useful work: it will
       copy  the  input to the output, replacing all occurrences of "username"
       with the user's account name.

       Note the angle brackets ('<' and '>') that were  added  to  the  second
       alternative.   These  have  no  effect  on the meaning of the rule, but
       serve to delimit the text made available to the following action in the
       variable yytext.

       If  the  above  grammar is placed in the file username.peg, running the
       command

           peg -o username.c username.peg

       will save the corresponding parser in the file username.c.  To create a
       complete  program  this parser could be included by a C program as fol-
       lows.

           #include <stdio.h>      /* printf(), putchar() */
           #include <unistd.h>     /* getlogin() */

           #include "username.c"   /* yyparse() */

           int main()
           {
             while (yyparse())     /* repeat until EOF */
               ;
             return 0;
           }

PEG GRAMMARS
       A grammar consists of a set of named rules.

           name <- pattern

       The pattern contains one or more of the following elements.

       name   The element stands for the entire pattern in the rule  with  the
              given name.

       "characters"
              A  character or string enclosed in double quotes is matched lit-
              erally.  The ANSI C escape sequences are recognised  within  the
              characters.

       'characters'
              A  character or string enclosed in single quotes is matched lit-
              erally, as above.

       [characters]
              A set of characters enclosed in square brackets matches any sin-
              gle character from the set, with escape characters recognised as
              above.  If the set begins with an uparrow (^) then  the  set  is
              negated (the element matches any character not in the set).  Any
              pair of characters separated with  a  dash  (-)  represents  the
              range  of characters from the first to the second, inclusive.  A
              single alphabetic character or underscore is matched by the fol-
              lowing set.

                  [a-zA-Z_]

              Similarly,  the  following matches  any single non-digit charac-
              ter.

                  [^0-9]


       .      A dot matches any character.  Note that the only time this fails
              is at the end of file, where there is no character to match.

       ( pattern )
              Parentheses  are  used for grouping (modifying the precedence of
              the operators described below).

       { action }
              Curly braces surround actions.  The action is arbitrary C source
              code  to  be executed at the end of matching.  Any braces within
              the action must be properly nested.  Any  input  text  that  was
              matched  before  the action and delimited by angle brackets (see
              below) is made available within the action as  the  contents  of
              the character array yytext.  The length of (number of characters
              in) yytext is available in the variable yyleng.  (These variable
              names are historical; see lex(1).)

       <      An opening angle bracket always matches (consuming no input) and
              causes the parser to begin accumulating matched text.  This text
              will be made available to actions in the variable yytext.

       >      A  closing angle bracket always matches (consuming no input) and
              causes the parser to stop accumulating text for yytext.

       The above elements can be made optional and/or repeatable with the fol-
       lowing suffixes:

       element ?
              The  element  is  optional.  If present on the input, it is con-
              sumed and the match succeeds.  If not present on the  input,  no
              text is consumed and the match succeeds anyway.

       element +
              The element is repeatable.  If present on the input, one or more
              occurrences of element are consumed and the match succeeds.   If
              no  occurrences  of  element are present on the input, the match
              fails.

       element *
              The element is optional  and  repeatable.   If  present  on  the
              input,  one  or more occurrences of element are consumed and the
              match succeeds.  If no occurrences of element are present on the
              input, the match succeeds anyway.

       The  above elements and suffixes can be converted into predicates (that
       match arbitrary input text and subsequently  succeed  or  fail  without
       consuming that input) with the following prefixes:

       & element
              The  predicate  succeeds  only if element can be matched.  Input
              text scanned while matching element is  not  consumed  from  the
              input and remains available for subsequent matching.

       ! element
              The predicate succeeds only if element cannot be matched.  Input
              text scanned while matching element is  not  consumed  from  the
              input  and remains available for subsequent matching.  A popular
              idiom is

                  !.

              which matches the end of file, after the last character  of  the
              input has already been consumed.

       A special form of the '&' predicate is provided:

       &{ expression }
              In  this  predicate  the  simple C expression (not statement) is
              evaluated immediately when the parser reaches the predicate.  If
              the  expression  yields non-zero (true) the 'match' succeeds and
              the parser continues with the next element in the  pattern.   If
              the  expression  yields  zero  (false) the 'match' fails and the
              parser backs up to look for an alternative parse of the input.

       Several elements (with or without prefixes and suffixes)  can  be  com-
       bined  into a sequence by writing them one after the other.  The entire
       sequence matches only if each individual  element  within  it  matches,
       from left to right.

       Sequences  can  be separated into disjoint alternatives by the alterna-
       tion operator '/'.

       sequence-1 / sequence-2 / ... / sequence-N
              Each sequence is tried in turn until one  of  them  matches,  at
              which  time  matching for the overall pattern succeeds.  If none
              of the sequences matches then the match of the  overall  pattern
              fails.

       Finally,  the pound sign (#) introduces a comment (discarded) that con-
       tinues until the end of the line.

       To summarise the above, the  parser  tries  to  match  the  input  text
       against  a  pattern  containing  literals,  names  (representing  other
       rules), and various operators (written as prefixes, suffixes,  juxtapo-
       sition  for  sequencing and and infix alternation operator) that modify
       how the elements within the pattern are matched.  Matches are made from
       left  to  right,  'descending' into named sub-rules as they are encoun-
       tered.  If  the  matching  process  fails,  the  parser  'back  tracks'
       ('rewinding'  the input appropriately in the process) to find the near-
       est alternative 'path' through the grammar.  In other words the  parser
       performs  a  depth-first,  left-to-right  search for the first success-
       fully-matching path through the rules.  If found, the actions along the
       successful path are executed (in the order they were encountered).

       Note  that predicates are evaluated immediately during the search for a
       successful match, since they contribute to the success  or  failure  of
       the  search.   Actions,  however, are evaluated only after a successful
       match has been found.

PEG GRAMMAR FOR PEG GRAMMARS
       The grammar for peg grammars is shown below.  This will both illustrate
       and formalise the above description.

           Grammar         <- Spacing Definition+ EndOfFile

           Definition      <- Identifier LEFTARROW Expression
           Expression      <- Sequence ( SLASH Sequence )*
           Sequence        <- Prefix*
           Prefix          <- AND Action
                            / ( AND | NOT )? Suffix
           Suffix          <- Primary ( QUERY / STAR / PLUS )?
           Primary         <- Identifier !LEFTARROW
                            / OPEN Expression CLOSE
                            / Literal
                            / Class
                            / DOT
                            / Action
                            / BEGIN
                            / END

           Identifier      <- < IdentStart IdentCont* > Spacing
           IdentStart      <- [a-zA-Z_]
           IdentCont       <- IdentStart / [0-9]
           Literal         <- ['] < ( !['] Char  )* > ['] Spacing
                            / ["] < ( !["] Char  )* > ["] Spacing
           Class           <- '[' < ( !']' Range )* > ']' Spacing
           Range           <- Char '-' Char / Char
           Char            <- '\\' [abefnrtv'"\[\]\\]
                            / '\\' [0-3][0-7][0-7]
                            / '\\' [0-7][0-7]?
                            / '\\' '-'
                            / !'\\' .
           LEFTARROW       <- '<-' Spacing
           SLASH           <- '/' Spacing
           AND             <- '&' Spacing
           NOT             <- '!' Spacing
           QUERY           <- '?' Spacing
           STAR            <- '*' Spacing
           PLUS            <- '+' Spacing
           OPEN            <- '(' Spacing
           CLOSE           <- ')' Spacing
           DOT             <- '.' Spacing
           Spacing         <- ( Space / Comment )*
           Comment         <- '#' ( !EndOfLine . )* EndOfLine
           Space           <- ' ' / '\t' / EndOfLine
           EndOfLine       <- '\r\n' / '\n' / '\r'
           EndOfFile       <- !.
           Action          <- '{' < [^}]* > '}' Spacing
           BEGIN           <- '<' Spacing
           END             <- '>' Spacing


LEG GRAMMARS
       leg  is a variant of peg that adds some features of lex(1) and yacc(1).
       It differs from peg in the following ways.

       %{ text... %}
              A declaration section can appear anywhere that a rule definition
              is  expected.   The text between the delimiters '%{' and '%}' is
              copied verbatim to the generated C parser code before  the  code
              that implements the parser itself.

       name = pattern
              The 'assignment' operator replaces the left arrow operator '<-'.

       rule-name
              Hyphens  can  appear  as  letters  in  the names of rules.  Each
              hyphen is converted into an underscore in the generated C source
              code.  A single hyphen '-' is a legal rule name.

                  -       = [ \t\n\r]*
                  number  = [0-9]+                 -
                  name    = [a-zA-Z_][a-zA_Z_0-9]* -
                  l-paren = '('                    -
                  r-paren = ')'                    -

              This  example  shows  how ignored whitespace can be obvious when
              reading the grammar and yet unobtrusive when placed liberally at
              the end of every rule associated with a lexical element.

       seq-1 | seq-2
              The alternation operator is vertical bar '|' rather than forward
              slash '/'.  The peg rule

                  name <- sequence-1
                        / sequence-2
                        / sequence-3

              is therefore written

                  name = sequence-1
                       | sequence-2
                       | sequence-3
                       ;

              in leg (with the final semicolon being  optional,  as  described
              next).

       @{ action }
              Actions  prefixed  with  an 'at' symbol will be performed during
              parsing, at the time they are  encountered  while  matching  the
              input  text  with  a  rule.  Because of back-tracking in the PEG
              parsing algorithm, actions prefixed with '@' might be  performed
              multiple  times for the same input text.  (The usual behviour of
              actions is that they are saved up until  matching  is  complete,
              and  then  those  that are part of the final derivation are per-
              formed in left-to-right order.)  The variable yytext  is  avail-
              able within these actions.

       exp ~ { action }
              A  postfix  operator ~{ action } can be placed after any expres-
              sion and will behave like a normal  action  (arbitrary  C  code)
              except  that  it  is invoked only when exp fails.  It binds less
              tightly than any other operator except alternation and  sequenc-
              ing,  and  is  intended to make error handling and recovery code
              easier to write.  Note that yytext and yyleng are not  available
              inside  these  actions, but the pointer variable yy is available
              to give the code access  to  any  user-defined  members  of  the
              parser  state  (see  "CUSTOMISING THE PARSER" below).  Note also
              that exp is always a  single  expression;  to  invoke  an  error
              action  for  any  failure within a sequence, parentheses must be
              used to group the sequence into a single expression.

                  rule = e1 e2 e3 ~{ error("e[12] ok; e3 has failed"); }
                       | ...

                  rule = (e1 e2 e3) ~{ error("one of e[123] has failed"); }
                       | ...

       pattern ;
              A semicolon punctuator can optionally terminate a pattern.

       %% text...
              A double percent '%%' terminates the  rules  (and  declarations)
              section  of the grammar.  All text following '%%' is copied ver-
              batim to the generated C parser code after the parser  implemen-
              tation code.

       $$ = value
              A sub-rule can return a semantic value from an action by assign-
              ing it to the pseudo-variable '$$'.  All  semantic  values  must
              have  the same type (which defaults to 'int').  This type can be
              changed by defining YYSTYPE in a declaration section.

       identifier:name
              The semantic value returned (by  assigning  to  '$$')  from  the
              sub-rule  name  is  associated  with  the  identifier and can be
              referred to in subsequent actions.

       The desk calculator example below illustrates the use of '$$' and ':'.

LEG EXAMPLE: A DESK CALCULATOR
       The extensions in leg described above allow useful parsers and  evalua-
       tors (including declarations, grammar rules, and supporting C functions
       such as 'main') to be kept within a single source file.  To  illustrate
       this we show a simple desk calculator supporting the four common arith-
       metic operators and  named  variables.   The  intermediate  results  of
       arithmetic  evaluation  will  be  accumulated  on  an implicit stack by
       returning them as semantic values from sub-rules.

           %{
           #include <stdio.h>     /* printf() */
           #include <stdlib.h>    /* atoi() */
           int vars[26];
           %}

           Stmt    = - e:Expr EOL                  { printf("%d\n", e); }
                   | ( !EOL . )* EOL               { printf("error\n"); }

           Expr    = i:ID ASSIGN s:Sum             { $$ = vars[i] = s; }
                   | s:Sum                         { $$ = s; }

           Sum     = l:Product
                           ( PLUS  r:Product       { l += r; }
                           | MINUS r:Product       { l -= r; }
                           )*                      { $$ = l; }

           Product = l:Value
                           ( TIMES  r:Value        { l *= r; }
                           | DIVIDE r:Value        { l /= r; }
                           )*                      { $$ = l; }

           Value   = i:NUMBER                      { $$ = atoi(yytext); }
                   | i:ID !ASSIGN                  { $$ = vars[i]; }
                   | OPEN i:Expr CLOSE             { $$ = i; }

           NUMBER  = < [0-9]+ >    -               { $$ = atoi(yytext); }
           ID      = < [a-z]  >    -               { $$ = yytext[0] - 'a'; }
           ASSIGN  = '='           -
           PLUS    = '+'           -
           MINUS   = '-'           -
           TIMES   = '*'           -
           DIVIDE  = '/'           -
           OPEN    = '('           -
           CLOSE   = ')'           -

           -       = [ \t]*
           EOL     = '\n' | '\r\n' | '\r' | ';'

           %%

           int main()
           {
             while (yyparse())
               ;
             return 0;
           }


LEG GRAMMAR FOR LEG GRAMMARS
       The grammar for leg grammars is shown below.  This will both illustrate
       and formalise the above description.

           grammar =       -
                           ( declaration | definition )+
                           trailer? end-of-file

           declaration =   '%{' < ( !'%}' . )* > RPERCENT

           trailer =       '%%' < .* >

           definition =    identifier EQUAL expression SEMICOLON?

           expression =    sequence ( BAR sequence )*

           sequence =      error+

           error =         prefix ( TILDE action )?

           prefix =        AND action
           |               ( AND | NOT )? suffix

           suffix =        primary ( QUERY | STAR | PLUS )?

           primary =       identifier COLON identifier !EQUAL
           |               identifier !EQUAL
           |               OPEN expression CLOSE
           |               literal
           |               class
           |               DOT
           |               action
           |               BEGIN
           |               END

           identifier =    < [-a-zA-Z_][-a-zA-Z_0-9]* > -

           literal =       ['] < ( !['] char )* > ['] -
           |               ["] < ( !["] char )* > ["] -

           class =         '[' < ( !']' range )* > ']' -

           range =         char '-' char | char

           char =          '\\' [abefnrtv'"\[\]\\]
           |               '\\' [0-3][0-7][0-7]
           |               '\\' [0-7][0-7]?
           |               !'\\' .

           action =        '{' < braces* > '}' -

           braces =        '{' braces* '}'
           |               !'}' .

           EQUAL =         '=' -
           COLON =         ':' -
           SEMICOLON =     ';' -
           BAR =           '|' -
           AND =           '&' -
           NOT =           '!' -
           QUERY =         '?' -
           STAR =          '*' -
           PLUS =          '+' -
           OPEN =          '(' -
           CLOSE =         ')' -
           DOT =           '.' -
           BEGIN =         '<' -
           END =           '>' -
           TILDE =         '~' -
           RPERCENT =      '%}' -

           - =             ( space | comment )*
           space =         ' ' | '\t' | end-of-line
           comment =       '#' ( !end-of-line . )* end-of-line
           end-of-line =   '\r\n' | '\n' | '\r'
           end-of-file =   !.


CUSTOMISING THE PARSER
       The  following symbols can be redefined in declaration sections to mod-
       ify the generated parser code.

       YYSTYPE
              The semantic value type.  The pseudo-variable '$$' and the iden-
              tifiers  'bound'  to  rule  results  with the colon operator ':'
              should all be considered as being declared to  have  this  type.
              The default value is 'int'.

       YYPARSE
              The  name  of  the  main entry point to the parser.  The default
              value is 'yyparse'.

       YYPARSEFROM
              The name of an alternative entry  point  to  the  parser.   This
              function expects one argument: the function corresponding to the
              rule from which the  search  for  a  match  should  begin.   The
              default is 'yyparsefrom'.  Note that yyparse() is defined as

                  int yyparse() { return yyparsefrom(yy_foo); }

              where 'foo' is the name of the first rule in the grammar.

       YY_INPUT(buf, result, max_size)
              This  macro  is invoked by the parser to obtain more input text.
              buf points to an area of memory that can hold at  most  max_size
              characters.   The  macro  should copy input text to buf and then
              assign the integer variable result to  indicate  the  number  of
              characters  copied.   If  no  more input is available, the macro
              should assign 0 to result.  By default, the  YY_INPUT  macro  is
              defined as follows.

                  #define YY_INPUT(buf, result, max_size)        \
                  {                                              \
                    int yyc= getchar();                          \
                    result= (EOF == yyc) ? 0 : (*(buf)= yyc, 1); \
                  }

              Note  that  if YY_CTX_LOCAL is defined (see below) then an addi-
              tional first argument, containing the parser context, is  passed
              to YY_INPUT.

       YY_DEBUG
              If this symbols is defined then additional code will be included
              in the parser that prints vast quantities of arcane  information
              to the standard error while the parser is running.

       YY_BEGIN
              This  macro is invoked to mark the start of input text that will
              be made available in actions as 'yytext'.  This  corresponds  to
              occurrences  of  '<'  in  the grammar.  These are converted into
              predicates that are expected to succeed.  The default definition

                  #define YY_BEGIN (yybegin= yypos, 1)

              therefore  saves  the  current  input  position  and  returns  1
              ('true') as the result of the predicate.

       YY_END This  macros  corresponds to '>' in the grammar.  Again, it is a
              predicate so the default definition  saves  the  input  position
              before 'succeeding'.

                  #define YY_END (yyend= yypos, 1)


       YY_PARSE(T)
              This  macro  declares  the  parser  entry  points  (yyparse  and
              yyparsefrom) to be of type T.  The default definition

                  #define YY_PARSE(T) T

              leaves yyparse() and yyparsefrom() with global  visibility.   If
              they  should  not  be  externally visible in other source files,
              this macro can be redefined to declare them 'static'.

                  #define YY_PARSE(T) static T


       YY_CTX_LOCAL
              If this symbol is defined  during  compilation  of  a  generated
              parser  then  global parser state will be kept in a structure of
              type 'yycontext' which can be  declared  as  a  local  variable.
              This  allows  multiple instances of parsers to coexist and to be
              thread-safe.  The parsing function yyparse() will be declared to
              expect  a  first  argument of type 'yycontext *', an instance of
              the structure holding the global state  for  the  parser.   This
              instance  must  be  allocated  and  initialised  to  zero by the
              client.  A trivial but complete example is as follows.

                  #include <stdio.h>

                  #define YY_CTX_LOCAL

                  #include "the-generated-parser.peg.c"

                  int main()
                  {
                    yycontext ctx;
                    memset(&ctx, 0, sizeof(yycontext));
                    while (yyparse(&ctx));
                    return 0;
                  }

              Note that if this symbol is undefined then the  compiled  parser
              will  statically  allocate  its global state and will be neither
              reentrant nor thread-safe.  Note also that the parser  yycontext
              structure  is initialised automatically the first time yyparse()
              is called; this structure must therefore be properly initialised
              to zero before the first call to yyparse().

       YY_CTX_MEMBERS
              If   YY_CTX_LOCAL   is   defined  (see  above)  then  the  macro
              YY_CTX_MEMBERS can be defined to expand to any additional member
              field  declarations  that  the client would like included in the
              declaration of the 'yycontext' structure type.  These additional
              members  are  otherwise  ignored  by  the generated parser.  The
              instance of 'yycontext'  associated  with  the  currently-active
              parser is available within actions as the pointer variable yy.

       YY_BUFFER_SIZE
              The  initial  size of the text buffer, in bytes.  The default is
              1024 and the buffer size is doubled whenever  required  to  meet
              demand  during  parsing.   An  application that typically parses
              much longer strings could increase  this  to  avoid  unnecessary
              buffer reallocation.

       YY_STACK_SIZE
              The initial size of the variable and action stacks.  The default
              is 128, which is doubled whenever required to meet demand during
              parsing.   Applications  that  have  deep  call stacks with many
              local variables, or that perform many  actions  after  a  single
              successful  match, could increase this to avoid unnecessary buf-
              fer reallocation.

       YY_MALLOC(YY, SIZE)
              The memory allocator for all parser-related storage.  The param-
              eters  are  the  current  yycontext  structure and the number of
              bytes to allocate.  The default definition is: malloc(SIZE)

       YY_REALLOC(YY, PTR, SIZE)
              The memory reallocator for dynamically-grown  storage  (such  as
              text  buffers and variable stacks).  The parameters are the cur-
              rent yycontext structure, the previously-allocated storage,  and
              the  number of bytes to which that storage should be grown.  The
              default definition is: realloc(PTR, SIZE)

       YY_FREE(YY, PTR)
              The memory deallocator.  The parameters are the  current  yycon-
              text structure and the storage to deallocate.  The default defi-
              nition is: free(PTR)

       YYRELEASE
              The name of the function that releases all resources held  by  a
              yycontext structure.  The default value is 'yyrelease'.

       The following variables can be referred to within actions.

       char *yybuf
              This  variable points to the parser's input buffer used to store
              input text that has not yet been matched.

       int yypos
              This is the offset (in  yybuf)  of  the  next  character  to  be
              matched and consumed.

       char *yytext
              The  most recent matched text delimited by '<' and '>' is stored
              in this variable.

       int yyleng
              This variable indicates the number of characters in 'yytext'.

       yycontext *yy
              This variable points to the instance of  'yycontext'  associated
              with the currently-active parser.

       Programs  that  wish  to  release  all  the resources associated with a
       parser can use the following function.

       yyrelease(yycontext*yy)
              Returns all parser-allocated storage associated with yy  to  the
              system.   The  storage  will  be reallocated on the next call to
              yyparse().

       Note that the storage for the yycontext structure itself is never allo-
       cated  or  reclaimed  implicitly.   The application must allocate these
       structures in automatic storage, or use calloc() and free()  to  manage
       them explicitly.  The example in the following section demonstrates one
       approach to resource management.

LEG EXAMPLE: EXTENDING THE PARSER'S CONTEXT
       The yy variable passed to actions contains the state of the parser plus
       any  additional fields defined by YY_CTX_MEMBERS.  Theses fields can be
       used to store application-specific information that is global to a par-
       ticular  call of yyparse().  A trivial but complete leg example follows
       in which the yycontext structure is extended with a count of the number
       of  newline  characters seen in the input so far (the grammar otherwise
       consumes and ignores the entire input).  The caller of  yyparse()  uses
       count to print the number of lines of input that were read.


           %{
           #define YY_CTX_LOCAL 1
           #define YY_CTX_MEMBERS \
             int count;
           %}

           Char    = ('\n' | '\r\n' | '\r')        { yy->count++ }
                   | .

           %%

           #include <stdio.h>
           #include <string.h>

           int main()
           {
               /* create a local parser context in automatic storage */
               yycontext yy;
               /* the context *must* be initialised to zero before first use*/
               memset(&yy, 0, sizeof(yy));

               while (yyparse(&yy))
                   ;
               printf("%d newlines\n", yy.count);

               /* release all resources associated with the context */
               yyrelease(&yy);

               return 0;
           }


DIAGNOSTICS
       peg  and  leg  warn  about  the following conditions while converting a
       grammar into a parser.

       syntax error
              The input grammar was malformed in some way.  The error  message
              will  include  the  text  about to be matched (often backed up a
              huge amount from the actual location of the error) and the  line
              number of the most recently considered character (which is often
              the real location of the problem).

       rule 'foo' used but not defined
              The grammar referred to a rule named 'foo' but no definition for
              it  was  given.   Attempting  to  use  the generated parser will
              likely result in errors from the linker due to undefined symbols
              associated with the missing rule.

       rule 'foo' defined but not used
              The grammar defined a rule named 'foo' and then ignored it.  The
              code associated with the  rule  is  included  in  the  generated
              parser which will in all other respects be healthy.

       possible infinite left recursion in rule 'foo'
              There  exists  at  least one path through the grammar that leads
              from the rule 'foo' back to (a recursive invocation of) the same
              rule without consuming any input.

       Left  recursion, especially that found in standards documents, is often
       'direct' and implies trivial repetition.

           # (6.7.6)
           direct-abstract-declarator =
               LPAREN abstract-declarator RPAREN
           |   direct-abstract-declarator? LBRACKET assign-expr? RBRACKET
           |   direct-abstract-declarator? LBRACKET STAR RBRACKET
           |   direct-abstract-declarator? LPAREN param-type-list? RPAREN

       The recursion can easily be eliminated by converting the parts  of  the
       pattern following the recursion into a repeatable suffix.

           # (6.7.6)
           direct-abstract-declarator =
               direct-abstract-declarator-head?
               direct-abstract-declarator-tail*

           direct-abstract-declarator-head =
               LPAREN abstract-declarator RPAREN

           direct-abstract-declarator-tail =
               LBRACKET assign-expr? RBRACKET
           |   LBRACKET STAR RBRACKET
           |   LPAREN param-type-list? RPAREN


CAVEATS
       A  parser  that  accepts empty input will always succeed.  Consider the
       following example, not atypical of a first attempt to write a PEG-based
       parser:

           Program = Expression*
           Expression = "whatever"
           %%
           int main() {
             while (yyparse())
               puts("success!");
             return 0;
           }

       This  program  loops forever, no matter what (if any) input is provided
       on stdin.  Many fixes are possible, the easiest being  to  insist  that
       the  parser  always  consumes some non-empty input.  Changing the first
       line to

           Program = Expression+

       accomplishes this.  If the parser is expected  to  consume  the  entire
       input,  then explicitly requiring the end-of-file is also highly recom-
       mended:

           Program = Expression+ !.

       This works because the parser will only fail to match  ("!"  predicate)
       any  character  at all ("." expression) when it attempts to read beyond
       the end of the input.

BUGS
       You have to type 'man peg' to read the manual page for leg(1).

       The 'yy' and 'YY' prefixes cannot be changed.

       Left recursion is detected in the input grammar but is not handled cor-
       rectly in the generated parser.

       Diagnostics for errors in the input grammar are obscure and not partic-
       ularly helpful.

       The operators ! and ~ should really be named the other way around.

       Several commonly-used lex(1) features (yywrap(), yyin, etc.)  are  com-
       pletely absent.

       The  generated  parser  does not contain '#line' directives to direct C
       compiler errors back to the grammar description when appropriate.

SEE ALSO
       D. Val Schorre, META II, a syntax-oriented compiler  writing  language,
       19th  ACM  National  Conference, 1964, pp. 41.301--41.311.  Describes a
       self-implementing parser generator for analytic grammars with no  back-
       tracking.

       Alexander  Birman,  The  TMG  Recognition  Schema,  Ph.D. dissertation,
       Princeton, 1970.  A mathematical treatment of the power and  complexity
       of recursive-descent parsing with backtracking.

       Bryan  Ford, Parsing Expression Grammars: A Recognition-Based Syntactic
       Foundation, ACM SIGPLAN Symposium on  Principles  of  Programming  Lan-
       guages,  2004.   Defines  PEGs  and  analyses  them in relation to con-
       text-free and regular grammars.  Introduces the syntax adopted in peg.

       The standard Unix utilities lex(1) and  yacc(1)  which  influenced  the
       syntax and features of leg.

       The source code for peg and leg whose grammar parsers are written using
       themselves.

       The latest version of this software and documentation:

           http://piumarta.com/software/peg


AUTHOR
       peg, leg and this manual page were written by Ian Piumarta  (first-name
       at  last-name dot com) while investigating the viability of regular and
       parsing-expression grammars for efficiently extracting type and  signa-
       ture information from C header files.

       Please  send bug reports and suggestions for improvements to the author
       at the above address.



Version 0.1                     September 2013                          PEG(1)