27 March 2000 Release 2.22 Notes for New Users of PCCTS Version 1.33MR22
arounds.
Suppose one wants to recognize C style comments using:
#lexclass START
#token Comment_Begin "/\*" <<skip();mode(LC_Comment);more();>>
#token Eof "@"
...
#lexclass LC_Comment
#token Unexpected_Eof "@" <<mode(START);>>
#token Comment_End "\*/" <<skip();mode(START);>>
#token "~[]" <<skip();>>
...
The token code "Unexpected_Eof" will never be seen by the parser. The result is that C style comments which omit
the trailing "*/" can swallow all the input to the end-of-file and not give any error message. My solution to this
problem is to fool PCCTS by using the following definition:
#token Unexpected_Eof "@@" <<mode(START);>>
This exploits a characteristic of DLG character streams: once they reach end-of-file they must return end-of-file to
every request for another character until explicitly reset.
Another example of this pitfall is the recognition of unterminated C style strings at the end of a file.
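The end-of-file behavior that the "@@" definition relies on can be sketched in C++. This is only an illustration under stated assumptions: StickyEofStream, kEof, and seesDoubleEof are hypothetical names invented here, and the class is a stand-in for a character stream, not the DLG input class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a lexer input stream; this is not the DLG class.
class StickyEofStream {
public:
    enum { kEof = -1 };  // assumed end-of-file sentinel

    explicit StickyEofStream(const std::string& text) : text_(text), pos_(0) {}

    // Returns the next character, or kEof when the input is exhausted.
    int nextChar() {
        assert(pos_ <= text_.size());           // the position never runs past the end
        if (pos_ == text_.size()) return kEof;  // pos_ is not advanced, so EOF repeats
        return static_cast<unsigned char>(text_[pos_++]);
    }

    // Explicit reset: reading starts again from the first character.
    void reset() { pos_ = 0; }

private:
    std::string text_;
    std::size_t pos_;
};

// Mimics what a two-character pattern such as "@@" relies on: two requests
// in a row that both come back as end-of-file.
bool seesDoubleEof(StickyEofStream& in) {
    return in.nextChar() == StickyEofStream::kEof &&
           in.nextChar() == StickyEofStream::kEof;
}
```

Because the stream keeps answering end-of-file, a pattern that asks for two end-of-file characters in a row can still succeed after the last real character has been read.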
#35. Sometimes the easiest DLG solution is to accept one character at a time.
One example is the processing of Fortran style Hollerith constants. See Example #12.
Another example is recognizing radix expressions such as 2#1011 or 16#ffff. Given that the radix can vary between
2 and 36, the easiest way to handle it is to save the radix and then change to another #lexclass where the digits can be
inspected one by one. An alternative is to accept the entire string and then check all the characters at one time.
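The second approach (accept the entire string, then check every character) might look like the following C++ sketch. The function names digitValue and isValidRadixLiteral are hypothetical, and the sketch assumes a decimal radix, a single "#" separator, and digits above 9 written as letters in either case; the rules of the language being parsed may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of one digit character (0-9, a-z, A-Z), or -1 if it is not a digit
// in any radix up to 36. Assumes letters are contiguous, as in ASCII.
int digitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return -1;
}

// Checks text of the assumed form <radix>#<digits>, e.g. "2#1011" or "16#ffff".
bool isValidRadixLiteral(const std::string& text) {
    const std::size_t hash = text.find('#');
    if (hash == std::string::npos || hash == 0 || hash + 1 >= text.size()) {
        return false;  // missing '#', missing radix, or missing digits
    }

    // Parse the decimal radix, rejecting it as soon as it exceeds 36.
    int radix = 0;
    for (std::size_t i = 0; i < hash; ++i) {
        const char c = text[i];
        if (c < '0' || c > '9') return false;
        radix = radix * 10 + (c - '0');
        if (radix > 36) return false;
    }
    if (radix < 2) return false;
    assert(radix >= 2 && radix <= 36);

    // Every digit must be strictly smaller than the radix.
    for (std::size_t i = hash + 1; i < text.size(); ++i) {
        const int v = digitValue(text[i]);
        if (v < 0 || v >= radix) return false;
    }
    return true;
}
```

A lexer action could call a check of this kind on the matched text and report an error when it fails, instead of switching to a separate #lexclass.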
#tokclass
#36. #tokclass provides an efficient way to combine reserved words into reserved word sets
#token Read "read"
#token Write "write"
#token Exec "exec"
#token ID "[a-z A-Z] [a-z A-Z 0-9 \@]*"
#tokclass Any {ID Read Write Exec}
#tokclass Verb {Read Write Exec}
command: Verb Any ;
#37. Use ANTLRParser::set_el() to test whether an ANTLRTokenType is in a #tokclass or #FirstSetSymbol
To test whether a token "t" is in the #tokclass "Verb":
if (set_el(t->getType(),Verb_set)) {...}
There are several variations of this routine in the ANTLRParser class.
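The idea behind a token-set membership test can be illustrated with a small standalone C++ sketch. Everything in it is hypothetical: the Tok enumeration and its numbering, the TokenSet class, and makeVerbSet are invented for illustration and are not the code that ANTLR generates or the implementation of ANTLRParser::set_el().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token numbering; real token types come from the generated code.
enum Tok { T_Eof = 1, T_Read, T_Write, T_Exec, T_ID, T_Count };

// A set of token types stored as one bit per token, packed 8 bits to a byte.
class TokenSet {
public:
    TokenSet() : bytes_((static_cast<std::size_t>(T_Count) + 7) / 8, 0) {}

    void add(Tok t) {
        const std::size_t i = index(t);
        bytes_[i / 8] = static_cast<unsigned char>(bytes_[i / 8] | (1u << (i % 8)));
    }

    bool contains(Tok t) const {
        const std::size_t i = index(t);
        return (bytes_[i / 8] & (1u << (i % 8))) != 0;
    }

private:
    // Converts a token type to a bit index and checks that it is in range.
    static std::size_t index(Tok t) {
        const std::size_t i = static_cast<std::size_t>(t);
        assert(i < static_cast<std::size_t>(T_Count));
        return i;
    }

    std::vector<unsigned char> bytes_;
};

// Plays the role of the set behind "#tokclass Verb {Read Write Exec}".
TokenSet makeVerbSet() {
    TokenSet verbs;
    verbs.add(T_Read);
    verbs.add(T_Write);
    verbs.add(T_Exec);
    return verbs;
}
```

With such a representation the membership test costs one index computation and one bit mask, regardless of how many tokens the class contains.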
#tokdef
#38. A #tokdef must appear near the start of the grammar file (only #first and #header may precede it)
#lexclass
#39. Inline regular expressions are put in the most recently defined lexical class
If the most recently defined lexical class is not START you may be surprised:
#lexclass START
...
#lexclass LC_Comment
...
inline_example: symbol "=" expression ;
This will place "=" in the #lexclass LC_Comment (where it will never be matched) rather than the START #lexclass