27 March 2000 Release 2.22 Notes for New Users of PCCTS Version 1.33MR22
Miscellaneous
#162.
A grammar may contain multiple start rules. They aren't declared.
#163.
Given the rule
rule[A a,B b] > [X x]
the generated prototype is
X rule(ASTBase* ast,int* sig,A a,B b)
The argument "sig" is the status value returned when using parser exception handling.
If automatic generation of ASTs is not selected, exceptions are not in use, or there are no inheritance variables then
the corresponding arguments are dropped from the argument list. Thus with ASTs disabled, no parser exception
support, and neither upward nor downward inheritance variables the prototype of a rule would be:
void rule()
#164.
To remake ANTLR after changes to the source code use
make -f makefile1
The first problem with the standard makefile is that generic.h does not appear in the dependency lists. The second
problem is that the rebuilds of antlr.c from antlr.g and of scan.c from parser.dlg have been commented out. They
were commented out so that ANTLR can be built the first time on a machine without ANTLR, even when zip has not
restored the modification dates of the files correctly.
#165.
ANTLR reports "... action buffer overflow ..."
There are several approaches:
Usually one can bypass this problem with several consecutive action blocks. Contributed by M.T. Richter
(mtr@ottawa.com).
One can place the code in a separate file and use #include. Contributed by Dave Seidel.
One can add -DZZLEXBUFSIZE=value to the command line.
#166.
Exception handling uses status codes and switch statements to unwind the stack rule by rule.
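The idea can be sketched in a few lines of C++. This is a hand-written illustration, not the code ANTLR generates: the Parser struct, the rule names atom, expr, and stat, the Signal values, and the log member are all assumptions made for the example. Each rule reports failure through an int* argument instead of throwing, and each caller uses a switch on that value to decide whether to handle the signal or to pass it up one more level.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical status codes; the names and values are invented for this sketch.
enum Signal { NoSignal = 0, MismatchedToken = 1, NoViableAlt = 2 };

struct Parser {
    std::vector<std::string> tokens;  // input token stream
    std::size_t pos = 0;              // index of the next unread token
    std::vector<std::string> log;     // records which handler ran

    // Innermost rule: expects the token "x". On failure nothing is thrown;
    // a status code is stored in *sig and the rule simply returns.
    void atom(int* sig) {
        assert(sig != nullptr);
        if (pos < tokens.size() && tokens[pos] == "x") {
            ++pos;
            *sig = NoSignal;
        } else {
            *sig = MismatchedToken;
        }
    }

    // Middle rule: handles only NoViableAlt. Any other signal falls into
    // the default case and is handed to the caller, one level up.
    void expr(int* sig) {
        assert(sig != nullptr);
        int inner = NoSignal;
        atom(&inner);
        switch (inner) {
            case NoSignal:
                break;
            case NoViableAlt:
                log.push_back("expr handled NoViableAlt");
                inner = NoSignal;
                break;
            default:
                break;  // not handled here: keep unwinding
        }
        *sig = inner;
    }

    // Outer rule: handles MismatchedToken, which stops the unwinding.
    void stat(int* sig) {
        assert(sig != nullptr);
        int inner = NoSignal;
        expr(&inner);
        switch (inner) {
            case NoSignal:
                break;
            case MismatchedToken:
                log.push_back("stat handled MismatchedToken");
                inner = NoSignal;
                break;
            default:
                break;  // still unhandled: the caller of stat sees it
        }
        *sig = inner;
    }
};
```

With the single token "y" as input, atom sets the signal, expr passes it on unchanged, and stat clears it, so the stack is unwound one rule at a time by ordinary returns.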
#167.
For tokens with complex internal structure add #token expressions to match frequent errors
Suppose one wants to match something like a floating point number, character literal, or string literal. These have a
complex internal structure. It is possible to describe them exactly with DLG. But is it wise to do so? Consider:
'\ff' for '\xff'
or
"\mThe result is: " for "\nThe result is: "
If DLG fails to tolerate small errors like the ones above the result could be dozens of error messages as it searches for
the closing quotation mark or apostrophe.
One solution is to create additional #token definitions which recognize common errors and either generate an
appropriate error message or return a special #token code such as "Bad_String_Const". This can be combined with
a special #lexclass which scans (in a very tolerant manner) to the end of the construct and generates no additional
errors. This is the approach used by John D. Mitchell (johnm@jGuru.com) in the recognizer for C character and
string literals in Example #1.
Another approach is to try to scan to the end of the token in the most forgiving way possible and then to validate the
token's syntax in the DLG action routine.
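A minimal sketch of this second approach follows. It assumes the lexer has already accepted everything between two apostrophes as one token and passes the text to a helper. The function name checkCharLiteral, its messages, and the deliberately simplified escape rules are hypothetical and are not part of PCCTS. The point is that a malformed literal yields exactly one diagnostic instead of a cascade.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Validate a character literal that was scanned tolerantly, including its
// apostrophes. Returns "" if the literal is acceptable, otherwise one
// diagnostic message. The rules are a simplified subset of C's.
std::string checkCharLiteral(const std::string& text) {
    if (text.size() < 3 || text.front() != '\'' || text.back() != '\'')
        return "character literal is empty or unterminated";

    const std::string body = text.substr(1, text.size() - 2);

    // Ordinary case: exactly one character that is not a backslash.
    if (body[0] != '\\')
        return body.size() == 1 ? "" : "more than one character in literal";

    if (body.size() < 2)
        return "backslash with nothing after it";

    const char kind = body[1];

    // Hex escape: \x followed by one or more hex digits.
    if (kind == 'x') {
        if (body.size() == 2)
            return "\\x needs at least one hex digit";
        for (std::size_t i = 2; i < body.size(); ++i)
            if (!std::isxdigit(static_cast<unsigned char>(body[i])))
                return "non-hex digit after \\x";
        return "";
    }

    // Octal escape: one to three octal digits.
    if (kind >= '0' && kind <= '7') {
        if (body.size() > 4)
            return "too many digits in octal escape";
        for (std::size_t i = 1; i < body.size(); ++i)
            if (body[i] < '0' || body[i] > '7')
                return "non-octal digit in octal escape";
        return "";
    }

    // Simple escapes such as \n or \t: exactly one character after the backslash.
    if (body.size() == 2 &&
        std::string("ntrabfv\\'\"?").find(kind) != std::string::npos)
        return "";

    return "unknown escape sequence \\" + body.substr(1);
}
```

A lexer action could call the helper on the token text and, when the result is non-empty, report it once and return a special token code such as Bad_Char_Const (again a hypothetical name).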
#168.
See pccts/testcpp/2/test.g and testcpp/3/test.g for examples of how to integrate non-DLG lexers with PCCTS
The examples were written by Ariel Tamches (tamches@cs.wisc.edu).
#169.
Ambiguity, full LL(k), and the linear approximation to LL(k)
It took me a while to understand in an intuitive way the difference between full LL(k) lookahead given by the
ANTLR -k switch and the linear approximation given by the ANTLR -ck switch. Most of the time I run ANTLR with
-k 1 and -ck 2. Because I didn't understand the linear approximation I didn't understand the warnings about
ambiguity. I couldn't understand why ANTLR would complain about something which I thought was obviously
parse-able with the lookahead available. I would try to make the messages go away totally, which was sometimes
very hard. If I had understood the linear approximation I might have been able to fix them easily or at least have
realized that there was no problem with the grammar, just with the limitations of the linear approximation.
I will restrict the discussion to the case of "-k 1" and "-ck 2".
Consider the following example: