27 March 2000 Release 2.22 Notes for New Users of PCCTS Version 1.33MR22
39
determine when your semantic predicate is being hoisted too high. It's easy to find references to a function name
with the editor - but difficult to locate a particular sequence of "LA(1)" and "LA(2)" tests. Predicate hoisting is a
separate issue which is described in Item #138.
In some cases of reported ambiguity it is not necessary to add semantic predicates because no
valid
token sequence
could get to the wrong rule. If the token sequence were invalid it would be detected by the grammar eventually,
although perhaps not where one might wish. In other cases the only necessary action is a reordering of the
ambiguous rules so that a more specific rule is tested first. The error messages still appear, but one can ignore them
or place a trivial semantic predicate (i.e. <<1>>? ) in front of the later rules. This makes
ANTLR
happy because it
thinks you've added a semantic predicate which fixes things.
#170.
Ambiguity,
#pragma
, and
ANTLR
-rl switch (Contributed by John Lilley jlilley@empathy.com)
This note predates -treport (Item #155) and ambiguity aid (Item #54 ff.). However, it is still worth reading.
A significant problem is the exhaustion of
ANTLR
's lookahead-analysis resources, which is often related to the use of
(
...
)*
. Without the
-rl
option,
ANTLR
may consume a large, perhaps unlimited, amount of memory attempting to
create the lookahead token sets for productions involving
(
...
)*
. If
-rl
is specified, then when
ANTLR
reaches the
specified limit, it will exit with the error message "Ran out of resources while attempting to analyze
something
". I
always use the
-rl
option; setting it to 600000 is usually sufficient even for large grammars. With an
-rl
setting
of 60000
ANTLR
can consume about 40Mbytes of memory before aborting.
Note: With recent versions of
PCCTS
one can use -treport to get information on tree node usage for each rule.
An ambiguity problem is hard to get a handle on because there is little indication of the cause of the problem. If you
get this sort of error start looking for:
a. An
(
...
)*
where the "
...
" might be empty.
b. Two adjacent
(
...
)*
, perhaps brought together by related rules, where both
(
...
)
might match the same thing.
c. Something like
{
A
}
(
...
)*
{
A
}
.
d. Recursion can also be a problem if the recursion effectively implements (...)*.
Of course, the pieces of this puzzle might be sitting in different rules that just happen to get slapped together in
some production, so it can be incredibly frustrating to diagnose. Sometimes, there is a real ambiguity underneath
the problem (like a repetition of a possibly empty construct). Other times, the pattern just exceeds
ANTLR's
abilities.
When this happens,
ANTLR
tries to compute an infinite or exponential number of combinations.
The following is the type of thing that drives
ANTLR
crazy, although this is too simple to exhibit any resource
problem (you'll just get an ambiguity warning). But if this pattern were a lot more complex, you'd get resource
problems:
a : (b)+ c
b : { X | Y | Z }
c : Z
There are several approaches I've taken to counter this problem:
a. Use source code control. Any time you introduce a significant change to a large grammar check in a snapshot
first. Then, if you get this error, back out the changes and apply them incrementally.
b. Use
#pragma
approx
. Be very careful with this! It is a sledgehammer for difficult cases that seem to have
no other solution. It is dangerous because it easily masks problems in the grammar. The documentation for
#pragma
approx
is sketchy, but it seems to do the following things in situations where there is ambiguity
and/or exponential analysis.
1. It performs a
linear analysis
: the leading set of tokens will be computed for each lookahead slot
(LA(1), LA(2), etc.) but no "token set tree" will be calculated. This is really only important for
k
>1.
2.
#pragma
approx (
...
)*
or
#pragma
approx (
...
)+
tells
ANTLR
to favor another iteration
of the
(
...
)
over whatever rules may follow.
3.
#pragma
approx {
...
}
tells
ANTLR
to favor the optional stuff over skipping it and matching what
follows.
4.
#pragma
approx (
A
|
B
)
tells
ANTLR
to favor
A
over
B
.