27 March 2000 Release 2.22 Notes for New Users of PCCTS Version 1.33MR22
Note that, if the brute-force #pragma approx had been used, then some relational expressions would never be
parsed. The combination of #pragma approx and the predicate has (hopefully) solved the problem correctly.
#171. What is the difference between "(...)? <<...>>? x" and "(...)? => <<...>>? x" ?
The first expression is a syntactic predicate followed by a semantic predicate. The syntactic predicate can perform
arbitrary lookahead and backtracking before committing to the rule. However, the parser won't encounter the semantic
predicate until it has already committed to the rule. This makes the semantic predicate merely a validation predicate,
which is not a very useful kind of semantic predicate.
The second expression is a "guarded semantic predicate" with a convenient notation for specifying the look-ahead
context. The context expression is used to generate an "if" condition similar to that used to predict which rule to
invoke. It isn't any more powerful than the grammar analysis implied by the values you've chosen for the ANTLR
switches -k and -ck. It doesn't have any of the machinery of syntactic predicates and does not allow arbitrarily
large lookahead. If the syntax predicate is true, the semantic predicate is evaluated: if true, the parse of the
alternative continues, and if false, the parse of that alternative is aborted. If the syntax predicate is false, then the
semantic predicate is ignored and the parse continues. A common misconception is that the parse of the alternative
is rejected when the syntax predicate is false.
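
The rule for the guarded form can be written out as a small truth function. The sketch below is only an illustrative model in plain C++, not code generated by ANTLR; the names guardedPredicateAllows and misconceivedGuard, and both parameter names, are hypothetical.

```cpp
// Hypothetical model of how a guarded predicate "(...)? => <<...>>?" gates
// one alternative. Not ANTLR output; the names are invented for these notes.
//
// contextMatches: the lookahead matched the context expression "(...)?"
// semanticPred:   the value of the semantic predicate "<<...>>?"
bool guardedPredicateAllows(bool contextMatches, bool semanticPred) {
    if (!contextMatches) {
        // Context does not apply: the semantic predicate is ignored
        // and the parse continues as if it were not there.
        return true;
    }
    // Context applies: the semantic predicate decides.
    return semanticPred;
}

// The common misconception: treating a failed context match as a rejection.
bool misconceivedGuard(bool contextMatches, bool semanticPred) {
    return contextMatches && semanticPred;
}
```

The two functions differ only when the context does not match: guardedPredicateAllows(false, false) leaves the alternative available, whereas the misconceived version would reject it.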
#172. Memory leaks and lost resources
Syntactic predicates use setjmp/longjmp. They cause leaks even with reference counted tokens. (Item #123).
Delete temporary attributes on rule failure and exceptions (Item #193).
Delete temporary ASTs on rule failure and exceptions (Item #95).
A rule that constructs an AST returns an AST even when its caller uses the "!" operator (Item #88).
(C++ mode) A rule which applies "!" to a terminal loses the token (Item #89) unless the ANTLR reference counting
option is enabled.
(C mode) Define a zzd_ast() routine if you define a zzcr_ast() or zzmk_ast() (Item #200).
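
The setjmp/longjmp point in the first item can be seen in isolation with a few lines of C++. This is a minimal sketch, not the code PCCTS generates: guessEnv, trackedAlloc, trackedFree, rule and leakedByOneGuess are all hypothetical names, and a simple counter stands in for a real leak detector.

```cpp
#include <csetjmp>
#include <cstdlib>

// Hypothetical stand-ins; none of these names come from PCCTS.
static std::jmp_buf guessEnv;    // where a failed guess resumes
static int liveAllocations = 0;  // blocks allocated but not yet freed

void* trackedAlloc(std::size_t n) {
    void* p = std::malloc(n);
    if (p != nullptr) ++liveAllocations;
    return p;
}

void trackedFree(void* p) {
    if (p != nullptr) {
        --liveAllocations;
        std::free(p);
    }
}

// Models a rule that builds a temporary attribute and may fail while guessing.
void rule(bool fail) {
    void* attr = trackedAlloc(32);
    if (fail) {
        // Jumps straight back to the setjmp call site;
        // the trackedFree call below is never executed.
        std::longjmp(guessEnv, 1);
    }
    trackedFree(attr);
}

// Returns how many blocks are still outstanding after one guess.
int leakedByOneGuess(bool fail) {
    liveAllocations = 0;
    if (setjmp(guessEnv) == 0) {
        rule(fail);
    }
    return liveAllocations;
}
```

In this model leakedByOneGuess(false) reports no outstanding block, while leakedByOneGuess(true) reports one: the jump bypasses the cleanup code that follows it, so whatever the rule had allocated is still outstanding afterwards. Reference counting does not help, because the code that would have dropped the reference is skipped in the same way.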
#173. Some ambiguities can be fixed by introduction of new #token numbers
For instance in C++ with a suitable definition of the class "C" one can write:
C a,b,c; /* a1 */
a.func1(b); /* a2 */
a.func2()=c; /* a3 */
a = b; /* a4 */
a.operator =(b); /* a5 */
Statement a5 happens to place an "=" (or any of the usual C++ operators) in a token position where it can cause a lot
of ambiguity in the lookahead set. One can solve this particular problem by creating a special #lexclass for things
which follow "operator" with an entirely different token number for such operator strings - thereby avoiding the
whole problem.
//
// C++ operator sequences (somewhat simplified for these notes)
//
// operator <type_name>
// operator <special characters>
//
// There must be at least one non-alphanumeric character between
// "operator" and operator name - otherwise they would be run
// together - ("operatorint" instead of "operator int")
//
#lexclass LEX_OPERATOR
#token FILLER_C1 "[\ \t]*"
<<skip();
if( isalnum(ch) ) mode(START);
>>
#token OPERATOR_STRING "[\+\-\*\/%\^\&\|\~\!\=\<\>]*"
<<mode(START);>>
#token FILLER_C2 "\(\) | \[\] "
<<mode(START);return OPERATOR_STRING;>>