27 March 2000 Release 2.22 Notes for New Users of PCCTS Version 1.33MR22
#196.
Attributes created in a rule should be assumed not valid on entry to a fail action
Fail actions are "... executed after a syntax error is detected but before a message is printed and the attributes have
been destroyed. However, attributes are not valid here because one does not know at what point the error occurred
and which attributes even exist. Fail actions are often useful for cleaning up data structures or freeing memory."
(page 29 of the 1.00 manual)
Example of a fail action:
a : <<List *p=NULL;>>
    ( v:Var <<append(p,$v);>> )+
    <<operateOn(p);rmlist(p);>>
  ; <<rmlist(p);>>
    ^^^^^^^^^^^^^^ <--- Fail Action
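To see why a fail action can rely on a local variable such as p but not on attributes, it may help to picture the shape of the rule function. The C sketch below is a hand-written imitation, not the code ANTLR actually emits: the List type and the append(), rmlist() and operate_on() helpers are hypothetical stand-ins for the ones in the grammar fragment, a negative input value plays the role of a syntax error, and append() takes &p so that it can update the list head. The init action becomes an ordinary local, both the success path and the fail label free the list, and the cleanup runs before the error message is printed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical list type standing in for List in the grammar example. */
typedef struct List { int value; struct List *next; } List;

int live_nodes = 0;   /* number of allocated nodes, used to check for leaks */

static void append(List **head, int v) {
    List *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->value = v;
    n->next = NULL;
    live_nodes++;
    while (*head != NULL) head = &(*head)->next;   /* walk to the tail */
    *head = n;
}

static void rmlist(List *p) {
    while (p != NULL) {
        List *next = p->next;
        free(p);
        live_nodes--;
        p = next;
    }
}

/* Hypothetical operateOn(): here it just sums the list. */
static long operate_on(const List *p) {
    long sum = 0;
    for (; p != NULL; p = p->next) sum += p->value;
    return sum;
}

/* Imitation of rule  a : ( v:Var )+ ;  over an int array.
   Returns 1 on success (storing the result), 0 on a syntax error. */
int rule_a(const int *input, int n, long *result) {
    List *p = NULL;                   /* init action <<List *p=NULL;>> */
    int i;
    if (n == 0) goto fail;            /* (...)+ needs at least one Var */
    for (i = 0; i < n; i++) {
        if (input[i] < 0) goto fail;  /* syntax error in mid-rule */
        append(&p, input[i]);         /* <<append(p,$v);>> */
    }
    *result = operate_on(p);          /* <<operateOn(p);rmlist(p);>> */
    rmlist(p);
    return 1;
fail:
    rmlist(p);                        /* fail action <<rmlist(p);>> */
    fprintf(stderr, "syntax error\n");/* message is printed after cleanup */
    return 0;
}
```

Calling rule_a() on {1, 2, 3} should store 6 in *result and return 1; on {1, 2, -1} it should return 0. In both cases live_nodes should be back to 0.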
#197.
Use a fail action to destroy temporary attributes when a rule fails
If you construct temporary, local attributes in the middle of the recognition of a rule, remember to deallocate them
should the rule fail. The code for failure goes after the ";" and before the next rule. For this reason it is
sometimes desirable to defer some processing until the rule has been recognized, rather than doing it at the most
convenient place:
#include "pccts/h/charptr.h"
statement!
  : <<char *label=0;>>
    {name:ID COLON <<label=MYstrdup($name);>> }
    s:statement_without_label
      <<#0=#(#[T_statement,label],#s);
        if (label!=0) free(label);
      >>
  ;<<if (label !=0) free(label);>>
In the above example attributes are handled by charptr.* (see the warning, Item #195). The call to MYstrdup() is
necessary because $name will go out of scope at the end of the subrule "{name:ID COLON}". The routine written
to construct ASTs from attributes (invoked by #[int,char *]) knows about this behavior and always makes a
copy of the character string when it constructs the AST. This makes the copy created by the explicit call to
MYstrdup() redundant once the AST has been constructed, which is why the action frees it. If the call to
"statement_without_label" fails, the temporary copy must be deallocated by the fail action instead.
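The ownership rule above can be sketched in plain C. This is an illustration under stated assumptions, not the charptr.* code: my_strdup() is a hypothetical stand-in for MYstrdup(), make_leaf() is a hypothetical constructor that, like the #[int,char *] routine, keeps its own copy of the text, and the value 1 merely stands in for T_statement. The local buffer scratch mimics $name going out of scope at the end of the subrule, so the label is copied before the subrule ends and the copy is freed on both the success and the failure path.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MYstrdup(): returns a heap copy of s. */
char *my_strdup(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

/* Hypothetical AST leaf; make_leaf() stores its OWN copy of the text,
   so the caller still owns (and must free) whatever it passed in. */
typedef struct Ast { int type; char *text; } Ast;

Ast *make_leaf(int type, const char *text) {
    Ast *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->type = type;
    n->text = text ? my_strdup(text) : NULL;
    return n;
}

/* Imitation of rule "statement". label_attr plays the role of $name (NULL if
   there is no label); body_ok says whether statement_without_label succeeded.
   Returns the new AST, or NULL on failure. */
Ast *statement(const char *label_attr, int body_ok) {
    char *label = 0;
    if (label_attr != NULL) {            /* the optional subrule {name:ID COLON} */
        char scratch[32];                /* toy attribute storage local to the subrule */
        strncpy(scratch, label_attr, sizeof scratch - 1);
        scratch[sizeof scratch - 1] = '\0';
        label = my_strdup(scratch);      /* copy outlives the subrule */
    }                                    /* scratch is gone from here on */
    if (!body_ok) goto fail;
    {
        Ast *root = make_leaf(1, label); /* 1: hypothetical T_statement */
        if (label != 0) free(label);     /* AST holds its own copy now */
        return root;
    }
fail:
    if (label != 0) free(label);         /* fail action: release the temporary */
    return NULL;
}
```

statement("L1", 1) should return a node whose text is "L1" even though the temporary has already been freed; statement("L2", 0) takes the failure path, frees the temporary and returns NULL.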
#198.
When you need more information for a token than just token type, text, and line number
Passing accurate column information along with the token in C mode when using syntactic predicates requires
workarounds. P.A. Keller (keller@ebi.ac.uk) has worked around this limitation of C mode by passing the address of
a user-defined struct (rendered as text using format codes "%p" or "%x") along with (or instead of) the token's
actual text. This requires changes in syntax error routines and other places where the token text might be displayed.
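A minimal sketch of the idea in C, assuming a hypothetical TokenInfo record that holds the extra information (here a line and column number). It uses "%p" rather than "%x": passing a pointer to "%x" is not portable because a pointer need not fit in an unsigned int, whereas the C standard guarantees that sscanf() with "%p" reads back a pointer that compares equal to one printed with "%p" earlier in the same run. The record must stay alive for as long as its text may be decoded.

```c
#include <stdio.h>

/* Hypothetical per-token record carrying what the token text cannot. */
typedef struct TokenInfo {
    int line;
    int column;
} TokenInfo;

/* Render the record's address as text, to be stored as (or with) the token text. */
void encode_token_info(TokenInfo *info, char *buf, size_t size) {
    snprintf(buf, size, "%p", (void *)info);
}

/* Recover the record from text produced by encode_token_info() in the same run.
   Returns NULL if the text does not parse as a pointer. */
TokenInfo *decode_token_info(const char *text) {
    void *p = NULL;
    if (sscanf(text, "%p", &p) != 1) return NULL;
    return (TokenInfo *)p;
}
```

A syntax error routine would then call decode_token_info() on the token text to print the column, which is one of the "other places" the note says must change.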
#199.
About the pipeline between DLG and ANTLR (C mode)
I find it helpful to think of lexical processing by DLG as a process which fills a pipeline, and of ANTLR as a
process which empties it. (This relationship is exposed in C++ mode by the ANTLRTokenBuffer class.)
With LL_K=1 the pipeline is only one item deep, trivial, and invisible. It is invisible because one can make a
decision in ANTLR to change the DLG #lexclass with zzmode() and have the next token (the one following the one
just parsed by ANTLR) parsed according to the new #lexclass.
With LL_K>1 the pipeline is not invisible. DLG will put a number of tokens into the pipeline and ANTLR will
analyze them in the same order, so a change of #lexclass has no effect on tokens that DLG has already placed in
the pipeline. How many tokens are in the pipeline depends on the options one has chosen.
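The difference between the two cases can be illustrated with a toy pipeline in C. Everything here is hypothetical: two lexical classes (MODE_WORD reads one blank-separated word, MODE_REST reads the rest of the input), a ring buffer that the "lexer" keeps k tokens full as in normal mode, and a "parser" that switches the class when it sees "#", the analogue of calling zzmode() in an action. It assumes, as the note implies, that the action runs while "#" is still the current token. With k=1 the token after "#" is lexed only after the switch; with k=2 it is already in the pipeline, lexed under the old class.

```c
#include <string.h>

#define MAX_K  4
#define TOKLEN 64

enum { MODE_WORD, MODE_REST };      /* hypothetical lexical classes */

typedef struct Pipeline {
    const char *pos;                /* lexer: next unread character */
    int mode;                       /* lexer: current class, like a #lexclass */
    char buf[MAX_K][TOKLEN];        /* tokens already lexed, oldest first */
    int head, count;
} Pipeline;

/* Lexer side: append one token, lexed in the CURRENT mode ("" = end of input). */
static void fill_one(Pipeline *p) {
    while (*p->pos == ' ') p->pos++;
    char *slot = p->buf[(p->head + p->count) % MAX_K];
    size_t len = (p->mode == MODE_WORD) ? strcspn(p->pos, " ") : strlen(p->pos);
    if (len >= TOKLEN) len = TOKLEN - 1;
    memcpy(slot, p->pos, len);
    slot[len] = '\0';
    p->pos += len;
    p->count++;
}

/* Normal mode: the lexer starts, and stays, k tokens ahead of the parser. */
void pipeline_init(Pipeline *p, const char *input, int k) {
    if (k < 1) k = 1;
    if (k > MAX_K) k = MAX_K;
    p->pos = input;
    p->mode = MODE_WORD;
    p->head = 0;
    p->count = 0;
    while (p->count < k) fill_one(p);
}

/* Parser side: i-th lookahead token, 1-based. */
const char *la(const Pipeline *p, int i) {
    return p->buf[(p->head + i - 1) % MAX_K];
}

/* Parser side: drop LA(1); the lexer immediately refills the freed slot. */
void consume(Pipeline *p) {
    p->head = (p->head + 1) % MAX_K;
    p->count--;
    fill_one(p);
}

/* Toy parser: skip to "#", switch the class, return the token after "#". */
const char *token_after_hash(Pipeline *p) {
    while (la(p, 1)[0] != '\0' && strcmp(la(p, 1), "#") != 0) consume(p);
    if (la(p, 1)[0] == '\0') return la(p, 1);
    p->mode = MODE_REST;            /* like zzmode() in an action on "#" */
    consume(p);
    return la(p, 1);
}
```

On the input "a # rest of line", token_after_hash() should return "rest of line" for a pipeline initialized with k=1, but only "rest" with k=2.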
Case 1: Infinite lookahead mode ("(...)?"). The pipeline is as large as the input, since the entire input is
tokenized by DLG before ANTLR even begins analysis.
Case 2: Demand lookahead (interactive mode). There is a varying amount of lookahead depending on how much ANTLR
thinks it needs to predict which rule to execute next. This may be zero tokens (or maybe it's one token) up to k
tokens. Naturally, it takes extra work by ANTLR to keep track of how many tokens are in the pipe and how many are
needed to parse the next rule.
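Demand lookahead can be sketched as a buffer that asks the lexer for tokens only when the parser inspects them. In this hypothetical C sketch (all names are illustrative, not the PCCTS runtime), Source stands in for DLG and counts how many tokens it has produced, peek() lexes just enough tokens to answer a request for the i-th token, and consume() drops the first token without fetching a replacement. The bookkeeping in count is the kind of extra work described above. Tokens are plain ints, with 0 meaning end of input.

```c
#define MAX_K 4

/* Hypothetical token source standing in for DLG. */
typedef struct Source { const int *toks; int n, next, produced; } Source;

static int lex(Source *s) {
    s->produced++;                                  /* count every fetch */
    return s->next < s->n ? s->toks[s->next++] : 0; /* 0 = end of input */
}

/* Lookahead buffer filled only on demand. */
typedef struct Lookahead {
    Source *src;
    int buf[MAX_K];
    int head, count;                                /* count = tokens in the pipe */
} Lookahead;

/* Return the i-th lookahead token (1-based), lexing only what is missing. */
int peek(Lookahead *la, int i) {
    if (i < 1 || i > MAX_K) return 0;
    while (la->count < i) {
        la->buf[(la->head + la->count) % MAX_K] = lex(la->src);
        la->count++;
    }
    return la->buf[(la->head + i - 1) % MAX_K];
}

/* Drop the first token; unlike normal mode, nothing is fetched to replace it. */
void consume(Lookahead *la) {
    if (la->count == 0) (void)peek(la, 1);          /* make sure one exists */
    la->head = (la->head + 1) % MAX_K;
    la->count--;
}
```

After peek(&la, 1) only one token has been produced; consuming it fetches nothing, and a later peek(&la, 2) brings the total to three.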
Case 3: Normal mode. DLG stays exactly k tokens ahead of ANTLR. This is a half-truth. It rounds k up to the next
power of 2 so that with k=3 it actually has a pipeline of 4 tokens. If one says "-k 3" the analysis is still
k=3, but