XVisitor

A tool for generating customized Antlr4 parse-tree visitors using grammar-style definitions.


Summary

XVisitor enables multiple XPath-styled probes to be evaluated in parallel against the parse-tree. When any XPath is matched, actions defined in the grammar specific to that XPath are invoked. Actions for either or both ‘onEntry’ and ‘onExit’ states can be defined.

Consistent with standard Antlr grammars, actions are implemented using target language code. Token and rule ‘$’-styled references give actions direct access to the currently matched context and token attributes.


Benefits

  • Natural order of results:
    • the XPath match actions are invoked in natural (parse-tree) order by the visitor
  • Clarity:
    • ‘onEntry:’ and ‘onExit:’ labeled actions can be defined for each XPath
    • the grammar definition is simple and conceptually similar to standard Antlr grammars
    • symbolic context references are available for use in actions
    • the actual path traversed to reach a matched XPath is recorded and available in actions
  • Flexibility:
    • concurrent evaluation fully supports overlapping XPaths
    • supports XPath wildcards: ‘//’, ‘*’, and ‘?’
  • Efficiency:
    • complex path state analysis can be reduced to a set of simply-defined path matches

Example Grammar

The following grammar will generate an outline listing of select nodes in a parse-tree generated using the ANTLRv4 grammar.

xvisitor grammar Outline;

options {
    parserClass = AntlrDT4Parser ;
    # superClass = OutlineAdaptor ;
}

@header {
    package net.certiv.antlrdt4.core.parser.gen;
    # import net.certiv.antlrdt4.core.parser.OutlineAdaptor;
}

outline
    : grammarSpec
    | optionsBlock
    | optionStatement
    | tokensBlock
    | tokenStatement
    | atAction
    | parserRule
    | lexerRule
    ;

grammarSpec     : /grammarSpec
                    { System.out.println($grammarType.text + " " + $identifier.text); }
                ;

optionsBlock    : //prequelConstruct/optionsSpec
                    { onEntry: System.out.println("Options: "); }   
                    { onExit:  System.out.println("End Options."); }    
                ;

optionStatement : //prequelConstruct/optionsSpec/option
                    { 
                        System.out.print($identifier.text);
                        System.out.print(" = ");
                        System.out.println($optionValue.text);
                    }   
                ;

tokensBlock     : //tokensSpec 
                    { onEntry: System.out.println("Tokens: "); }    
                    { onExit:  System.out.println("End Tokens."); } 
                ;

tokenStatement  : //tokensSpec//id
                    { System.out.println($id.text); }
                ;

atAction        : /grammarSpec/prequelConstruct/action
                    { 
                        System.out.print("@");
                        if ($COLONCOLON != null) {
                            System.out.print($actionScopeName.text + "::");
                        }
                        System.out.print($identifier.text);
                        System.out.print(" = ");
                        System.out.println($optionValue.text);
                    }
                ;

parserRule      : //ruleSpec/parserRuleSpec
                    {
                        System.out.print("Parser rule: "); 
                        System.out.println($RULE_REF.text);
                    }
                ;

lexerRule       : //lexerRuleSpec
                    {
                        System.out.print("Lexer rule: "); 
                        System.out.println($TOKEN_REF.text);
                    }
                ;

Grammar Structure

/* Header comment (optional) */
xvisitor grammar name;                  ◀ 'xvisitor' qualifier required

options {
    parserClass = ParserGrammarName;    ◀ required - grammar name of the parser used
                                        ◀ to create the parse-tree
    superClass = ASuperClassName;       ◀ optional - Antlr standard definition
}

@header  { .... }           ◀ optional - uses Antlr standard definition

# line comments start with the '#' character
mainRuleName
    : XPathRule1            ◀ for XPathRules that terminate on the same
    | XPathRule2            ◀ parse-tree node, the order of listing in
    | XPathRule3            ◀ the main rule determines the execution order
    ....                    ◀ of the associated actions
    | XPathRuleN
    { mainRuleAction(); }   ◀ optional - action runs on visitor termination
    ;


XPathRule1  : XPathSpec1                    ◀ a path specification 
                { action1(); }              ◀ unlabeled action; executes on entry
            ;

XPathRule2  : XPathSpec2                    ◀ labeled actions for XPathSpecs that 
                { onEntry: action2Beg(); }  ◀ terminate on rule context nodes; either
                { onExit:  action2End(); }  ◀ or both may be specified
            ;
....
XPathRuleN  : XPathSpecN
                { actionN(); }
            ;

Grammar Lexicon

Essentially the same as the Antlr grammar lexicon.

Note: line comments start with the ‘#’ character


XPath Specification

Essentially the same as Antlr standard XPaths.

The primary significant difference: every XPath specification must start with either ‘/’ or ‘//’.

XPath paths are strings representing nodes or subtrees you would like to select within a parse tree. It’s useful to collect subsets of the parse tree to process. For example you might want to know where all assignments are in a method or all variable declarations that are initialized.

A path is formed from a descending series of node names separated by path separators.

Expression   Description
nodename     The symbolic name (grammar parser or lexer rule name) identifying a parse-tree node.
/            Simple separator. Alone, represents the root node. All paths start at the root; all paths must start with either this separator or the wildcard separator '//'.
//           Wildcard separator. Matches any number of path node segments prior to a node that matches the next nodename; e.g., //ID finds all ID token nodes in the tree.
*            Wildcard nodename. Matches all nodes at the same location in the path.
!nodename    Inverted nodename. Matches any node except for nodename at the same location in the path; e.g., /classdef/!field should find all children of classdef root node that are not field nodes.
?nodename    Optional nodename. Specifies an optional segment in the path.
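
The separator and wildcard rules above can be illustrated with a small stand-alone matcher. This is only a sketch written for this page, not XVisitor's implementation: the class `PathMatchSketch` and its methods are hypothetical, a parse-tree position is modeled as the list of node names from the root down to the current node, and a skipped `?nodename` segment is assumed to drop out of the path together with its separator.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative matcher for the path syntax described above (hypothetical, not the XVisitor runtime). */
final class PathMatchSketch {

    /** One parsed path segment: its separator plus a node-name test. */
    static final class Seg {
        final boolean anyDepth;  // preceded by '//' rather than '/'
        final boolean inverted;  // written as '!nodename'
        final boolean optional;  // written as '?nodename'
        final String name;       // node name, or "*" for the wildcard

        Seg(boolean anyDepth, String word) {
            this.anyDepth = anyDepth;
            this.inverted = word.startsWith("!");
            this.optional = word.startsWith("?");
            this.name = (inverted || optional) ? word.substring(1) : word;
        }

        boolean accepts(String node) {
            if (name.equals("*")) return true;      // wildcard accepts any node
            return inverted != name.equals(node);   // plain name, or anything but the name
        }
    }

    /** Splits a spec such as "//tokensSpec//id" into segments. */
    static List<Seg> parse(String spec) {
        if (!spec.startsWith("/")) {
            throw new IllegalArgumentException("spec must start with '/' or '//'");
        }
        List<Seg> segs = new ArrayList<>();
        int i = 0;
        while (i < spec.length()) {
            boolean anyDepth = spec.startsWith("//", i);
            i += anyDepth ? 2 : 1;
            int end = spec.indexOf('/', i);
            if (end < 0) end = spec.length();
            if (end == i) throw new IllegalArgumentException("empty segment in " + spec);
            segs.add(new Seg(anyDepth, spec.substring(i, end)));
            i = end;
        }
        return segs;
    }

    /** True if the spec matches the whole root-to-node name path. */
    static boolean matches(String spec, String... path) {
        return match(parse(spec), 0, path, 0);
    }

    private static boolean match(List<Seg> segs, int s, String[] path, int p) {
        if (s == segs.size()) return p == path.length;   // spec used up: path must be too
        Seg seg = segs.get(s);
        if (seg.optional && match(segs, s + 1, path, p)) return true;  // skip the optional segment
        int last = seg.anyDepth ? path.length - 1 : p;   // '//' may skip intermediate nodes
        for (int i = p; i <= last && i < path.length; i++) {
            if (seg.accepts(path[i]) && match(segs, s + 1, path, i + 1)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("//tokensSpec//id", "grammarSpec", "tokensSpec", "id")); // true
        System.out.println(matches("/classdef/!field", "classdef", "field"));               // false
    }
}
```

The sketch rejects a bare ‘/’ spec for brevity, although the table allows it as a reference to the root node.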

Actions and Attributes

Essentially the same as provided by Antlr Actions and Attributes.

Note: Actions execute relative to the parse-tree node context identified by the path rule match.

Available Terminal Node Attributes
Attribute   Type     Description
text        String   The text matched for the token; translates to getText().
type        int      The type of the underlying token; translates to getType().
line        int      The line number reported by the underlying token (1-relative); translates to getLine().
pos         int      The character position reported by the underlying token (0-relative); translates to getCharPositionInLine().
index       int      The token stream index reported by the underlying token (0-relative); translates to getTokenIndex().
channel     int      The channel number reported by the underlying token; translates to getChannel().

Available Rule Node Attributes
Attribute   Type     Description
text        String   The text matched by the rule; translates to getText().
line        int      The line number reported by the underlying start token (1-relative); translates to getLine().
pos         int      The character position reported by the underlying start token (0-relative); translates to getStart().getCharPositionInLine().
Every path rule can be followed by multiple Actions, where each Action is delimited by braces. Within each Action, a leading label determines when the Action is executed relative to the visit of the node matched by the end of the rule path.

Available Action Labels
Label      Description
(none)     No label. Equivalent to use of an 'onEntry:' label.
onEntry:   The Action is executed on visitor entry to the node matched, i.e., prior to the visitor visiting any subnodes of the matched node.
onExit:    The Action is executed on visitor exit of the node matched, i.e., following the visitor visiting any subnodes of the matched node.
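
The entry and exit timing described above corresponds to an ordinary depth-first walk. The following stand-alone sketch records that order for a tiny hand-built tree; it assumes nothing about the XVisitor runtime, and the `EntryExitSketch` class, its `Node` type, and the node names are hypothetical stand-ins for a real parse-tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrates onEntry/onExit ordering with a plain depth-first walk (hypothetical types). */
final class EntryExitSketch {

    /** Minimal stand-in for a parse-tree node. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Logs entry before visiting a node's children and exit after visiting them. */
    static void walk(Node node, List<String> log) {
        log.add("onEntry " + node.name);
        for (Node child : node.children) {
            walk(child, log);
        }
        log.add("onExit " + node.name);
    }

    /** Builds a small tree and returns the recorded events, one per line. */
    static String trace() {
        Node tree = new Node("optionsSpec", new Node("option"), new Node("option"));
        List<String> log = new ArrayList<>();
        walk(tree, log);
        return String.join("\n", log);
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

In the trace, both `option` nodes are entered and exited between the entry and exit events of `optionsSpec`, which is why the example grammar can print "Options: " and "End Options." around the individual option lines.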

Runtime

The XVisitor runtime provides two entry points for invoking the visitor.

/**
 * Invokes the visitor for all XPath rules listed
 * in the main rule of the visitor grammar.
 */
public void findAll() { .... }


/**
 * Invokes the visitor for an active subset of
 * the XPath rules listed in the main rule of the
 * visitor grammar. The argument names of XPath
 * rules define the active subset. The remaining
 * rules are inactive and are not considered in
 * the operation of the visitor.
 * 
 * @param names Names of the XPath rules to set as active
 */
public void find(String... names) { .... } 

Within Actions, the runtime provides the following context discovery methods:

/**
 * Returns a list of parse tree path nodes evaluated in the current path. 
 * By default, this is just the non-wildcard'd nodes, <i>i.e.</i>, just 
 * the nodes symbolically identified in the rule path specification.
 *   
 * If {@link Processor#keepAllPathContexts(boolean)} is set,
 * then all path nodes are recorded, and the result will be equivalent 
 * to {@link Processor#ancestors()}.
 */
public List<ParseTree> pathNodes() { .... } 

/**
 * Returns the current context 
 */
public ParseTree lastPathNode() { .... } 

/**
 * Returns the parent chain of parse-tree node ancestors starting with  
 * the current node and ending at the root node, inclusive.
 */
public List<ParseTree> ancestors() { .... } 

/**
 * Returns whether a parse-tree node with one of the given rule indexes
 * exists as an ancestor of the current node. The generated visitor
 * defines the literal values of the known rule indexes.
 */
public boolean hasAncestor(int... ruleIndexes) { .... }
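
As a rough model of what ancestors() and hasAncestor() report, the sketch below walks parent links on a hand-built node chain. It is not the XVisitor runtime: `AncestorSketch` and its `Node` type are hypothetical, the rule indexes are arbitrary integers, and it assumes that hasAncestor() checks only strict ancestors and succeeds when any one of the given indexes is found.

```java
import java.util.ArrayList;
import java.util.List;

/** Models ancestor lookup over parent links (hypothetical types, not the XVisitor runtime). */
final class AncestorSketch {

    /** Minimal stand-in for a rule context node. */
    static final class Node {
        final int ruleIndex;
        final Node parent;   // null for the root node

        Node(int ruleIndex, Node parent) {
            this.ruleIndex = ruleIndex;
            this.parent = parent;
        }
    }

    /** Parent chain starting with the current node and ending at the root, inclusive. */
    static List<Node> ancestors(Node current) {
        List<Node> chain = new ArrayList<>();
        for (Node n = current; n != null; n = n.parent) {
            chain.add(n);
        }
        return chain;
    }

    /** True if any strict ancestor carries one of the given rule indexes (assumed semantics). */
    static boolean hasAncestor(Node current, int... ruleIndexes) {
        for (Node n = current.parent; n != null; n = n.parent) {
            for (int idx : ruleIndexes) {
                if (n.ruleIndex == idx) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = new Node(0, null);
        Node mid = new Node(1, root);
        Node leaf = new Node(2, mid);
        System.out.println(ancestors(leaf).size());  // 3
        System.out.println(hasAncestor(leaf, 0));    // true
        System.out.println(hasAncestor(leaf, 2));    // false
    }
}
```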

Quick Start

Given:

LangParser.g4       // grammar used to generate a parse tree
LangLexer.g4
input.txt           // contains source text to parse
LangVisitor.xv      // XVisitor grammar 

Compile:

java org.antlr.v4.Tool -o .\gen -no-listener LangLexer.g4 LangParser.g4

java net.certiv.antlr.xvisitor.Tool -o .\gen -lib .\gen LangVisitor.xv
  • Note: the xvisitor generator tool requires access to the Antlr generated parser.
  • Use the ‘-lib’ command line option to specify the directory containing the Antlr generated parser.

Invoke:

ANTLRFileStream input = new ANTLRFileStream("input.txt");
LangLexer lexer = new LangLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
LangParser parser = new LangParser(tokens);
LangContext parseTree = parser.lang();

LangVisitor visitor = new LangVisitor(parseTree);
visitor.findAll();

Requirements

Antlr Tool & Runtime 4.5.3


License

Antlr-standard BSD License. See the accompanying License.txt file.