Saturday, 15 May 2010

parsing - Unable to parse string with symbols with ANTLR 4 -




I customised the simple expression grammar found in the book "The Definitive ANTLR 4 Reference". The new grammar is the following:

    grammar Expr;

    prog:   stat+ ;

    stat:   expr NEWLINE                    # printExpr
        |   ID '=' expr NEWLINE             # assign
        |   NEWLINE                         # blank
        ;

    expr:   expr op=('*'|'/') expr          # MulDiv
        |   expr op=('+'|'-') expr          # AddSub
        |   INT                             # int
        |   ID                              # id
        |   '(' expr ')'                    # parens
        |   'min' '(' expr ',' expr ')'     # min
        |   'max' '(' expr ',' expr ')'     # max
        |   'len' '(' string_constant ')'   # len
        ;

    MUL :   '*' ; // assigns token name to '*' used above in grammar
    DIV :   '/' ;
    ADD :   '+' ;
    SUB :   '-' ;
    ID  :   [a-zA-Z]+ ;      // match identifiers
    INT :   [0-9]+ ;         // match integers
    NEWLINE:'\r'? '\n' ;     // return newlines to parser (is end-statement signal)
    WS  :   [ \t]+ -> skip ; // toss out whitespace

    string_constant : '"' (ESC | ~('"' | '\\') )* '"' ;
    ESC : '\\' (["\\/bfnrt] | UNICODE) ;
    fragment UNICODE : 'u' HEX HEX HEX HEX ;
    fragment HEX : [0-9a-fA-F] ;

There are a couple of new math functions (min, max) and a string length function (len). Here is the tree visitor used to evaluate the parse tree:

    import java.util.HashMap;
    import java.util.Map;

    import org.antlr.v4.runtime.misc.NotNull; // shipped with early ANTLR 4 runtimes

    public class EvalVisitor extends ExprBaseVisitor<Integer> {
        /** "memory" for our calculator; variable/value pairs go here */
        Map<String, Integer> memory = new HashMap<String, Integer>();

        /** ID '=' expr NEWLINE */
        @Override
        public Integer visitAssign(ExprParser.AssignContext ctx) {
            String id = ctx.ID().getText();  // id is left-hand side of '='
            int value = visit(ctx.expr());   // compute value of expression on right
            memory.put(id, value);           // store it in our memory
            return value;
        }

        /** expr NEWLINE */
        @Override
        public Integer visitPrintExpr(ExprParser.PrintExprContext ctx) {
            Integer value = visit(ctx.expr()); // evaluate the expr child
            System.out.println(value);         // print the result
            return 0;                          // return dummy value
        }

        /** INT */
        @Override
        public Integer visitInt(ExprParser.IntContext ctx) {
            return Integer.valueOf(ctx.INT().getText());
        }

        /** ID */
        @Override
        public Integer visitId(ExprParser.IdContext ctx) {
            String id = ctx.ID().getText();
            if ( memory.containsKey(id) ) return memory.get(id);
            return 0; // unknown variables evaluate to 0
        }

        /** expr op=('*'|'/') expr */
        @Override
        public Integer visitMulDiv(ExprParser.MulDivContext ctx) {
            int left = visit(ctx.expr(0));  // value of left subexpression
            int right = visit(ctx.expr(1)); // value of right subexpression
            if ( ctx.op.getType() == ExprParser.MUL ) return left * right;
            return left / right; // must be DIV
        }

        /** expr op=('+'|'-') expr */
        @Override
        public Integer visitAddSub(ExprParser.AddSubContext ctx) {
            int left = visit(ctx.expr(0));  // value of left subexpression
            int right = visit(ctx.expr(1)); // value of right subexpression
            if ( ctx.op.getType() == ExprParser.ADD ) return left + right;
            return left - right; // must be SUB
        }

        /** '(' expr ')' */
        @Override
        public Integer visitParens(ExprParser.ParensContext ctx) {
            return visit(ctx.expr());
        }

        /** 'min' '(' expr ',' expr ')' */
        @Override
        public Integer visitMin(@NotNull ExprParser.MinContext ctx) {
            int left = visit(ctx.expr(0));  // value of left subexpression
            int right = visit(ctx.expr(1)); // value of right subexpression
            return Math.min(left, right);
        }

        /** 'max' '(' expr ',' expr ')' */
        @Override
        public Integer visitMax(@NotNull ExprParser.MaxContext ctx) {
            int left = visit(ctx.expr(0));  // value of left subexpression
            int right = visit(ctx.expr(1)); // value of right subexpression
            return Math.max(left, right);
        }

        /** 'len' '(' string_constant ')' */
        @Override
        public Integer visitLen(@NotNull ExprParser.LenContext ctx) {
            String str = ctx.string_constant().getText();
            return str.length() - 2; // do not count the two surrounding quotes
        }
    }

The visitor can evaluate this expression:

len("hello")

but it is not able to handle this expression:

len("hello%")

The message obtained is the following:

    > java -jar calc.jar
    len("hello")
    len("hello%")
    line 2:10 token recognition error at: '%'
    5
    5

Is there an error in the string_constant definition?

Regards, Jona(than)

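string_constant should be a lexer rule instead. As a parser rule it can only be built from tokens the lexer already produces, and no lexer rule in the grammar matches '%', so the lexer reports the token recognition error and drops the character. Lexer rule names start with an uppercase letter:

    STRING_CONSTANT : '"' ( ESC | ~('"' | '\\') )* '"' ;

and ESC should be a fragment, so that it only ever appears inside other lexer rules:

    fragment ESC : '\\' (["\\/bfnrt] | UNICODE) ;

The 'len' alternative then refers to STRING_CONSTANT, and the visitor reads the token text with ctx.STRING_CONSTANT().getText().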
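
To check which inputs the corrected rule accepts without regenerating the parser, the rule can be mirrored as a java.util.regex pattern. This is only a sketch under assumptions: the class and method names (StringConstantCheck, isStringConstant, len) are hypothetical, and the regex is a stand-in for the ANTLR lexer rule, not part of ANTLR.

```java
import java.util.regex.Pattern;

public class StringConstantCheck {
    // Regex mirror (an approximation, not ANTLR itself) of:
    //   STRING_CONSTANT : '"' ( ESC | ~('"' | '\\') )* '"' ;
    //   ESC             -> \\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})
    //   ~('"' | '\\')   -> [^"\\]
    private static final Pattern STRING_CONSTANT = Pattern.compile(
        "\"(?:\\\\(?:[\"\\\\/bfnrt]|u[0-9a-fA-F]{4})|[^\"\\\\])*\"");

    /** True if the whole text would form one STRING_CONSTANT token. */
    static boolean isStringConstant(String text) {
        return STRING_CONSTANT.matcher(text).matches();
    }

    /** Same arithmetic as visitLen: token text minus the two quotes. */
    static int len(String tokenText) {
        return tokenText.length() - 2;
    }

    public static void main(String[] args) {
        System.out.println(isStringConstant("\"hello\""));   // plain letters
        System.out.println(isStringConstant("\"hello%\""));  // symbol inside the quotes
        System.out.println(len("\"hello%\""));               // characters between the quotes
        System.out.println(isStringConstant("\"bad\\q\""));  // \q is not a listed escape
    }
}
```

Because the length is taken from the raw token text, an escape sequence such as \n inside the quotes counts as two characters in this sketch, just as it does in visitLen.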

parsing antlr antlr4
