Saturday, 15 September 2012

java - Why does Terminals.tokenizer() tokenize unregistered operators/keywords? -




I've discovered the root cause of the confusing behavior I was observing. Here is a test:

    @Test
    public void test2() {
        // No operators are registered; "true" and "false" are the only keywords.
        Terminals terminals = Terminals.caseInsensitive(new String[] {}, new String[] { "true", "false" });
        Object result = terminals.tokenizer().parse("d");
        System.out.println("result: " + result);
    }

This outputs:

result: d

I was expecting that the parser returned by Terminals.tokenizer() would not return anything, because "d" is not a valid keyword or operator.

The reason I care is that I wanted my own parser at a lower priority than the one returned by Terminals.tokenizer():

    public static final Parser<?> INSTANCE = Parsers.or(
        STRING_TOKENIZER,
        NUMBER_TOKENIZER,
        WHITESPACE_TOKENIZER,
        (Parser<Token>) terminals.tokenizer(),
        IDENTIFIER_TOKENIZER);

The IDENTIFIER_TOKENIZER above is never used, because terminals.tokenizer() always matches first.

Why does Terminals.tokenizer() tokenize unregistered operators/keywords? And how might I get around this?

From the documentation of Terminals#caseInsensitive:

org.codehaus.jparsec.Terminals

public static Terminals caseInsensitive(String[] ops, String[] keywords)

Returns a Terminals object for lexing and parsing the operators with names specified in ops, and for lexing and parsing the keywords case insensitively. Keywords and operators are lexed as Tokens.Fragment with Tokens.Tag.RESERVED tag. Words that are not among keywords are lexed as Fragment with Tokens.Tag.IDENTIFIER tag. A word is defined as an alphanumeric string that starts with [_a-zA-Z], with 0 or more [0-9_a-zA-Z] following.

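Actually, the result returned by your parser is a Fragment object that is tagged according to its type. In your case, "d" is tagged as IDENTIFIER, which is expected: the tokenizer accepts any word, and the keyword list only decides which tag the word receives.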
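To make the rule from the documentation concrete, here is a minimal sketch that does not use jparsec at all. The class WordTaggingSketch, its Tag enum and the tag method are made up for this illustration; they only mimic the documented behavior (any word is accepted, and the keyword list merely selects the tag) and are not how the library is implemented.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical stand-in for the documented tagging rule; this is NOT jparsec code.
public class WordTaggingSketch {
    public enum Tag { RESERVED, IDENTIFIER }

    // A "word" as described in the quoted documentation:
    // starts with [_a-zA-Z], followed by zero or more [0-9_a-zA-Z].
    private static final Pattern WORD = Pattern.compile("[_a-zA-Z][0-9_a-zA-Z]*");

    // Case-insensitive keyword set, mirroring caseInsensitive(ops, keywords).
    private final Set<String> keywords = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

    public WordTaggingSketch(String... keywords) {
        this.keywords.addAll(Arrays.asList(keywords));
    }

    // Returns the tag for a word, or null when the input is not a word at all.
    public Tag tag(String input) {
        if (!WORD.matcher(input).matches()) {
            return null;
        }
        return keywords.contains(input) ? Tag.RESERVED : Tag.IDENTIFIER;
    }

    public static void main(String[] args) {
        WordTaggingSketch sketch = new WordTaggingSketch("true", "false");
        System.out.println("d    -> " + sketch.tag("d"));    // IDENTIFIER
        System.out.println("TRUE -> " + sketch.tag("TRUE")); // RESERVED
        System.out.println("1x   -> " + sketch.tag("1x"));   // null (not a word)
    }
}
```

Under this reading, "d" is never rejected. It is simply a word that is not a keyword, so a later alternative that also matches words would not get a chance to run.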

It is not clear to me what you want to accomplish, though. Could you please provide a test case?

java parsing jparsec
