c++ - How to read the identifier 'class' in Flex?
I am trying to write a compiler for the COOL language, and right now I am at lexical analysis. Concretely, Flex matches the longest pattern, as far as I understand.
Thus, if I have this input for Flex:

    class A inherits B

Now, if the token CLASS is returned by this pattern:

    ^"class" return CLASS;

and the INHERITS token by this one:

    ^"class"[ ]+[a-zA-Z]+[0-9]?[ ]+"inherits"[ ]+ return INHERITS;

then, since Flex matches the longest pattern, it will always return INHERITS and never CLASS. Is there a workaround for this problem?
Here I can return the token CLASS alone. But how do I return the token INHERITS, since it must be preceded by the CLASS token and a name, and followed by a string token?
But if I try to impose those constraints on INHERITS, Flex will match the longest pattern, not CLASS alone.
Then should I return enums/numbers for class and identifier individually? And if I do that, how do I identify the 'inherits' identifier?
Edit:

    class A inherits B { main(): SELF_TYPE{...} }

How does Flex match against main? The reflexer differentiates between TYPEID A and main, which it declares an OBJECTID. I can do this by looking ahead at the parenthesis and, if it finds (, declaring an OBJECTID. But if I do that, I encounter the same problem as above: Flex will never match against ( on its own, only main(.
You are trying to do too much in Flex, and perhaps misunderstand the role and boundaries of the lexical phase. You shouldn't be attempting to parse a whole sentence with a Flex regex alone. Flex's job is to consume a stream of text and convert it into a stream of integer tokens. The sentence you've provided:

    class A inherits B

represents multiple tokens of a language that requires parsing. Flex is not a parser; it is a lexical scanner/tokenizer. (Technically it is a parser of bytes or characters, but you want to "parse" the atomic units that represent the words of your language, not characters.)
So there are 4 distinct tokens (atomic units), known as terminals, in the above sentence: [class, A, inherits, B]. You need an identifier rule for Flex, such that anything that doesn't match a keyword token falls through to identifier, so the tokens returned by Flex to the parser are:

    CLASS IDENTIFIER INHERITS IDENTIFIER

The job of Flex is to scan each word/token and convert the text into distinct integer values to be consumed by Bison or another parser.
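To make the "text in, integer tokens out" idea concrete, here is a minimal hand-written C++ sketch of what such a scanner does. It is not Flex output. The token codes, the `Lexeme` struct, and `tokenize` are hypothetical names chosen for illustration, and the sketch assumes the usual Bison convention that named tokens are numbered above 255 so a single character such as `(` can serve as its own token code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token codes. Starting above 255 leaves room for single
// characters such as '(' to be used directly as token codes.
enum { CLASS = 258, INHERITS, IDENTIFIER };

struct Lexeme {
    int token;         // integer code handed to the parser
    std::string text;  // matched characters (what Flex exposes as yytext)
};

// Scan one word or punctuation character at a time. Keywords are checked
// first; any other word falls through to IDENTIFIER.
std::vector<Lexeme> tokenize(const std::string& input) {
    std::vector<Lexeme> out;
    std::size_t i = 0;
    while (i < input.size()) {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isalpha(c) || c == '_') {
            // Consume the longest run of letters, digits and underscores.
            const std::size_t start = i;
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) ||
                    input[i] == '_')) {
                ++i;
            }
            const std::string word = input.substr(start, i - start);
            assert(!word.empty());  // the loop consumed at least one character
            if (word == "class") {
                out.push_back({CLASS, word});
            } else if (word == "inherits") {
                out.push_back({INHERITS, word});
            } else {
                out.push_back({IDENTIFIER, word});
            }
        } else if (std::ispunct(c)) {
            // Punctuation is a token of its own, so "main(" becomes
            // IDENTIFIER followed by '('.
            out.push_back({static_cast<int>(c), std::string(1, input[i])});
            ++i;
        } else {
            ++i;  // skip whitespace and anything else this sketch ignores
        }
    }
    return out;
}
```

Calling `tokenize("class A inherits B")` yields four lexemes with the codes CLASS, IDENTIFIER, INHERITS, IDENTIFIER, and `tokenize("main()")` yields IDENTIFIER, `'('`, `')'`. Because every word is scanned whole before it is classified, an input such as `classes` comes out as a single IDENTIFIER, not as CLASS followed by leftovers.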
You would typically have a yacc/Bison BNF grammar to handle this:

    class_decl: CLASS IDENTIFIER
              | CLASS IDENTIFIER INHERITS IDENTIFIER
              ;

So your lex rules would look like this, and you need to return an IDENTIFIER token to the parser while attaching the actual symbol (A, B). That is what the yytext variable is for:
    letter      [a-zA-Z_]
    digit       [0-9]
    letterdigit [a-zA-Z0-9_]
    %%
    "class"     return(CLASS);
    "inherits"  return(INHERITS);
    {letter}{letterdigit}* {
        /* Not a keyword: attach the matched text and report an identifier. */
        yylval.sym = new symbol(yytext);
        yylval.sym->line = line;
        fprintf(stderr, "token identifier(%s)\n", yytext);
        return(IDENTIFIER);
    }

If you are trying to do all of this within Flex, it is possible, but you will end up with a mess, much like if you try to parse HTML with a regex... :)
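To illustrate the division of labour, the following C++ sketch checks a token sequence against the class_decl rule by hand. It is only a stand-in for what a Bison-generated parser would do with that rule, not Bison itself; the token codes and the function name `is_class_decl` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token codes; a real project would take them from the
// header that Bison generates.
enum Token { CLASS = 258, INHERITS, IDENTIFIER };

// Returns true when the whole token sequence is exactly one class_decl:
//   class_decl: CLASS IDENTIFIER
//             | CLASS IDENTIFIER INHERITS IDENTIFIER
bool is_class_decl(const std::vector<Token>& toks) {
    std::size_t pos = 0;

    // Consume the next token if it is the one we want.
    auto accept = [&](Token want) -> bool {
        if (pos < toks.size() && toks[pos] == want) {
            ++pos;
            return true;
        }
        return false;
    };

    // Both alternatives start with CLASS IDENTIFIER.
    if (!accept(CLASS) || !accept(IDENTIFIER)) return false;

    // The optional tail: INHERITS must be followed by an IDENTIFIER.
    if (accept(INHERITS) && !accept(IDENTIFIER)) return false;

    assert(pos <= toks.size());
    return pos == toks.size();  // reject trailing tokens
}
```

With this split, `is_class_decl({CLASS, IDENTIFIER, INHERITS, IDENTIFIER})` is true and `is_class_decl({CLASS, IDENTIFIER, INHERITS})` is false. The requirement that INHERITS follows a class name is enforced by the grammar, so the scanner never needs a pattern that spans several words.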
c++ regex compiler-construction flex-lexer lexical-analysis