c++ - How to read the identifier 'class' in Flex?
I am trying to write a compiler for the COOL language, and right now I am at the lexical analysis stage. As I understand it, flex always matches the longest pattern.
Suppose the input given to flex is:
class A inherits B
Now suppose the token CLASS is returned by this pattern:
^"class" return CLASS;
and the INHERITS token by this one:
^"class"[ ]+[a-zA-Z]+[0-9]?[ ]+"inherits"[ ]+ return INHERITS;
Since flex matches the longest pattern, it will always return INHERITS and never CLASS. Is there a workaround for this problem?
I can return the token CLASS on its own. But how do I return the token INHERITS, given that it must be preceded by a CLASS token and a name, and followed by a string token?
If I try to impose those constraints on INHERITS, flex will match the longest pattern and not CLASS alone.
Should I then return enums/numbers for class as an identifier individually? And if I do that, how do I identify the 'inherits' identifier?
Edit:
class A inherits B { main(): SELF_TYPE{...} }
How would flex match against main? The reflexer differentiates between the TYPEID A and main, and declares main an OBJECTID. It could do so by looking ahead at the parenthesis: if it finds (, it declares an OBJECTID. But if I do that, I run into the same problem as above: flex will never match ( on its own, only main(.
You are trying to do too much in flex, and perhaps misunderstand the role and boundaries of the lexical phase. You shouldn't be attempting to parse a whole sentence with a flex regex alone. Flex's job is to consume a stream of text and convert it into a stream of integer tokens. The sentence you've provided:
class A inherits B
represents multiple tokens in a language that requires parsing. Flex is not a parser; it is a lexical scanner/tokenizer. (Technically it is a parser of bytes or characters, but you want to "parse" the atomic units that represent the words of your language, not the characters.)
So there are 4 distinct tokens (atomic units), known as terminals, in the above sentence: [class, A, inherits, B]. You need an identifier rule in flex, so that any word that doesn't match a keyword token falls through to identifier, and the tokens returned by flex to the parser are:
CLASS IDENTIFIER INHERITS IDENTIFIER
The job of flex is to scan each word / token and convert the text into distinct integer values to be consumed by bison or another parser.
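To make the word-by-word classification concrete, here is a minimal hand-written sketch in C++ of what such a scanner does. It is not flex itself and not generated code: the `Token` values, the `Lexeme` struct and the `tokenize` function are hypothetical names chosen for illustration, and the sketch ignores everything except names and whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token codes; with bison these would come from the generated header.
enum Token { CLASS, INHERITS, IDENTIFIER };

struct Lexeme {
    Token token;       // integer code handed to the parser
    std::string text;  // matched text, the role yytext plays in flex
};

// Split the input into words and classify each one.
// Keywords are checked first; any other name falls through to IDENTIFIER.
std::vector<Lexeme> tokenize(const std::string& input) {
    std::vector<Lexeme> out;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isalpha(c) || c == '_') {
            std::size_t start = i;
            // Consume the longest run of letters, digits and underscores.
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) ||
                    input[i] == '_')) {
                ++i;
            }
            std::string word = input.substr(start, i - start);
            if (word == "class") {
                out.push_back({CLASS, word});
            } else if (word == "inherits") {
                out.push_back({INHERITS, word});
            } else {
                out.push_back({IDENTIFIER, word});
            }
        } else {
            ++i;  // skip whitespace and anything this sketch does not handle
        }
    }
    return out;
}
```

Calling `tokenize("class A inherits B")` yields four lexemes, one per word, and a word such as `classy` comes out as a single IDENTIFIER because the whole run of letters is consumed before it is compared against the keywords.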
You would typically have a yacc/bison BNF grammar to handle this:
class_decl: CLASS IDENTIFIER
          | CLASS IDENTIFIER INHERITS IDENTIFIER
          ;
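For readers who have not used bison, the following C++ sketch shows roughly what recognizing that grammar rule amounts to once the scanner has produced tokens. It is a hand-written recognizer, not what bison generates, and `Token` and `is_class_decl` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical token codes, standing in for the ones bison would generate.
enum Token { CLASS, INHERITS, IDENTIFIER };

// Recognize:
//   class_decl: CLASS IDENTIFIER
//             | CLASS IDENTIFIER INHERITS IDENTIFIER ;
// Returns true only if the whole token sequence is exactly one class_decl.
bool is_class_decl(const std::vector<Token>& toks) {
    std::size_t pos = 0;
    // Consume the next token if it is the expected one.
    auto accept = [&](Token expected) -> bool {
        if (pos < toks.size() && toks[pos] == expected) {
            ++pos;
            return true;
        }
        return false;
    };
    if (!accept(CLASS) || !accept(IDENTIFIER)) {
        return false;
    }
    if (accept(INHERITS)) {
        // Once INHERITS has been seen, the parent name is mandatory.
        if (!accept(IDENTIFIER)) {
            return false;
        }
    }
    return pos == toks.size();
}
```

The point of the split is visible here: the constraint that INHERITS must follow CLASS and a name lives in the parser, so the scanner never needs a pattern that spans several words.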
So the lex rules look like this. You need to return the IDENTIFIER token to the parser while attaching the actual symbol (A, B), which is available in the yytext variable:
letter [a-zA-Z_]
digit [0-9]
letterdigit [a-zA-Z0-9_]
%%
"class" return(CLASS);
"inherits" return(INHERITS);
{letter}{letterdigit}* {
    /* The keyword rules come first, so they win a same-length tie with this rule. */
    yylval.sym = new symbol(yytext);
    yylval.sym->line = line;
    fprintf(stderr, "token identifier(%s)\n", yytext);
    return(IDENTIFIER);
}
If you try to do all of this within flex, it is possible, but you will end up with a mess, much like trying to parse HTML with a regex... :)
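The same principle applies to the TYPEID versus OBJECTID question in the edit: the scanner can decide from the word itself, without looking ahead at a following parenthesis. The sketch below assumes COOL's convention that type identifiers begin with an uppercase letter and object identifiers with a lowercase one. `Token` and `classify_identifier` are hypothetical names for illustration, and in flex the equivalent would be two separate identifier patterns that differ in their first character class.

```cpp
#include <cctype>
#include <string>

// Hypothetical token codes named after the ones mentioned in the question.
enum Token { TYPEID, OBJECTID };

// Classify an identifier by its first character alone (assumed COOL convention):
// an uppercase start gives TYPEID, anything else gives OBJECTID.
// No lookahead at a following '(' is involved.
Token classify_identifier(const std::string& name) {
    if (!name.empty() && std::isupper(static_cast<unsigned char>(name[0]))) {
        return TYPEID;
    }
    return OBJECTID;
}
```

Under that assumption `A` and `SELF_TYPE` classify as TYPEID and `main` as OBJECTID, while `(` stays a separate token that the grammar consumes.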
c++ regex compiler-construction flex-lexer lexical-analysis