Wednesday, 15 May 2013

parsing - Preserving comments in `Text.Parsec.Token` tokenizers

I'm writing a source-to-source transformation using Parsec, so I have a `LanguageDef` for my language and I build a `TokenParser` for it using `Text.Parsec.Token.makeTokenParser`:

    myLanguage = LanguageDef {
      ...
      commentStart = "/*",
      commentEnd = "*/",
      ...
    }

    -- defines 'stringLiteral', 'identifier', etc...
    TokenParser {..} = makeTokenParser myLanguage

Unfortunately, since I defined `commentStart` and `commentEnd`, each of the parser combinators in the `TokenParser` is a lexeme parser implemented in terms of `whiteSpace`, and `whiteSpace` eats spaces as well as comments.
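The sketch below reproduces the behaviour described above. It fills the elided `LanguageDef` fields with `emptyDef` from `Text.Parsec.Language`. That is an assumption, because the original definition is not shown. The names `demoDef`, `lexer` and `twoIdents` are hypothetical. Because `identifier` is a lexeme parser, the `whiteSpace` it runs after each token silently consumes the `/* ... */` comment.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Assumption: emptyDef stands in for the fields elided in the question.
demoDef :: Tok.LanguageDef ()
demoDef = emptyDef { Tok.commentStart = "/*", Tok.commentEnd = "*/" }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser demoDef

-- Parse two identifiers. The comment between them is swallowed by the
-- whiteSpace that 'identifier' runs after the first token, so the
-- comment text never shows up in the result.
twoIdents :: Parser (String, String)
twoIdents = do
  Tok.whiteSpace lexer
  a <- Tok.identifier lexer
  b <- Tok.identifier lexer
  eof
  return (a, b)
```

Parsing `"foo /* keep me */ bar"` with `twoIdents` yields only the two identifiers. The comment text is gone.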

What is the right way to preserve comments in this situation?

Approaches I can think of:

- Don't define `commentStart` and `commentEnd`. Wrap each of the lexeme parsers in another combinator that grabs comments before parsing each token.
- Implement my own version of `makeTokenParser` (or perhaps use some library that generalizes `Text.Parsec.Token`; if so, which library?)
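The first approach might look roughly like the sketch below. It rests on a few assumptions:

- It uses `emptyDef`. Its `commentStart`, `commentEnd` and `commentLine` are all empty, so the generated `whiteSpace` only skips blanks.
- It handles only `/* ... */` comments.
- `plainLexer`, `blockComment`, `withComments` and `identifierC` are hypothetical names, not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- No comment delimiters: the generated whiteSpace only skips blanks,
-- so comments stay in the input for us to pick up.
plainLexer :: Tok.TokenParser ()
plainLexer = Tok.makeTokenParser emptyDef

-- Hypothetical helper: one non-nesting "/* ... */" comment, returning its
-- body and skipping the blanks after it.
blockComment :: Parser String
blockComment = do
  _ <- try (string "/*")
  body <- manyTill anyChar (try (string "*/"))
  Tok.whiteSpace plainLexer
  return body

-- Hypothetical wrapper: collect the comments preceding a token, then run
-- the original lexeme parser.
withComments :: Parser a -> Parser ([String], a)
withComments p = do
  cs <- many blockComment
  x <- p
  return (cs, x)

identifierC :: Parser ([String], String)
identifierC = withComments (Tok.identifier plainLexer)
```

Each wrapped token carries the comments that precede it. This design has two costs. Every lexeme parser you use must be wrapped individually. Also, a comment at the very end of the input has no following token, so it needs separate handling.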

What's the done thing in this situation?

In principle, defining `commentStart` and `commentEnd` doesn't fit with preserving comments, because you need to treat comments as valid parts of both the source and the target language, including them in your grammar and your AST/ADT.

That way, you'd be able to keep the text of the comment as the payload of a `Comment` constructor and output it appropriately in the target language, with something like

    data Statement = Comment String | Return Expression | ...
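A small sketch of that idea follows. It is a toy built on these assumptions:

- `Expression` is reduced to a bare variable.
- The source syntax is only `return x;` plus `/* ... */` comments.
- The target language is a hypothetical one that uses `--` line comments.
- `commentStmt`, `returnStmt`, `program` and `render` are illustrative names.

The key point is that `makeTokenParser` gets no comment delimiters. Comments therefore reach the grammar as `Comment` statements instead of being skipped as whitespace.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Toy AST: a comment is an ordinary statement carrying its text.
newtype Expression = Var String deriving (Eq, Show)

data Statement = Comment String | Return Expression
  deriving (Eq, Show)

-- No commentStart/commentEnd, so the lexer never swallows comments.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser (emptyDef { Tok.reservedNames = ["return"] })

-- A "/* ... */" comment becomes a Comment node (non-nesting).
commentStmt :: Parser Statement
commentStmt = do
  _ <- try (string "/*")
  body <- manyTill anyChar (try (string "*/"))
  Tok.whiteSpace lexer
  return (Comment body)

returnStmt :: Parser Statement
returnStmt = do
  Tok.reserved lexer "return"
  e <- Var <$> Tok.identifier lexer
  _ <- Tok.semi lexer
  return (Return e)

program :: Parser [Statement]
program = Tok.whiteSpace lexer *> many (commentStmt <|> returnStmt) <* eof

-- Emit a hypothetical target language that uses "--" line comments.
render :: Statement -> String
render (Comment s)      = "--" ++ s
render (Return (Var v)) = "return " ++ v
```

Parsing `"/* greet */ return x;"` gives `[Comment " greet ", Return (Var "x")]`. `render` turns that into `-- greet ` and `return x`. A real `render` would also have to deal with comments that span several lines.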

The fact that neither the source nor the target language sees the comment text as relevant is irrelevant to your translation code.

The major problem with this approach: it doesn't really fit with `makeTokenParser`, and fits better with implementing your source language's parser from the ground up.

I guess I'm veering towards editing `makeTokenParser` so that the comment parsers return the `String` instead of `()`.
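Below is a rough sketch of what such an edit could produce. It is not Parsec's code. The comment parsers inside `makeTokenParser` are local definitions that return `()`, so this re-implements comment-returning versions outside it. A few more caveats apply:

- The primed names and `rawIdent` are hypothetical.
- Block comments do not nest here, unlike Parsec's own parser when `nestedComments` is set.
- The delimiters are still read from the `LanguageDef` fields.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Comment syntax still comes from the LanguageDef, as in the question.
commentDef :: Tok.LanguageDef ()
commentDef = emptyDef
  { Tok.commentStart = "/*"
  , Tok.commentEnd   = "*/"
  , Tok.commentLine  = "//"
  }

-- Hypothetical comment parsers that return the comment body instead of ().
multiLineComment' :: Tok.LanguageDef () -> Parser String
multiLineComment' def = do
  _ <- try (string (Tok.commentStart def))
  manyTill anyChar (try (string (Tok.commentEnd def)))

oneLineComment' :: Tok.LanguageDef () -> Parser String
oneLineComment' def = do
  _ <- try (string (Tok.commentLine def))
  many (satisfy (/= '\n'))

-- Like Tok.whiteSpace, but returns the comments it skipped, in order.
-- An empty delimiter disables that comment form, so 'many' never runs
-- a parser that can succeed without consuming input.
whiteSpace' :: Tok.LanguageDef () -> Parser [String]
whiteSpace' def = concat <$> many (blanks <|> comment)
  where
    blanks  = [] <$ skipMany1 (satisfy isSpace)
    comment = (: []) <$> choice
      (  [multiLineComment' def | not (null (Tok.commentStart def))]
      ++ [oneLineComment' def   | not (null (Tok.commentLine def))] )

-- Like Tok.lexeme, but pairs the token with the comments that follow it.
lexeme' :: Tok.LanguageDef () -> Parser a -> Parser (a, [String])
lexeme' def p = (,) <$> p <*> whiteSpace' def

-- Hypothetical raw identifier that does not skip whitespace by itself.
rawIdent :: Parser String
rawIdent = (:) <$> letter <*> many alphaNum
```

Here `lexeme'` attaches each comment to the token before it. That is the opposite of the question's first option, which grabs comments before each token. Which attachment suits a source-to-source translator depends on where comments should land in the output.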

