Using Flex For Matching Python Multiline Strings With Escaped Characters

I am trying to match Python multiline comments (triple-quoted strings) with flex, and I have run into trouble: the regex I wrote works fine on Regexr, but flex does not recognize it, and I don't know how to fix it.

Solution 1:

You can recognize Python long strings with a single regex. It's not pretty, but I believe it works:

["]{3}(["]{0,2}([^\\"]|\\(.|\n)))*["]{3}

This is fairly similar to your original regex, but it does not attempt to limit its backslash handling to \", so that it can correctly identify \\ as a backslashed character.
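To make the logic of that regex concrete, here is a rough hand-written C equivalent. It is only a sketch for illustration: `match_long_string` is a hypothetical helper of my own, not part of flex, and it assumes the input is a NUL-terminated C string. It should accept the same inputs as the regex: a backslash always consumes the following character, and the first unescaped `"""` ends the string.

```c
#include <stddef.h>

/* Hypothetical helper, not part of flex: returns the length of the Python
 * long string starting at s (including both """ delimiters), or 0 if s does
 * not start with a complete long string. */
size_t match_long_string(const char *s)
{
    size_t i = 3;

    /* Opening delimiter: ["]{3} */
    if (s[0] != '"' || s[1] != '"' || s[2] != '"')
        return 0;

    while (s[i] != '\0') {
        if (s[i] == '\\') {
            /* \\(.|\n): a backslash consumes the next character, whatever it is */
            if (s[i + 1] == '\0')
                return 0;           /* dangling backslash at end of input */
            i += 2;
        } else if (s[i] == '"' && s[i + 1] == '"' && s[i + 2] == '"') {
            /* Closing delimiter: the first unescaped """ ends the string */
            return i + 3;
        } else {
            /* [^\\"], or one of at most two consecutive quotes */
            i += 1;
        }
    }
    return 0;                       /* unterminated string */
}
```

Note how `\\` is treated as one escaped character, so a `"""` that follows it still closes the string, whereas in `\"""` the first quote is escaped and only two quotes remain.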

A possibly easier to read (but slightly slower) solution is to use a start condition. Here I use yymore() to create a single token which does not include the """ delimiters, but production code would probably seek to interpret Python's various backslash escapes. (It is precisely this need which motivates the use of a start condition rather than trying to recognize the entire string with a single regex.)

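%{
#include <stdlib.h>   /* malloc */
#include <string.h>   /* memcpy */
/* yylval and TOKEN_STRING are assumed to come from the parser header. */
%}
%x SC_LONGSTRING
%%
["]{3}     BEGIN(SC_LONGSTRING);   /* opening delimiter is discarded: no yymore() */
<SC_LONGSTRING>{
  [^\\"]+  yymore();               /* run of ordinary characters */
  \\(.|\n) yymore();               /* backslash plus any character, newline included */
  ["]["]?  yymore();               /* one or two quotes: not a terminator */
  ["]{3}   { BEGIN(INITIAL);
             /* yytext now holds the body plus the closing delimiter */
             yylval.str = malloc(yyleng - 2);   /* body length + 1 for the NUL */
             memcpy(yylval.str, yytext, yyleng - 3);
             yylval.str[yyleng - 3] = 0;
             return TOKEN_STRING;
           }
}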
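As mentioned, production code would probably want to interpret the backslash escapes rather than copy the body verbatim. Here is a minimal sketch of what that could look like. `decode_escapes` is a hypothetical helper name, and it handles only a small subset of Python's escapes (`\n`, `\t`, `\\`, `\"`, `\'` and backslash-newline); anything else is copied through unchanged.

```c
#include <stddef.h>

/* Hypothetical helper: copies len bytes of string body from src to dst,
 * translating a small subset of Python's escapes. dst must have room for
 * len + 1 bytes (the output is never longer than the input). Returns the
 * number of bytes written, not counting the terminating NUL. */
size_t decode_escapes(const char *src, size_t len, char *dst)
{
    size_t in = 0, out = 0;

    while (in < len) {
        char c = src[in++];
        if (c != '\\' || in == len) {
            dst[out++] = c;         /* ordinary character, or a lone trailing backslash */
            continue;
        }
        c = src[in++];              /* the character after the backslash */
        switch (c) {
        case 'n':  dst[out++] = '\n'; break;
        case 't':  dst[out++] = '\t'; break;
        case '\\': dst[out++] = '\\'; break;
        case '"':  dst[out++] = '"';  break;
        case '\'': dst[out++] = '\''; break;
        case '\n': break;           /* backslash-newline: emits nothing */
        default:                    /* not handled in this sketch (\x.., octal, ...): */
            dst[out++] = '\\';      /* copy the backslash and the character as-is */
            dst[out++] = c;
            break;
        }
    }
    dst[out] = '\0';
    return out;
}
```

In the closing-delimiter action of the scanner, a call such as `decode_escapes(yytext, yyleng - 3, yylval.str)` could then take the place of the memcpy and the explicit NUL terminator, since the buffer allocated there already has room for the body plus one byte.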
