Thursday, March 24, 2011

C# regex replace unexpected behavior

Given the line $displayHeight = "800";, I want to replace the number (here 800) with the int value y_res.

resultString = Regex.Replace(
    im_cfg_contents, 
    @"\$displayHeight[\s]*=[\s]*""(.*)"";", 
    Convert.ToString(y_res));

In Python I'd use re.sub and it would work. In .NET it replaces the whole match (the entire assignment), not just the captured group.

What is a quick fix?
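
To reproduce the behavior, here is a minimal sketch written as a C# top-level program. The sample config text and the value 600 are made up for illustration; the original post does not show them. It suggests that the overall match covers the whole assignment while the capture group holds only the number, and that Regex.Replace substitutes the overall match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input and value; not taken from the original post.
string im_cfg_contents = "$displayWidth = \"1280\";\n$displayHeight = \"800\";\n";
int y_res = 600;

string pattern = @"\$displayHeight[\s]*=[\s]*""(.*)"";";

// The overall match spans the whole assignment; group 1 is only the value.
Match m = Regex.Match(im_cfg_contents, pattern);
Console.WriteLine(m.Value);            // the full assignment
Console.WriteLine(m.Groups[1].Value);  // just the text between the quotes

// Regex.Replace substitutes the entire match, so the assignment is lost.
string resultString = Regex.Replace(im_cfg_contents, pattern, Convert.ToString(y_res));
Console.WriteLine(resultString);
```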

From stackoverflow
  • Try this:

    resultString = Regex.Replace(
        im_cfg_contents,
        @"\$displayHeight[\s]*=[\s]*""(.*)"";",
        @"$$displayHeight = """ + Convert.ToString(y_res) + @""";");
    
    Dustin Getz : i agree, this would work, but damn that's ugly.
  • You could also try this, though I think it is a little slower than my other method:

    resultString = Regex.Replace(
        im_cfg_contents,
        "(?<=\\$displayHeight[\\s]*=[\\s]*\").*(?=\";)",
        Convert.ToString(y_res));
    
  • It replaces the whole string because you've matched the whole string. Nothing in this statement tells C# to replace just the matched group: it will find and store that group, sure, but the overall match is still the whole string.

    You can either change your replacer to:

    @"$$displayHeight = """ + Convert.ToString(y_res) + @""";"
    

    ..or you can change your pattern to just match the digits, i.e.:

    @"[0-9]+"
    

    ..or you could see if C# regex supports lookarounds (I'm not sure if it does offhand) and change your match accordingly.

    Dustin Getz : no leading \ in replacer string
    annakata : you've lost me...
    annakata : curious about the downvote here, am I wildly off base?
    Alan Moore : Well, your first suggestion looks okay (the same as Martin's first answer, in fact), but the second one will replace all digits everywhere. I believe the OP is using this regex on a much larger text.
  • Check this pattern out

    (?<=(\$displayHeight\s*=\s*"))\d+(?=";)
    

    A word about "lookaround".

  • Building on a couple of the answers already posted: a zero-width assertion lets the regular expression require surrounding text without including that text in the match. Here the first part of the string goes inside a zero-width lookbehind assertion, (?<=...), so it must precede the value but is not part of what gets replaced. Similarly, the last part of the string goes inside a zero-width lookahead assertion, (?=...). Only the value between the quotes is matched, so only that value is replaced. Grouping Constructs on MSDN shows the groups as well as the assertions.

    resultString = Regex.Replace(
        im_cfg_contents, 
        @"(?<=\$displayHeight[\s]*=[\s]*"")(.*)(?="";)", 
        Convert.ToString(y_res));
    

    Another approach is to keep the whole-assignment match but put the first part and the last part of the pattern in capture groups of their own. The replacement string then adds the first and third groups back around the new value. Not quite as nice as the first approach, but not quite as bad as writing out the $displayHeight part. Substitutions on MSDN shows how the $ characters work.

    resultString = Regex.Replace(
        im_cfg_contents, 
        @"(\$displayHeight[\s]*=[\s]*"")(.*)("";)", 
        "${1}" + Convert.ToString(y_res) + "${3}");
    

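For comparison, here is a rough side-by-side sketch of the three fixes suggested in the answers, again as a C# top-level program. The sample config text, the value 600, and the variable names viaLookaround, viaGroups and viaLiteral are assumptions for illustration. The value is matched with [^"]* instead of the answers' .* (my substitution) so that the match cannot run past the closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input and value, for illustration only.
string im_cfg_contents = "$displayWidth = \"1280\";\n$displayHeight   =  \"800\";\n";
int y_res = 600;
string y = Convert.ToString(y_res);

// 1. Lookarounds: only the text between the quotes is part of the match.
string viaLookaround = Regex.Replace(
    im_cfg_contents,
    @"(?<=\$displayHeight\s*=\s*"")[^""]*(?="";)",
    y);

// 2. Capture groups: re-emit groups 1 and 3 around the new value.
//    The braces keep the group number separate from the digits that follow.
string viaGroups = Regex.Replace(
    im_cfg_contents,
    @"(\$displayHeight\s*=\s*"")([^""]*)("";)",
    "${1}" + y + "${3}");

// 3. Literal replacement: rewrite the whole assignment.
//    In a .NET replacement string, "$$" emits a single literal "$".
string viaLiteral = Regex.Replace(
    im_cfg_contents,
    @"\$displayHeight\s*=\s*""[^""]*"";",
    "$$displayHeight = \"" + y + "\";");

Console.WriteLine(viaLookaround);
Console.WriteLine(viaGroups);
Console.WriteLine(viaLiteral);
```

The first two variants leave the original spacing around = untouched, while the third rewrites the assignment with whatever spacing the replacement string spells out.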