Sunday, March 20, 2011

Regular Expression, C#

Hello there,

I need help with a regular expression that finds a tag called 'WAITPHOTO' in a document's text. Depending on whether the WAITPHOTO tag is present, I will wrap text around an image (to the left or right of the image).
FYI, all tags starting with PHOTO are replaced by an image at run time.

I have some text that looks like:

Steps from the drive lead to a terrace with a stone balustrade and open views.  
Front door and double glazed side window lead to the: 
[PHOTOA]
This is text for PhotoA as WAITPHOTO code is followed.
[WAITPHOTO]
SITTING ROOM: 5.11m x 4.61m. Double glazed patio doors to front     
terrace, with open views across Babbacombe. Double glazed window. Television 
aerial point. One screened radiator. One double radiator. 
[PHOTOB]
This text is NOT for PhotoB as WAITPHOTO code is not followed.

Steps from the drive lead to a terrace with a stone balustrade and open views.  
Front door and double glazed side window lead to the: 
[PHOTOC]
This is text for PhotoC as WAITPHOTO code is followed.
[WAITPHOTO]

I need to find out whether a particular image tag, e.g. [PHOTOA], is followed by a WAITPHOTO tag. I also need to ensure that a given WAITPHOTO tag is associated with the current PHOTO tag and not with one of the PHOTO tags that follow it.

Can anyone please suggest a regular expression that achieves this?

Thank you!

From stackoverflow
  • This regular expression worked for me: (\[PHOTO\w+\])([^\[]+)\[WAITPHOTO\]

    It captures the photo's tag in group 1 and the text that belongs to it in group 2.

    Here's an example of how to use it:

    string test = @"Steps from the drive lead to a terrace with a stone balustrade and open views.  Front door and double glazed side window lead to the: [PHOTOA]This is text for PhotoA as WAITPHOTO code is followed.[WAITPHOTO]SITTING ROOM: 5.11m x 4.61m. Double glazed patio doors to front     terrace, with open views across Babbacombe. Double glazed window. Television aerial point. One screened radiator. One double radiator. [PHOTOB]This text is NOT for PhotoB as WAITPHOTO code is not followed.Steps from the drive lead to a terrace with a stone balustrade and open views.  Front door and double glazed side window lead to the: [PHOTOC]This is text for PhotoC as WAITPHOTO code is followed.[WAITPHOTO]";
    string regex = @"(\[PHOTO\w+\])([^\[]+)\[WAITPHOTO\]";
    System.Text.RegularExpressions.MatchCollection mc = System.Text.RegularExpressions.Regex.Matches(test, regex);
    foreach (System.Text.RegularExpressions.Match m in mc)
    {
        System.Console.WriteLine(m.Value);
        System.Console.WriteLine("This is the photo name: " + m.Groups[1].Value);
        System.Console.WriteLine("This is the photo text: " + m.Groups[2].Value);
    }
    
    Matt Ellen : The problem with this regex is that if you have a [ in the photo text then it will not match.
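
To make the comment above concrete, here is a minimal sketch that runs the pattern on photo text containing a `[`. The sample string, the class name and the `Tempered` pattern are illustrative assumptions and do not come from the thread; `Tempered` is just one possible way to allow brackets in the text while still stopping at the next PHOTO tag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketInTextDemo
{
    // Pattern from the answer above: the photo text may not contain '['.
    public const string Original = @"(\[PHOTO\w+\])([^\[]+)\[WAITPHOTO\]";

    // Hypothetical alternative (not from the thread): consume any character,
    // newlines included, as long as it does not start another [PHOTO...] tag.
    public const string Tempered =
        @"(\[PHOTO\w+\])((?:(?!\[PHOTO\w+\])[\s\S])+?)\[WAITPHOTO\]";

    public static void Main()
    {
        // Made-up sample: the photo text contains a bracketed reference.
        string text = "[PHOTOA]See plan [1] for details.[WAITPHOTO]";

        // The original pattern stops at the '[' of "[1]" and finds nothing.
        Console.WriteLine(Regex.Matches(text, Original).Count);   // 0

        // The tempered pattern captures the full text in group 2.
        Match m = Regex.Match(text, Tempered);
        Console.WriteLine(m.Success);                              // True
        Console.WriteLine(m.Groups[2].Value);   // See plan [1] for details.
    }
}
```

Whether the extra complexity is worth it depends on whether square brackets can actually occur in the property descriptions.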
  • This works for me:

    (\[PHOTO.\]).*?(?<!\1.*?\[PHOTO.\].*)\[WAITPHOTO\]
    

    Because I believe that regular expressions are often "write only", here is an explanation:

    I try to match a [PHOTOX] tag, followed by as little text as necessary (.*?) and a [WAITPHOTO] at the end. To fulfill your need I added a negative lookbehind before the ending WAITPHOTO that says "if there's a [PHOTOX] between the starting tag and the WAITPHOTO, fail".
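
As a rough sketch of how the lookbehind pattern above might be used from C#, the snippet below lists the photo tags that have their own WAITPHOTO. The sample text and class name are made up for illustration. One assumption worth flagging: `.` does not match a newline by default, so `RegexOptions.Singleline` is passed here because the sample puts each tag on its own line, as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Pattern from the answer above, unchanged.
    public const string Pattern =
        @"(\[PHOTO.\]).*?(?<!\1.*?\[PHOTO.\].*)\[WAITPHOTO\]";

    public static void Main()
    {
        // Shortened, made-up version of the text in the question.
        string text =
            "Intro\n[PHOTOA]\nText for A.\n[WAITPHOTO]\n" +
            "Sitting room.\n[PHOTOB]\nText without a wait tag.\n" +
            "[PHOTOC]\nText for C.\n[WAITPHOTO]";

        // Singleline lets '.' cross line breaks; without it the pattern
        // cannot span from a tag on one line to a WAITPHOTO on another.
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.Singleline))
        {
            // Group 1 is the PHOTO tag that owns the WAITPHOTO.
            Console.WriteLine(m.Groups[1].Value);
        }
    }
}
```

The intent is that [PHOTOB] is skipped, since the only WAITPHOTO after it sits behind [PHOTOC]. Note that `\[PHOTO.\]` only covers tags with a single character after PHOTO; longer tag names would need something like `\w+` instead of `.`.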
