Thursday, May 5, 2011

Strip Html from Text in JavaScript except p tags?

I need to switch between RichEditor and TextEditor modes with JavaScript. To do that, I need to convert the HTML to text while the content is still in HTML editor mode: only the p tags should be kept, and all other HTML can be stripped.

From stackoverflow
  • This should help

    var html = '<img src=""><p>content</p><span style="color: red">content</span>';
    // replace() returns a new string, so assign the result back
    html = html.replace(/<(?!\s*\/?\s*p\b)[^>]*>/gi, '');
    

    Explanation of my regex:

    Replace all parts

    1. beginning with "<",
    2. not followed by (negative lookahead "(?!...)")
      • any number of white-space characters "\s*",
      • an optional "/" character,
      • any number of white-space characters again,
      • and the tag name followed by a word boundary (here "p\b"),
    3. containing any characters other than ">" - [^>]*
    4. and ending with a ">" character.
    Tomalak : +1 for thinking about the white space.
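The snippet above can be wrapped in a small function. This is only a sketch: the name `stripAllButP` is made up for illustration, and the regex is the one from the answer, unchanged.

```javascript
// Hypothetical helper name; the pattern is the one from the answer above.
function stripAllButP(html) {
  // Strings are immutable: replace() returns a new string.
  return html.replace(/<(?!\s*\/?\s*p\b)[^>]*>/gi, '');
}

var input = '<img src=""><p>content</p><span style="color: red">content</span>';
console.log(stripAllButP(input));

// White space inside the tag and upper-case tag names are tolerated.
console.log(stripAllButP('< P class="x">a</ p ><b>bold</b>'));
```

Thanks to the word boundary, tags whose names merely start with "p" (such as pre) are still stripped.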
  • Regex replace (globally, case-insensitively):

    </?(?:(?!p\b)[^>])*>
    

    with the empty string.

    Explanation:

    <          # "<"
    /?         # optional "/" 
    (?:        # non-capturing group
      (?!      # a position not followed by...
        p\b    # "p" and a word boundary
      )
      [^>]     # any char but ">"
    )*         # as often as possible
    >          # ">"
    

    This is one of the few situations where applying regex to HTML can actually work.
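For reference, a minimal sketch of how this pattern could be applied in JavaScript. The variable names are made up for illustration; the "/" has to be escaped inside a regex literal.

```javascript
// The pattern from this answer, with the g (global) and i (case-insensitive) flags.
var nonPTag = /<\/?(?:(?!p\b)[^>])*>/gi;

var sample = '<div><p>kept</p><em>stripped</em></div>';
var result = sample.replace(nonPTag, '');
console.log(result);

// The lookahead is checked at every character inside the tag, not only at the
// tag name, so a tag containing a "p" followed by a word boundary is left alone.
var quirk = '<img alt="top">'.replace(nonPTag, '');
console.log(quirk);
```

One thing to be aware of: because the lookahead runs at every position, a tag such as `<img alt="top">` survives the replace, since the "p" before the closing quote counts as "p" plus a word boundary. A pattern that inspects only the tag name does not have this side effect.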

    Some might object that a literal "<" within an attribute value is not actually forbidden and could therefore break the above regex. They would be right.

    The regex would break in this situation, replacing the underlined part:

    <p class="foo" title="unusual < title">
                                  ---------
    

    If such a thing is possible with your input, then you might have to use a more advanced tool to do the job - a parser.
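As a rough illustration of what a more careful tool does differently, here is a tiny scanner that skips over quoted attribute values when looking for the end of a tag. It is a sketch under stated assumptions, not a real HTML parser: the function name is made up, and comments, script blocks, and stray "<" characters in plain text are not handled. In a browser, the built-in DOM parser would likely be the more robust choice.

```javascript
// Hypothetical helper: a quote-aware tag scanner, NOT a full HTML parser.
function stripAllButPQuoteAware(html) {
  var out = '';
  var i = 0;
  while (i < html.length) {
    if (html.charAt(i) !== '<') {
      out += html.charAt(i++);
      continue;
    }
    // Find the ">" that ends this tag, ignoring anything inside quotes.
    var j = i + 1;
    var quote = null;
    while (j < html.length) {
      var ch = html.charAt(j);
      if (quote) {
        if (ch === quote) quote = null;   // closing quote
      } else if (ch === '"' || ch === "'") {
        quote = ch;                       // opening quote
      } else if (ch === '>') {
        break;                            // real end of tag
      }
      j++;
    }
    if (j >= html.length) {
      out += html.slice(i);               // unterminated tag: keep the rest as-is
      break;
    }
    var tag = html.slice(i, j + 1);
    // Keep only opening and closing p tags.
    if (/^<\s*\/?\s*p\b/i.test(tag)) out += tag;
    i = j + 1;
  }
  return out;
}

var tricky = '<p class="foo" title="unusual < title">text</p><b>x</b>';
var viaRegex = tricky.replace(/<\/?(?:(?!p\b)[^>])*>/gi, '');
var viaScanner = stripAllButPQuoteAware(tricky);
console.log(viaRegex);   // the p tag is damaged
console.log(viaScanner); // the p tag is intact
```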

    Pim Jager : It's great that you added the explanation.
    Steerpike : Yeah, seconded on the explanation breakdown. Thanks for the clarity.
    Tomalak : @JavaCoder: Is this a "give me teh codez" question? There are literally *tons* of JavaScript regex tutorials available on the internet. I am sure you could manage to find one that tells you how to accomplish a replace, if only you went looking for one. Even Rafael's answer below shows you how to do it. But if you are asking how to make a JavaScript function, you might not yet be ready to use regular expressions at all. And your nickname would be wrong.
