Monday, February 21, 2011

Strip style comments in string pasted from Microsoft Word with PHP

I have a text area that users typically paste content from Microsoft Word into. I am using TinyMCE for formatting. The problem is that the string that gets pasted always has style definitions that are commented out. I need a way to strip this commented stuff out of the string.

Here is an example of the comments that get added:

<!-- /* Font Definitions */ @font-face {font-family:"Courier New"; panose-1:2 7 3 9 2 2 5 2 4 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 0 0 0 1 0;} @font-face {font-family:Wingdings; panose-1:5 2 1 2 1 8 4 8 7 8; mso-font-charset:2; -->

This is just a very small chunk of it; it usually goes on for hundreds of lines.

Anyway, I'm using strip_tags to get rid of unwanted HTML tags, and I've tried using the following preg_replace, but the style comments are always there:

$e_description = preg_replace('/<!--(.|\s)*?-->/', '',$_POST['description']);

Any suggestions on how to get rid of this junk?

Thanks.

From stackoverflow
  • Why not just add the ms modifiers (m is multi-line, s is "dot-all", where . matches all characters including newlines):

    preg_replace('/<!--.*?-->/ms', '', $_POST['description']);
    

    That MAY work for you (try it out)...

    Mikulas Dite : I'd rather suggest `'//ims'`, since a user may want to input a simple comment. Even this is quite hazardous.
    Daelan : This one doesn't do anything: //ms. And this one replaces everything in the string, not just the commented area: '//ims'. Thanks for the suggestions though.
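
For reference, here is a minimal sketch of the dot-all approach from the answer above. It assumes PHP 7 or later, and it assumes the string still contains literal `<!--` and `-->` markers when it reaches PHP; if an earlier step has already encoded or removed them, the pattern has nothing to match. The function name `strip_word_comments` and the `$wordOnly` flag are hypothetical, not part of the original question or answer. Note that only the `s` modifier matters for this pattern: `m` changes how `^` and `$` behave, and the pattern uses neither. The optional flag is one possible way to address the concern that a user may want to keep ordinary comments: it removes only comments that contain `@font-face` or `mso-`, which is a guess at what marks a Word style dump.

```php
<?php
// Hypothetical helper (not from the original thread): strips HTML comments
// from a string. With $wordOnly = true, only comments that look like Word
// style dumps (containing "@font-face" or "mso-") are removed.
function strip_word_comments(string $html, bool $wordOnly = false): string
{
    $result = preg_replace_callback(
        // "s" (dot-all) lets "." match newlines, so multi-line comments match.
        // The lazy ".*?" stops at the first "-->" after each "<!--".
        '/<!--.*?-->/s',
        function (array $m) use ($wordOnly): string {
            // Assumption: Word style comments contain one of these markers.
            if ($wordOnly && !preg_match('/@font-face|mso-/i', $m[0])) {
                return $m[0]; // keep ordinary comments untouched
            }
            return ''; // drop the comment
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error (for example when
    // a backtracking limit is hit); fall back to the unmodified input.
    return $result ?? $html;
}

$input = "<p>Hello</p><!-- /* Font Definitions */\n"
       . "@font-face {font-family:Wingdings;} --><p>World</p>";

echo strip_word_comments($input); // <p>Hello</p><p>World</p>
```

In the question's code this would be called as `strip_word_comments($_POST['description'])`. Checking for a null result is worth keeping, because a pattern that fails silently can look the same as a pattern that matched nothing.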
