Tuesday, January 11, 2011

Unable to change encoding to utf8

Hey! I am having a really strange problem. I can't convert a file to UTF-8; it is always reported as us-ascii.

I have tried this:

iconv --verbose --output=test2 -t UTF-8 test

(I have also tried specifying -f as iso-8859-1 and as us-ascii.) When I then run

file --mime-encoding test2

I get us-ascii. The file contains some HTML and PHP. I really don't understand this. I have also tried Notepad++ (the folder is shared with a Windows 7 PC): I set the encoding to UTF-8 without BOM and the file seems to change (the icon turns red), but when I save and check again it is still reported as us-ascii.

I have checked $LANG and it is set to en_US.UTF-8. (Should I change something in the locales? I would prefer not to use anything country-specific.)

I have also tried recode, which didn't work either.

Note: Some files are being created in UTF-8 (I am using Eclipse and have set the project properties to encode in UTF-8), but for some strange reason others are not being encoded correctly. Again, their contents are HTML and PHP.

Please, could someone help me out? I am trying to serve my site in UTF-8 and some parts are getting messed up because of this!

Thanks!

  • If your file contains only characters that are part of (7-bit) ASCII, there is no way to tell the difference between UTF-8 and ASCII, so I'm not surprised that file reports it as ASCII.

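    If you want to serve those files as UTF-8, you had better make it explicit, for example with a charset declaration in your HTML, a Content-Type header sent from your PHP code, or the default charset in your server configuration.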

    From JanC
  • Your file will display fine. UTF-8 only differs from ASCII for characters beyond the ASCII range, so the first 128 characters are encoded identically. If you do not use any non-ASCII characters, the file can be identified as us-ascii (= ASCII). Technically, an ASCII text file and a UTF-8 file with the same contents are then byte-for-byte identical.

    If there are still issues on the server, one of its settings might be the cause...

    From Lincoln
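
To make the point in both answers concrete, here is a minimal Python sketch (not something from the original thread). The two sample strings and the helper name looks_like_ascii are made up for illustration. It shows that ASCII-only text produces the same bytes whether it is encoded as ASCII, ISO-8859-1, or UTF-8, which is presumably why a conversion appears to do nothing, and that the bytes only differ once a non-ASCII character is present.

```python
def looks_like_ascii(data: bytes) -> bool:
    """Return True when every byte is below 0x80, i.e. plain 7-bit ASCII."""
    return all(b < 0x80 for b in data)


# Hypothetical file contents: PHP/HTML using only ASCII characters.
ascii_only = "<?php echo '<p>Hello</p>'; ?>"
# Similar content with one non-ASCII character (e with an acute accent).
with_accent = "<?php echo '<p>Caf\u00e9</p>'; ?>"

# ASCII-only text encodes to identical bytes in all three encodings,
# so converting such a file between them changes nothing on disk.
same_bytes = (
    ascii_only.encode("ascii")
    == ascii_only.encode("iso-8859-1")
    == ascii_only.encode("utf-8")
)

# With a non-ASCII character the encodings diverge:
# ISO-8859-1 uses one byte for it, UTF-8 uses two.
latin1_bytes = with_accent.encode("iso-8859-1")
utf8_bytes = with_accent.encode("utf-8")

# Re-encoding ISO-8859-1 bytes as UTF-8 is the same kind of conversion
# that "iconv -f ISO-8859-1 -t UTF-8" performs.
converted = latin1_bytes.decode("iso-8859-1").encode("utf-8")

print(same_bytes)
print(looks_like_ascii(ascii_only.encode("utf-8")))
print(looks_like_ascii(utf8_bytes))
print(len(latin1_bytes), len(utf8_bytes))
```

A file that passes a check like looks_like_ascii is already valid UTF-8, so nothing in it needs converting; what remains is declaring the charset when the page is served.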
