I need to compare 2 strings in C# and treat accented letters the same as non-accented letters. For example:
string s1 = "hello";
string s2 = "héllo";
s1.Equals(s2, StringComparison.InvariantCultureIgnoreCase);
s1.Equals(s2, StringComparison.OrdinalIgnoreCase);
These 2 strings need to be the same (as far as my application is concerned), but both of these statements evaluate to false. Is there a way in C# to do this?
Try this overload of the String.Compare method:
String.Compare Method (String, String, Boolean, CultureInfo)
It returns an int based on a comparison that takes the CultureInfo into account. The example on that page compares "change" and "dollar" in en-US and cs-CZ. In cs-CZ, "ch" is treated as a single "letter" that sorts after "d".
Example from the link:

    using System;
    using System.Globalization;

    class Sample
    {
        public static void Main()
        {
            String str1 = "change";
            String str2 = "dollar";
            String relation = null;

            relation = symbol(String.Compare(str1, str2, false, new CultureInfo("en-US")));
            Console.WriteLine("For en-US: {0} {1} {2}", str1, relation, str2);

            relation = symbol(String.Compare(str1, str2, false, new CultureInfo("cs-CZ")));
            Console.WriteLine("For cs-CZ: {0} {1} {2}", str1, relation, str2);
        }

        private static String symbol(int r)
        {
            String s = "=";
            if (r < 0) s = "<";
            else if (r > 0) s = ">";
            return s;
        }
    }
    /*
    This example produces the following results.
    For en-US: change < dollar
    For cs-CZ: change > dollar
    */
Therefore, for accented languages you will need to get the culture and then compare the strings based on that.
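A culture-aware comparison does not ignore accents by default, so the overload above would still report "hello" and "héllo" as different. One option that the answer above does not mention is the String.Compare overload that takes a CompareOptions value: CompareOptions.IgnoreNonSpace asks the comparison to skip nonspacing combining characters such as diacritics. The sketch below is only an illustration; the class and helper names are made up for this example, and the result depends on the globalization data available to the runtime.

```csharp
using System;
using System.Globalization;

public static class AccentInsensitiveCompare
{
    // Hypothetical helper (not from the thread): true when the two strings
    // differ only by case and/or diacritics under the culture's rules.
    public static bool EqualsIgnoringAccents(string a, string b, CultureInfo culture)
    {
        const CompareOptions options =
            CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;
        return string.Compare(a, b, culture, options) == 0;
    }

    public static void Main()
    {
        CultureInfo culture = CultureInfo.InvariantCulture;
        Console.WriteLine(EqualsIgnoringAccents("hello", "héllo", culture));
        Console.WriteLine(EqualsIgnoringAccents("hello", "help", culture));
    }
}
```

With the usual culture data installed, the first call should report the strings as equal and the second should not. No intermediate stripped string is allocated, which may matter if the comparison runs often.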
The following method works on your example data. Here is the article where I got my background information: http://www.codeproject.com/KB/cs/EncodingAccents.aspx

    // Requires: using System; using System.Text;
    private static bool CompareIgnoreAccents(string s1, string s2)
    {
        return string.Compare(
            RemoveAccents(s1), RemoveAccents(s2),
            StringComparison.InvariantCultureIgnoreCase) == 0;
    }

    private static string RemoveAccents(string s)
    {
        // Round-trips the text through ISO-8859-8, which has no accented
        // Latin letters; this relies on the encoder substituting the
        // closest unaccented character.
        Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");
        return destEncoding.GetString(
            Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
    }
I think an extension method would be better:
    // Extension methods must be declared in a non-nested static class.
    public static string RemoveAccents(this string s)
    {
        Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");
        return destEncoding.GetString(
            Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
    }
It would then be used like this:
if(string.Compare(s1.RemoveAccents(), s2.RemoveAccents(), true) == 0) { ...
Write a method normalize(String s) that takes a string and turns accented letters into their non-accented equivalents. Then, instead of comparing string x to string y, compare normalize(x) to normalize(y).
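Here's a function that strips diacritics from a string:

    // Requires: using System.Globalization; using System.Text;
    static string RemoveDiacritics(string sIn)
    {
        // Decompose each accented character into base letter + combining mark(s).
        string sFormD = sIn.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();

        foreach (char ch in sFormD)
        {
            // Keep everything except the combining (nonspacing) marks.
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }

        // Recompose whatever is left into the canonical composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }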
More details here.
The principle is that it decomposes 'é' into 2 successive chars: 'e' followed by a combining acute accent. It then iterates through the chars and skips the diacritics.
"héllo" becomes "he<acute>llo", which in turn becomes "hello".
This line doesn't assert, which is what you want.
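The line being referred to is not shown in the text above. As a rough sketch of what such a check might look like, the program below repeats the diacritic-stripping function and compares the stripped strings. The class name and the EqualsIgnoringAccents helper are assumptions made for this example, not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Same technique as above: decompose (FormD), drop the nonspacing
    // marks, then recompose (FormC).
    public static string RemoveDiacritics(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char ch in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Hypothetical helper: compares the stripped strings, ignoring case.
    public static bool EqualsIgnoringAccents(string a, string b)
    {
        return string.Equals(
            RemoveDiacritics(a), RemoveDiacritics(b),
            StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("héllo"));               // hello
        Console.WriteLine(EqualsIgnoringAccents("hello", "HÉLLO")); // True
    }
}
```

Because the accents are removed before the comparison, an ordinal, case-insensitive comparison is enough here and the result does not depend on the current culture.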