Special Character Conversion Problems – ISO-8859-1 to Unicode
Ever see one of these funny characters or (4-digits in a box) .
The black question mark appears when the value for a special character doesn't match a character in the character set used for displaying the text. I found this happened when working with text data that was encoded in ISO-8859-1 and displayed in UTF-8. This is a correct problem as character sets need to be converted before you can display them.
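To make the mismatch concrete, here is a minimal sketch that decodes the same byte twice, once as ISO-8859-1 and once as UTF-8. The class and method names (`MojibakeDemo`, `DecodeByte`) are made up for illustration, and the byte 0xE9 (é in ISO-8859-1) is just a convenient example.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Decodes a single byte with the given encoding and reports the
    // resulting character as a code point in U+XXXX form.
    public static string DecodeByte(byte b, Encoding encoding)
    {
        string text = encoding.GetString(new[] { b });
        return "U+" + ((int)text[0]).ToString("X4");
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // 0xE9 is a complete character (é) in ISO-8859-1.
        Console.WriteLine(DecodeByte(0xE9, latin1));        // U+00E9

        // In UTF-8 a lone 0xE9 is an incomplete multi-byte sequence, so the
        // decoder substitutes the replacement character, shown as the black question mark.
        Console.WriteLine(DecodeByte(0xE9, Encoding.UTF8)); // U+FFFD
    }
}
```

The bytes never change; only the table used to interpret them does, which is why the same data can look fine in one place and broken in another.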
Conversion Problems
But what do you do when you convert the data from the incoming encoding to the outgoing encoding and end up with or (4-digits in a box) still.
Well that was the problem I encountered.
For about a week I banged my head trying to solve that character conversion problem in .NET project. I received data in ISO-8859-1 format and displayed it in UTF-8. The problem occurred when the special character bullet () was in the data received. It showed up as because I was not doing a conversion to UTF-8. So I did some research and found
public static string iso8859ToUnicode(string textToConvert) { Encoding iso8859 = Encoding.GetEncoding("iso-8859-1"); Encoding unicode = Encoding.Unicode; byte[] srcTextBytes = iso8859.GetBytes(textToConvert); byte[] destTextBytes = Encoding.Convert(iso8859,unicode, srcTextBytes); char[] destChars = new char[unicode.GetCharCount(destTextBytes, 0, destTextBytes.Length)]; unicode.GetChars(destTextBytes, 0, destTextBytes.Length, destChars, 0); return destChars.ToString(); }
After I used this function and displayed the text, I found that the bullet had been converted to U+0095 (\u0095), which was displayed as a box with 0095 in it. I searched Google for \u0095 and kept getting references to Unicode, so I started to suspect that the conversion was incorrect. I came across "Bullet - Unicode Character - FileFormat", which listed the conversion chart for a bullet: the correct Unicode character is U+2022. My result was obviously not that, so I wondered if the conversion was broken. I researched a little more and found "Message Waiting - Unicode Character - FileFormat", which describes the U+0095 character.
So I had converted successfully from ISO-8859-1 to Unicode, but a browser displaying the page as UTF-8 did not treat that character as printable, so I ended up with the box with four digits in it.
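One possible explanation, offered as an assumption rather than something I confirmed for this data: text labelled ISO-8859-1 is often really Windows-1252. The two agree everywhere except bytes 0x80 to 0x9F, which are control characters in ISO-8859-1 but printable characters in Windows-1252, where 0x95 is the bullet. Under that assumption, a sketch like the following remaps the stray control characters after decoding. The class name `Cp1252Repair` and method `FixC1Controls` are hypothetical, and the table is written out by hand so the example does not depend on any code page provider being registered.

```csharp
using System;
using System.Text;

public static class Cp1252Repair
{
    // Windows-1252 assignments for bytes 0x80..0x9F; '\0' marks the five unassigned bytes.
    private static readonly char[] Cp1252High =
    {
        '\u20AC', '\0',     '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\0',     '\u017D', '\0',
        '\0',     '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '\0',     '\u017E', '\u0178'
    };

    // Replaces C1 control characters (U+0080..U+009F) with the character the same
    // byte value represents in Windows-1252; everything else is copied unchanged.
    public static string FixC1Controls(string text)
    {
        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c >= '\u0080' && c <= '\u009F' && Cp1252High[c - 0x80] != '\0')
                result.Append(Cp1252High[c - 0x80]);
            else
                result.Append(c);
        }
        return result.ToString();
    }

    public static void Main()
    {
        string repaired = FixC1Controls("Item \u0095 one");
        Console.WriteLine(repaired.Contains("\u2022")); // True
    }
}
```

If the source really is Windows-1252, decoding the raw bytes with that encoding in the first place would be cleaner than repairing the string afterwards; this sketch only covers the case where the text has already been decoded as ISO-8859-1.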
How To Get the Browser To Display The Special Unicode Characters
As I examined the chart at FileFormat for Message Waiting, it indicated that (&#149;) is the HTML entity for the Message Waiting dot. So I looked for how to convert Unicode to HTML entities in .NET. The method to use is:
string html = Server.HtmlEncode(str);
But this didn't solve my problem. HtmlEncode left the character untouched; as far as I could tell, it only converted special characters below 127 in the ASCII table. My research led me to a post about expanding HtmlEncode to cover special characters above 127. Apparently the integer value of the Unicode character is also its HTML entity number, so prefixing the integer value with &# and following it with a semicolon gives the HTML entity for that Unicode character. Example: (&#149;).
The code for the special character conversion is:
// Requires: using System; using System.Text;
StringBuilder result = new StringBuilder(textToConvert.Length + (int)(textToConvert.Length * 0.1));
foreach (char c in destChars)
{
    int value = Convert.ToInt32(c);
    if (value > 127)
        result.AppendFormat("&#{0};", value); // numeric character reference, e.g. &#149;
    else
        result.Append(c);
}
string html = result.ToString();
The Final Conversion Method
I put the ISO-8859-1 conversion to Unicode together with the special character conversion to make sure the data will display in the browser. The entire method is:
public static string iso8859ToUnicode(string textToConvert)
{
    // Requires: using System; using System.Text;
    Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");
    Encoding unicode = Encoding.Unicode;
    // Encode the text as ISO-8859-1 bytes, then transcode those bytes to UTF-16.
    byte[] srcTextBytes = iso8859.GetBytes(textToConvert);
    byte[] destTextBytes = Encoding.Convert(iso8859, unicode, srcTextBytes);
    char[] destChars = new char[unicode.GetCharCount(destTextBytes, 0, destTextBytes.Length)];
    unicode.GetChars(destTextBytes, 0, destTextBytes.Length, destChars, 0);
    // Replace every character above 127 with its numeric character reference.
    StringBuilder result = new StringBuilder(textToConvert.Length + (int)(textToConvert.Length * 0.1));
    foreach (char c in destChars)
    {
        int value = Convert.ToInt32(c);
        if (value > 127)
            result.AppendFormat("&#{0};", value);
        else
            result.Append(c);
    }
    return result.ToString();
}
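The loop above works one UTF-16 char at a time, which is fine for ISO-8859-1 input. If the same idea is reused for arbitrary text, characters outside the Basic Multilingual Plane arrive as two surrogate chars and would each get their own (invalid) reference. A hedged variation that walks code points instead might look like this; `HtmlNumericEncoder` and `ToNumericReferences` are names invented for the sketch.

```csharp
using System;
using System.Text;

public static class HtmlNumericEncoder
{
    // Emits ASCII unchanged and every other code point as a decimal numeric
    // character reference. Valid surrogate pairs are combined into one code
    // point; an unpaired surrogate is emitted with its raw char value.
    public static string ToNumericReferences(string text)
    {
        var result = new StringBuilder(text.Length + text.Length / 10);
        for (int i = 0; i < text.Length; i++)
        {
            int codePoint;
            if (char.IsHighSurrogate(text[i]) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                codePoint = char.ConvertToUtf32(text[i], text[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                codePoint = text[i];
            }

            if (codePoint > 127)
                result.Append("&#").Append(codePoint).Append(';');
            else
                result.Append((char)codePoint);
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToNumericReferences("a \u2022 b"));   // a &#8226; b
        Console.WriteLine(ToNumericReferences("\uD83D\uDE00")); // &#128512;
    }
}
```

Like the original loop, this does not escape <, > or &, so it complements HtmlEncode rather than replacing it. As for why &#149; shows up as a bullet even though U+0095 is a control character: as far as I know, HTML parsers treat numeric references in the 128 to 159 range as Windows-1252 values.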