Problem description
I have been given the task of generating all the characters in the UTF-8 character set to test how a system handles each of them. I do not have much experience with character encoding. The approach I was going to try was to increment a counter and then translate that base-ten number into its equivalent UTF-8 character, but so far I have not been able to find an effective way to do this in C# 3.5.
Any suggestions would be greatly appreciated.
Recommended answer
using System;
using System.IO;
using System.Net;
using System.Text;

class Utf8CharacterGenerator
{
    static void Main()
    {
        // Download the list of assigned code points (a local copy works just as well).
        WebClient client = new WebClient();
        string definedCodePoints = client.DownloadString(
            "http://unicode.org/Public/UNIDATA/UnicodeData.txt");
        StringReader reader = new StringReader(definedCodePoints);
        UTF8Encoding encoder = new UTF8Encoding();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // The first semicolon-separated field is the code point in hexadecimal.
            int codePoint = Convert.ToInt32(line.Substring(0, line.IndexOf(';')), 16);
            if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
            {
                // Surrogate range: listed in the document, but not valid code points to encode.
                continue;
            }
            string utf16 = char.ConvertFromUtf32(codePoint);
            byte[] utf8 = encoder.GetBytes(utf16);
            // TODO: do something with the UTF-8-encoded character
        }
    }
}
The above code should iterate over the currently assigned Unicode characters. You'll probably want to parse the UnicodeData file locally and fix any C# blunders I've made.
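One detail worth knowing when parsing the file locally: UnicodeData.txt does not list every character of the large blocks (CJK ideographs, Hangul syllables, private use areas) individually. It gives only a pair of entries whose names end in ", First>" and ", Last>". The following is a minimal sketch of a parser that expands those pairs. The class name `UnicodeDataParser` and the method name `AssignedCodePoints` are made up for illustration, and the sketch assumes each line follows the semicolon-separated layout of that file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class UnicodeDataParser
{
    // Yields every code point described by UnicodeData.txt-style lines,
    // expanding "<..., First>" / "<..., Last>" range pairs and skipping surrogates.
    public static IEnumerable<int> AssignedCodePoints(TextReader reader)
    {
        int rangeStart = -1; // start of a pending First/Last range, or -1 if none
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // tolerate blank lines

            string[] fields = line.Split(';');
            int codePoint = Convert.ToInt32(fields[0], 16);
            string name = fields.Length > 1 ? fields[1] : "";

            if (name.EndsWith(", First>", StringComparison.Ordinal))
            {
                rangeStart = codePoint; // remember where the range begins
                continue;
            }

            int first = codePoint;
            if (name.EndsWith(", Last>", StringComparison.Ordinal) && rangeStart >= 0)
            {
                first = rangeStart; // emit the whole range, not just its last entry
                rangeStart = -1;
            }

            for (int cp = first; cp <= codePoint; cp++)
            {
                if (cp >= 0xD800 && cp <= 0xDFFF) continue; // surrogates cannot be encoded alone
                yield return cp;
            }
        }
    }
}
```

With a local copy of the file (the path is an assumption), this could be driven by `new StreamReader("UnicodeData.txt")` and each yielded value passed through `char.ConvertFromUtf32` and `UTF8Encoding.GetBytes` as in the answer's code. Note that private use ranges are expanded too, so filter them out if the system under test should not receive them.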
The set of currently assigned Unicode characters is smaller than the set that could be defined. Of course, whether you see a character when you print one of them out depends on a great many other factors, such as fonts and the other applications it passes through before it reaches your eyes.
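If the goal is to exercise the system with everything that could be defined rather than only what is assigned today, the counter approach from the question also works. Here is a rough sketch, assuming the test should cover every Unicode scalar value (U+0000 through U+10FFFF, minus the surrogate range). The names `AllScalarValues` and `Utf8ForEveryScalarValue` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class AllScalarValues
{
    // Yields the UTF-8 bytes of every Unicode scalar value:
    // U+0000..U+10FFFF, excluding the surrogate range U+D800..U+DFFF.
    public static IEnumerable<byte[]> Utf8ForEveryScalarValue()
    {
        UTF8Encoding encoder = new UTF8Encoding();
        for (int codePoint = 0; codePoint <= 0x10FFFF; codePoint++)
        {
            // char.ConvertFromUtf32 throws for surrogate code points, so skip them.
            if (codePoint >= 0xD800 && codePoint <= 0xDFFF) continue;

            string utf16 = char.ConvertFromUtf32(codePoint);
            yield return encoder.GetBytes(utf16);
        }
    }
}
```

This produces 1,112,064 byte sequences of one to four bytes each. Most of them are unassigned code points, and the output also includes control characters and noncharacters such as U+FFFE, so the system under test may legitimately reject some of them.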