Problem description

I have seen a lot of people in the C++ community (particularly ##c++ on freenode) resent the use of wstrings and wchar_t, and their use in the Windows API. What exactly is "wrong" with wchar_t and wstring, and if I want to support internationalization, what are some alternatives to wide characters?

Recommended answer

What is wchar_t?

wchar_t is defined such that any locale's char encoding can be converted to a wchar_t representation where every wchar_t represents exactly one codepoint:

Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales (22.3.1).

— C++ [basic.fundamental] 3.9.1/5

This does not require that wchar_t be large enough to represent any character from all locales simultaneously. That is, the encoding used for wchar_t may differ between locales. Which means that you cannot necessarily convert a string to wchar_t using one locale and then convert back to char using another locale.¹

Since using wchar_t as a common representation between all locales seems to be the primary use for wchar_t in practice, you might wonder what it's good for if not that.

The original intent and purpose of wchar_t was to make text processing simple by defining it to require a one-to-one mapping from a string's code units to the text's characters, thus allowing the same simple algorithms used with ASCII strings to work with other languages.

Unfortunately, the wording of wchar_t's specification assumes a one-to-one mapping between characters and code points to achieve this. Unicode breaks that assumption², so you can't safely use wchar_t for simple text algorithms either.

This means that portable software cannot use wchar_t either as a common representation for text between locales, or to enable the use of simple text algorithms.

So what use is wchar_t? Not much, for portable code anyway. If __STDC_ISO_10646__ is defined, then values of wchar_t directly represent Unicode code points, with the same values in all locales. That makes it safe to do the inter-locale conversions mentioned earlier. However, you can't rely on that macro alone to decide whether you can use wchar_t this way: while most Unix platforms define it, Windows does not, even though Windows uses the same wchar_t encoding in all locales.

Windows doesn't define __STDC_ISO_10646__ because it uses UTF-16 as its wchar_t encoding, and UTF-16 uses surrogate pairs to represent code points greater than U+FFFF, so it doesn't satisfy the requirements for __STDC_ISO_10646__.

For platform specific code wchar_t may be more useful. It's essentially required on Windows (e.g., some files simply cannot be opened without using wchar_t filenames), though Windows is the only platform where this is true as far as I know (so maybe we can think of wchar_t as 'Windows_char_t').

In hindsight, wchar_t is clearly not useful for simplifying text handling, or as storage for locale-independent text. Portable code should not attempt to use it for these purposes. Non-portable code may find it useful simply because some API requires it.

The alternative I like is to use UTF-8 encoded C strings, even on platforms not particularly friendly toward UTF-8.

This way one can write portable code using a common text representation across platforms, use standard datatypes for their intended purpose, get the language's support for those types (e.g. string literals, though some tricks are necessary to make it work for some compilers), some standard library support, debugger support (more tricks may be necessary), etc. With wide characters it's generally harder or impossible to get all of this, and you may get different pieces on different platforms.

One thing UTF-8 does not provide is the ability to use simple text algorithms such as those possible with ASCII. In this respect UTF-8 is no worse than any other Unicode encoding. In fact, it may be considered better: multi-code-unit representations are more common in UTF-8, so bugs in code that handles such variable-width representations of characters are more likely to be noticed and fixed than if you try to stick to UTF-32 with NFC or NFKC.

Many platforms use UTF-8 as their native char encoding, and many programs do not require any significant text processing, so writing an internationalized program on those platforms is little different from writing code without considering internationalization. Writing more widely portable code, or writing for other platforms, requires inserting conversions at the boundaries of APIs that use other encodings.

Another alternative used by some software is to choose a cross-platform representation, such as unsigned short arrays holding UTF-16 data, and then to supply all the library support and simply live with the costs in language support, etc.

C++11 adds new kinds of wide characters, char16_t and char32_t, as alternatives to wchar_t, with attendant language and library features. These aren't actually guaranteed to be UTF-16 and UTF-32, but I don't imagine any major implementation will use anything else. C++11 also improves UTF-8 support, for example with UTF-8 string literals, so it won't be necessary to trick VC++ into producing UTF-8 encoded strings (although I may continue to do so rather than use the u8 prefix).

TCHAR: TCHAR is for migrating ancient Windows programs that assume legacy char encodings over to wchar_t, and is best forgotten unless your program was written in some previous millennium. It's not portable, and it is inherently unspecific about its encoding and even its data type, making it unusable with any non-TCHAR-based API. Since its purpose is migration to wchar_t, which we've seen above isn't a good idea, there is no value whatsoever in using TCHAR.

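¹ Characters that are representable in a wchar_t string but are not supported by any locale are not required to be represented with a single wchar_t value. This means that wchar_t could use a variable-width encoding for certain characters, another clear violation of the intent of wchar_t. However, it is arguable that a character being representable by wchar_t is enough to say that the locale "supports" that character, in which case variable-width encodings aren't legal and Windows' use of UTF-16 is non-conforming.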

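² Unicode allows many characters to be represented with multiple code points, which creates the same problems for simple text algorithms as variable-width encodings do. Even if composed normalization is strictly maintained, some characters still require multiple code points. See: http://www.unicode.org/standard/where/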
