問題描述
我必須將一些包含未編碼為 Unicode 的國家字符的數(shù)據(jù)導入到獨立的 MySQL 版本:5.0.18 在 Windows7 64 位上運行.經(jīng)過一些最初的問題,我終于讓它在 MySQL 控制臺中工作.
I have to import some data containing national characters not coded as Unicode into stand alone MySQL Version: 5.0.18 running on Windows7 64bit. After some initial problems I finally got it working in MySQL console.
但由于數(shù)據(jù)超過 50 MB,因此無法在控制臺中輸入和/或使用剪貼板.所以我創(chuàng)建腳本文件才發(fā)現(xiàn)導入后國字是亂碼.
But as the data is more than 50 MByte typing into console and or using clipboard is not possible. So I created script file only to find out that the national characters are gibberish after import.
問題是,如果我對任何文件使用 source
命令,編碼就會中斷.如果我打開同一個文件并通過剪貼板將內(nèi)容復制到控制臺,則一切正常.這里最小的 MCVE 用于測試:
The problem is that if I use source
command with any file the encoding broke. If I open the same file and copy the stuff to console by clipboard all works as should. Here smallest MCVE to test this:
DROP DATABASE IF EXISTS dbs;
CREATE DATABASE dbs;
USE dbs;
SET NAMES latin2;
DROP TABLE IF EXISTS `tab`;
CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(50) default '' );
INSERT INTO `tab` VALUES
(1,'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ'),
(2,'á???éí??ňó?????ú y?á ??éí???ó ????ú Y?');
SELECT * FROM `tab`;
當我通過剪貼板將其復制到MySQL控制臺時,輸出如下:
When I copy this to MySQL console by clipboard the output is like this:
+------+----------------------------------------+
| ix | nam |
+------+----------------------------------------+
| 1 | aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
| 2 | á???éí??ňó?????ú y?á ??éí???ó ????ú Y? |
+------+----------------------------------------+
2 rows in set (0.00 sec)
這是需要的.但是當我將所有這些放入 test.sql
文件并運行
Which is desired. But when I put all this into test.sql
file and run
source test.sql;
我得到了這個輸出:
+------+----------------------------------------+
| ix | nam |
+------+----------------------------------------+
| 1 | aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
| 2 | ?ń?′úYň??ˇ?ó°ü?˙ ?×┴ ╚¤╔═┼╝?? └ě??┌ ?? |
+------+----------------------------------------+
這顯然是錯誤的(看起來像一些默認的 MS-DOS 字符集).我認為問題不在于表或數(shù)據(jù)庫,因為這對于純文本輸出來說是一樣的:
Which is obviously wrong (looks like some default MS-DOS charset). I think the problem is not on side of table nor database as this is the same for pure text outs like:
SET NAMES latin2;
SELECT 'á???éí??ňó?????ú y?á ??éí???ó ????ú Y?' AS 'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ';
帶有剪貼板的輸出:
+----------------------------------------+
| aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
+----------------------------------------+
| á???éí??ňó?????ú y?á ??éí???ó ????ú Y? |
+----------------------------------------+
和 source
文件:
+----------------------------------------+
| aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
+----------------------------------------+
| ?ń?′úYň??ˇ?ó°ü?˙ ?×┴ ╚¤╔═┼╝?? └ě??┌ ?? |
+----------------------------------------+
就像從文件導入時編碼被搞砸了一樣.或者在通過鍵盤或剪貼板輸入到MySQL控制臺時,編碼發(fā)生了變化.
It is like when importing from files the encoding got screwed. Or while inputing to MySQL console through Keyboard or Clipboard the coding got changed.
那么發(fā)生了什么以及如何糾正這個問題(不丟失數(shù)據(jù))?
So what is going on and how to correct this (without the loss of data)?
- 使用
<
命令行選項代替source
沒有幫助 - 對
source
使用-e
命令行選項沒有幫助 - 使用默認字符集命令行選項沒有幫助
- 對非 Unicode 字符串使用 UTF8 會導致數(shù)據(jù)過長錯誤和數(shù)據(jù)丟失
- 剪貼板中的數(shù)據(jù)與文件中的數(shù)據(jù)相同
- using
<
command line option instead ofsource
does not help - using
-e
command line option forsource
does not help - using default charset command line option does not help
- using UTF8 for non Unicode string lead to Data too long errors and loss of data
- data in clipboard is the same as in the file
好吧,我嘗試了 MySQL 5.7.19 的較新版本,這花了我很長時間才開始,因為他們更改了初始化和內(nèi)容(wtf?那個瘋狂得到 1.8 GByte 沒有任何數(shù)據(jù)!).無論我做什么,它的行為都是一樣的.所以我嘗試使用 UTF8 編碼:
Well I tried newer version of MySQL 5.7.19 which took mi quite a while to get started as they change the initialization and things (wtf? that insanity got 1.8 GByte without any data !). It behaves the same no matter what I do. So I tried to use UTF8 encoding:
DROP DATABASE IF EXISTS dbs;
CREATE DATABASE dbs CHARACTER SET utf8 COLLATE 'utf8_unicode_ci';
USE dbs;
SET NAMES utf8;
DROP TABLE IF EXISTS `tab`;
CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(50) default '' ) CHARACTER SET utf8 COLLATE 'utf8_unicode_ci';
INSERT INTO `tab` VALUES
(1,'á???éí??ňó?????ú y?á ??éí???ó ????ú Y?'),
(2,'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ');
SELECT * FROM `tab`;
#SELECT 'á???éí??ňó?????ú y?á ??éí???ó ????ú Y?' AS 'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ';
#SHOW COLLATION;
#SHOW CHARACTER SET;
SHOW VARIABLES LIKE 'char%';
是的,腳本文件已轉(zhuǎn)換為 UTF8.現(xiàn)在這里是 my.ini 設(shè)置:
And Yes the script file is converted to UTF8. Now here is my.ini setting:
[mysql]
default-character-set=utf8
[mysqld]
skip-character-set-client-handshake
character-set-server=utf8
collation-server=utf8_unicode_ci
使用 source
此處最終為文件工作,結(jié)果如下:
This finally worked for the file using source
here the result:
+------+--------------------------------------------------------------------------+
| ix | nam |
+------+--------------------------------------------------------------------------+
| 1 | á???éí??ňó?????ú y?á ??éí???ó ????ú Y? |
| 2 | aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
+------+--------------------------------------------------------------------------+
+--------------------------+--------------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------------------------------------------------------------------+
推薦答案
您必須在創(chuàng)建表時指定 CHARACTER SET
,最好是在列本身上.否則,您會從 SHOW VARIABLES LIKE 'char%';
You must specify a CHARACTER SET
when creating the table, preferably on the column itself. Otherwise, you get some default from SHOW VARIABLES LIKE 'char%';
SET NAMES
在客戶端建立編碼.
當 INSERTing
或 SELECTing
時,編碼從客戶端的編碼 (SET NAMES
) 更改為列的 (...VARCHAR ...字符集...
).
When INSERTing
or SELECTing
, the encoding is changed from the client's encoding (SET NAMES
) to the column's (... VARCHAR ... CHARACTER SET ...
).
你真的需要latin2嗎?世界正在轉(zhuǎn)向 UTF-8.
Do you really need latin2? The world is moving to UTF-8.
這篇關(guān)于使用 source 命令會破壞非 Unicode 文本編碼的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網(wǎng)!