正确的改变table字符集的语句是: alter table xxx convert to character set utf8; 而不是想当然的: alter table xxx default charset utf8;
字符集从GBK转成utf8, 会增大字段所占用的空间,有可能会改变字段的类型: 比如text有可能会自动变成medium text 但是varchar没有自动变成medium text.
下面看测试:
建一张GBK的表: mysql> SHOW CREATE TABLE xxx; CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(255) NOT NULL DEFAULT '', `body` text, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=gbk 用想当然的办法把它转成utf8: mysql> ALTER TABLE xxx DEFAULT charset=utf8; Query OK, 0 ROWS affected (0.13 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> mysql> SET names utf8; Query OK, 0 ROWS affected (0.00 sec) 插入测试数据: mysql> INSERT INTO xxx SET title='我爱北京天安门'; Query OK, 1 ROW affected (0.06 sec) 成功了。。。然后您就认为万事OK了? mysql> INSERT INTO xxx SET title='㤇'; Query OK, 1 ROW affected, 1 warning (0.03 sec) Warning (Code 1366): Incorrect string VALUE: '\xE3\xA4\x87' FOR COLUMN 'title' at ROW 1 注意 title和body字段的CHARACTER SET 仍然为gbk mysql> SHOW CREATE TABLE xxx; | xxx | CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(255) CHARACTER SET gbk NOT NULL DEFAULT '', `body` text CHARACTER SET gbk, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 | 正确的做法: mysql> DROP TABLE xxx; Query OK, 0 ROWS affected (0.02 sec) mysql> CREATE TABLE `xxx` ( -> `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, -> `title` VARCHAR(255) NOT NULL DEFAULT '', -> `body` text, -> PRIMARY KEY (`id`) -> ) ENGINE=InnoDB DEFAULT CHARSET=gbk; Query OK, 0 ROWS affected (0.06 sec) mysql> ALTER TABLE xxx CONVERT TO CHARACTER SET utf8; Query OK, 0 ROWS affected (0.14 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> INSERT INTO xxx SET title='㤇'; Query OK, 1 ROW affected (0.04 sec) mysql> SHOW CREATE TABLE xxx; | xxx | CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(255) NOT NULL DEFAULT '', `body` mediumtext, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 可是你注意到 body字段从text变成 mediumtext了吗? text字段,最多存储65535字节,换成GBK的字符就是32767个字符,这32767个gbk字符转成utf8却要占用98301字节,已经超过text的存储能力,所以被自动转成了mediumtext。 下面测试VARCHAR的情况: VARCHAR除了数据部分,还有1-2个字节用来保存数据的长度。如果只使用一个字节,那么长度上限为255(2^8-1),如果使用二个字节,长度上限为65535(2^16-1)。 所以VARCHAR最多存储65535字节,换成GBK字符为32767个: mysql> DROP TABLE xxx; Query OK, 0 ROWS affected (0.02 sec) mysql> mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32768) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk; ERROR 1074 (42000): COLUMN LENGTH too big FOR COLUMN 'title' (MAX = 32767); USE BLOB OR TEXT instead mysql> mysql> 可是 32767也是不成的。。还有每行记录的总长度限制(不包括text和BLOB字段) 65535: mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32767) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk; ERROR 1118 (42000): ROW SIZE too LARGE. The maximum ROW SIZE FOR the used TABLE TYPE, NOT counting BLOBs, IS 65535. You have TO CHANGE SOME COLUMNS TO TEXT OR BLOBs mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32766) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk; ERROR 1118 (42000): ROW SIZE too LARGE. The maximum ROW SIZE FOR the used TABLE TYPE, NOT counting BLOBs, IS 65535. You have TO CHANGE SOME COLUMNS TO TEXT OR BLOBs mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32765) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk; ERROR 1118 (42000): ROW SIZE too LARGE. The maximum ROW SIZE FOR the used TABLE TYPE, NOT counting BLOBs, IS 65535. You have TO CHANGE SOME COLUMNS TO TEXT OR BLOBs mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32764) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk; Query OK, 0 ROWS affected (0.06 sec) 32764 * 2 + 2 + 4 = 65534 最接近于65535了, 32765 * 2 + 2 + 4 = 65536 超过65535。 *2是因为gbk字符占用2字节。 +2是VARCHAR还需要额外2字节保存数据的长度。 +4是id字段INT UNSIGNED占了4字节。 合理吗? 转换字符集为utf8之后,VARCHAR(32764) 已经不足以保存 32764个utf8字符: mysql> ALTER TABLE xxx CONVERT TO CHARACTER SET utf8; ERROR 1074 (42000): COLUMN LENGTH too big FOR COLUMN 'title' (MAX = 21845); USE BLOB OR TEXT instead 21845 * 3 = 65535.
仓促完成,难免有错,欢迎指出。
查看更多关于mysql 改变table的字符集的详细内容...