How do I get the MySQL command line tools to display Unicode correctly?
I am using a Python program to write text containing Unicode characters to a MySQL database. For example, two of the characters are:
u'\u2640' a symbol for Venus or female
u'\u2642' a symbol for Mars or male
I use utf8mb4 for almost all MySQL-related character sets. Here is an excerpt from /etc/mysql/my.cnf:
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mysqld]
default-character-set=utf8mb4
character-set-server =utf8mb4
character_set_system =utf8mb4
Additionally, all tables are created with the following parameters:
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
In all but one case, Unicode handling works fine: I can write Unicode to a database table, read it, display it, etc. without any problems. The exception is mysql, the MySQL command line tool. When I execute a SELECT statement to view rows in a table containing the Venus and Mars characters, this is what I see on the screen:
| Venus | ♀ |
| Mars | ♂ |
What I should see in the right column are the standard glyphs for Venus and Mars.
Any ideas on how to get the MySQL command line tools to display Unicode correctly?
Edit:
I've done a lot of research on various MySQL system variables, etc., and I now realize that the my.cnf setup shown above has some serious issues. In fact, the server, mysqld, will not start with those settings. To correct the problem, remove these lines from [mysqld] (default-character-set was removed as a server option in MySQL 5.5, and character_set_system is a read-only variable that cannot be set in an option file):
default-character-set=utf8mb4
character_set_system=utf8mb4
I'm not sure what the [client] option does, but it doesn't seem to hurt.
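Before restarting mysqld, the rule from this edit can be checked with a short script. This is a minimal Python 3 sketch, not a MySQL tool: the function name find_rejected_options is hypothetical, the option set it checks is taken only from the two lines listed above, and it uses the standard-library configparser, which handles simple my.cnf files but does not follow !include or !includedir directives.

```python
import configparser

# The two [mysqld] options the edit says to remove. MySQL treats "-" and "_"
# in option names as equivalent, so names are normalized before comparing.
REJECTED_SERVER_OPTIONS = {"default_character_set", "character_set_system"}


def find_rejected_options(cnf_text: str) -> list:
    """Return the [mysqld] option names in cnf_text that should be removed."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags such as skip-name-resolve
        strict=False,         # tolerate repeated keys and sections
        interpolation=None,   # values may contain "%" literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return []
    return [
        name
        for name in parser.options("mysqld")  # keys come back lowercased
        if name.replace("-", "_") in REJECTED_SERVER_OPTIONS
    ]


sample = """[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mysqld]
default-character-set=utf8mb4
character-set-server =utf8mb4
character_set_system =utf8mb4
"""
print(find_rejected_options(sample))  # ['default-character-set', 'character_set_system']
```

Only the [mysqld] section is inspected, because the same default-character-set line is legitimate under [client] and [mysql].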
In Python, u'\u2640' represents a single Unicode character, "♀". In UTF-8 it encodes to three bytes, hex E2 99 80. I have no problems encoding and decoding Unicode. The correct values are stored in the MySQL table, they are read correctly from the table, and when displayed by a Python program, they appear as follows:
♀ Venus
♂ Mars
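The byte arithmetic above can be checked from Python 3. This short sketch (the helper name utf8_hex is illustrative) prints each symbol's code point, its name from the standard unicodedata module, and its UTF-8 bytes in hex.

```python
import unicodedata


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as uppercase hex."""
    return ch.encode("utf-8").hex().upper()


for ch in ("\u2640", "\u2642"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {utf8_hex(ch)}")

# Expected output:
# U+2640 FEMALE SIGN: E29980
# U+2642 MALE SIGN: E29982
```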
Program output can be redirected to a file, processed by a text editor, etc., and the correct Unicode symbols are displayed in all cases.
There is only one place where the correct Unicode symbols are not displayed, and that is when I use the MySQL command line tools. When I issue a SELECT statement on a table that contains Unicode symbols, I get the garbage shown above. This is not a Windows-specific issue: I had exactly the same problem when I ran the MySQL command line tool on Windows, Mac OS X, and Ubuntu.
I'm a little embarrassed to report that the MySQL command line tools have never had a problem displaying Unicode characters. So why did I think they did?
I have written a number of Python 2 programs using MySQLdb to communicate with MySQL. My data involves Unicode characters, such as the symbols for Mars and Venus. I am able to write these Unicode characters to the database, read them back, and generally operate on them like any other character.
One annoyance: Using the MySQL command line tools, when I select rows from a table that contains symbols like Mars and Venus, I only see garbage. That's what brought me to my original post asking how to get Unicode to display correctly. I have never gotten a satisfactory answer.
Recently I started converting my Python 2 programs to Python 3, using pymysql to communicate with MySQL, and immediately ran into a problem: the Unicode characters I read from the database seemed to be all wrong. Investigation showed that the bytes stored in the database (written by Python 2) did not form the correct UTF-8 sequences for the Unicode characters I was using.
I converted the Python 2 program that created the table to Python 3, recreated the table, and, presto chango, everything worked. In other words, the bytes in the database were wrong from day one, but when a Python 2 program read them back, they were turned back into the original Unicode characters.
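The post never pins down what went wrong, but one common way to get exactly this behavior (correct text in the Python 2 program, garbage in mysql) is double encoding: the program sends UTF-8 bytes over a connection the server believes is latin1, so each byte is stored as a separate character. The sketch below simulates that round trip in Python 3. It is a hypothesis, not a confirmed diagnosis; it assumes MySQL's latin1 behaves like Windows-1252, with Python's cp1252 codec standing in for it, and all function names are hypothetical.

```python
# HYPOTHESIS ONLY: UTF-8 bytes sent over a connection the server treats as
# latin1. MySQL's latin1 is essentially Windows-1252, so Python's "cp1252"
# codec stands in for it. Python's codec raises on the five bytes cp1252
# leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D); these symbols avoid them.


def store_via_latin1_connection(text: str) -> bytes:
    """Bytes that would land in a utf8mb4 column under the hypothesis."""
    sent = text.encode("utf-8")          # program sends UTF-8 bytes
    misread = sent.decode("cp1252")      # server reads each byte as latin1
    return misread.encode("utf-8")       # and stores those characters as utf8mb4


def fetch_via_latin1_connection(stored: bytes) -> str:
    """What the same program gets back: the conversion runs in reverse."""
    sent_back = stored.decode("utf-8").encode("cp1252")  # server converts to latin1
    return sent_back.decode("utf-8")                      # program decodes as UTF-8


def shown_by_utf8_client(stored: bytes) -> str:
    """What a client that really uses utf8mb4 would display."""
    return stored.decode("utf-8")


stored = store_via_latin1_connection("\u2640")
print(stored.hex().upper())                  # C3A2E284A2E282AC
print(shown_by_utf8_client(stored))          # â™€
print(fetch_via_latin1_connection(stored))   # ♀
```

Under this assumption, both observations fit: the Python 2 program undid the conversion on the way back, while a client that really used utf8mb4 displayed the intermediate characters.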
And, of course, all of a sudden the MySQL command line tools started displaying the Unicode characters normally. The real problem was that the bytes Python 2 and MySQLdb wrote to the database were not the correct UTF-8 representation of the characters I was storing. I don't know exactly what those bytes were, and I've been dealing with this problem too long to take the time to find out.
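For anyone who does want to see the stored bytes, MySQL's HEX() function shows them, for example SELECT HEX(col) FROM tbl, where tbl and col stand in for your own table and column. The hypothetical helper below takes that hex string and tests it against one specific assumption: UTF-8 text that was misread as cp1252 (standing in for MySQL's latin1) and then stored as UTF-8. It is a heuristic, and text that legitimately contains characters such as â or ™ can produce false positives.

```python
def diagnose_hex(column_hex: str) -> tuple:
    """Return (looks_double_encoded, text) for a HEX() value from a utf8mb4 column.

    If undoing one "UTF-8 read as cp1252" step yields valid, different text,
    the value is reported as double-encoded along with the repaired text.
    Otherwise the text is returned exactly as stored.
    """
    raw = bytes.fromhex(column_hex)
    stored_text = raw.decode("utf-8")  # what the column actually contains
    try:
        repaired = stored_text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Not reversible, so probably stored correctly.
        return False, stored_text
    # Pure ASCII survives the round trip unchanged, so it is not flagged.
    return repaired != stored_text, repaired


print(diagnose_hex("E29980"))            # (False, '♀')  correct UTF-8
print(diagnose_hex("C3A2E284A2E282AC"))  # (True, '♀')   double-encoded
```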
I recommend this article to anyone using Unicode in MySQL. It lists all the MySQL parameters that must be set for Unicode and shows how to view them in your own MySQL installation.
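The article isn't linked here, but the character set parameters can be listed with SHOW VARIABLES LIKE 'character_set%'. A minimal sketch, assuming a PyMySQL or MySQLdb cursor whose fetchall() returns (Variable_name, Value) tuples: the hypothetical function below flags variables whose value isn't utf8mb4. The exceptions table reflects typical defaults (character_set_filesystem is normally binary, and character_set_system is fixed at utf8/utf8mb3) and may need adjusting for your server version. The sample rows are made up for illustration.

```python
# Typical values that are expected even in a fully utf8mb4 setup
# (assumption; adjust for your MySQL version).
EXPECTED_EXCEPTIONS = {
    "character_set_filesystem": {"binary", "utf8mb4"},
    "character_set_system": {"utf8", "utf8mb3"},
}


def non_utf8mb4(rows) -> dict:
    """Return {variable: value} for character_set_* rows that look wrong."""
    problems = {}
    for name, value in rows:
        if name == "character_sets_dir":
            continue  # a directory path, not a character set
        allowed = EXPECTED_EXCEPTIONS.get(name, {"utf8mb4"})
        if value not in allowed:
            problems[name] = value
    return problems


# With a live connection, rows would come from something like:
#   cursor.execute("SHOW VARIABLES LIKE 'character_set%'")
#   rows = cursor.fetchall()
# Illustrative rows only, not output from a real server:
rows = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "latin1"),
    ("character_set_database", "utf8mb4"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8mb4"),
    ("character_set_server", "utf8mb4"),
    ("character_set_system", "utf8mb3"),
    ("character_sets_dir", "/usr/share/mysql/charsets/"),
]
print(non_utf8mb4(rows))  # {'character_set_connection': 'latin1'}
```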