SoFunction
Updated on 2025-03-04

Fixing garbled text after inserting records into MySQL

Symptoms

The back-end application uses a cloud-hosted MySQL service, and some data needed to be inserted into a MySQL table. The usual approach is to start a MySQL client pod first:

kubectl run mysql-client --rm -it --restart='Never' --image mysql:5.7 --command -- env LANG=C.UTF-8 mysql - -uusername -ppassword

Then copy the SQL file containing the INSERT statements into the pod and run it there with the source command. This time, I found that a MySQL pod was already running in the environment, so I copied the SQL file into that pod, opened a shell in it, and connected to the cloud MySQL instance from there:

kubectl exec -it mysql-client -- bash
mysql - -uusername -ppassword

I then ran the SQL file with source and checked the inserted data through the front-end page, where it showed up garbled. However, a SELECT in the same MySQL client that ran the source command returned the expected Chinese characters.

Cause of the problem

For example, suppose the SQL file executed by source contains this statement:

INSERT INTO table_1 (title) VALUES ('好');

The SQL file is UTF8-encoded, so the value of the title field, 好, is sent by the MySQL client to the MySQL server as its UTF8 byte sequence, E5A5BD in hexadecimal.

The locale of the mysql-client pod is POSIX, so the MySQL client declares latin1 as its character set when it connects to the server. After receiving the data, the MySQL server decodes E5A5BD as latin1 and obtains the three-character string å¥½.

root@mysql-client:/# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

mysql> show variables like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)

Since the character set of the table column is UTF8, the MySQL server then encodes the misdecoded string å¥½ as UTF8, which produces the byte sequence C3A5C2A5C2BD. This can be verified with the following query:

select HEX(title) from table_1;
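The double encoding can be reproduced outside MySQL. The following is a minimal Python sketch, not MySQL's own code. It assumes Python's cp1252 codec as a stand-in for MySQL's latin1, and the variable names are illustrative only.

```python
# Bytes the client puts on the wire: the UTF-8 encoding found in the SQL file.
sent = "好".encode("utf-8")
print(sent.hex().upper())  # E5A5BD

# The session is declared as latin1, so the server reads each byte as one
# character (cp1252 stands in for MySQL's latin1 in this sketch).
misread = sent.decode("cp1252")

# The column is utf8, so the three misread characters are encoded as UTF-8.
stored = misread.encode("utf-8")
print(stored.hex().upper())  # C3A5C2A5C2BD
```

The second value matches what HEX(title) returns for the affected row.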

One pitfall came up while verifying this process:

I first tried to verify it with the Chinese character 我, whose UTF8 encoding is E68891. In standard latin1 (ISO 8859-1), 88 and 91 fall in the range 80 to 9F and correspond to control characters, so decoding by hand does not give printable characters, and encoding the result as UTF8 gives C3A6C288C291. That differs from the C3A6CB86E28098 actually stored in the table. The difference arises because MySQL's latin1 is in fact Windows cp1252, which maps 88 and 91 to the printable characters ˆ and ‘ rather than to control characters. To stay clear of that range, I switched to the character 好 (E5A5BD). The value computed by hand and the value stored in the table are then both C3A5C2A5C2BD, which confirms the process.
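The mismatch between the hand-computed and stored values can be checked with a short Python sketch. It assumes Python's latin-1 codec for ISO 8859-1 and its cp1252 codec for MySQL's latin1; the variable names are illustrative.

```python
raw = "我".encode("utf-8")  # E68891

# ISO 8859-1 maps 0x88 and 0x91 to C1 control characters.
iso = raw.decode("latin-1").encode("utf-8")

# Windows-1252 maps them to printable characters (U+02C6 and U+2018).
win = raw.decode("cp1252").encode("utf-8")

print(iso.hex().upper())  # C3A6C288C291
print(win.hex().upper())  # C3A6CB86E28098
```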

Why the page shows garbled text:

The front-end page fetches data by calling a back-end interface. The back-end service connects to MySQL with the UTF8 character set, so its character_set_results is UTF8.

The byte sequence the MySQL server reads from the table is C3A5C2A5C2BD. Because the column's character set (UTF8) is the same as character_set_results, no conversion takes place and the same bytes, C3A5C2A5C2BD, are sent to the back-end service.

The back-end service decodes C3A5C2A5C2BD as UTF8 and gets å¥½, so the front-end page shows å¥½ instead of the expected Chinese character 好.
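As a small illustration of the read path for the back-end service, here is a sketch in Python. The stored bytes are taken from the text above; everything else is illustrative.

```python
# Bytes stored in the utf8 column after the faulty insert.
stored = bytes.fromhex("C3A5C2A5C2BD")

# Column charset and character_set_results are both utf8, so the bytes
# reach the back end unchanged and are decoded as UTF-8 there.
shown = stored.decode("utf-8")

print(shown)       # å¥½
print(len(shown))  # 3
```

One original character has become three, which is the usual signature of double encoding.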

Why the SELECT in the MySQL command line client looks correct:

The character_set_results of the MySQL command line client session is latin1.

The MySQL server reads the byte sequence C3A5C2A5C2BD from the table and decodes it as UTF8, which gives å¥½.

It then encodes å¥½ with latin1, the character_set_results character set, which yields E5A5BD, and sends that byte sequence to the MySQL command line client.

The client passes the bytes on to MobaXterm, the terminal emulator running on the local desktop. MobaXterm uses UTF8, so it decodes E5A5BD as the Chinese character 好 and displays it.
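The command line round trip can be traced in Python as well. This is a sketch under the same assumption as before, namely that the cp1252 codec approximates MySQL's latin1.

```python
stored = bytes.fromhex("C3A5C2A5C2BD")

# Server side: read the column as utf8, then convert to character_set_results.
chars = stored.decode("utf-8")   # å¥½
wire = chars.encode("cp1252")    # latin1 session
print(wire.hex().upper())        # E5A5BD

# Terminal side: the emulator interprets the received bytes as UTF-8.
print(wire.decode("utf-8"))      # 好
```

The second conversion undoes the first one, which is why the corruption stays hidden in this client.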

Solution

When connecting to MySQL on the cloud, specify utf8 as the character set:

mysql - -uusername -ppassword --default-character-set=utf8

Alternatively, set the locale of the MySQL pod to UTF8, so that the MySQL client uses utf8 when it connects to the server:

export LANG=C.UTF-8

Or run the source command in a MySQL client started with the following command, which sets the pod's locale to UTF8 through the LANG variable passed to env:

kubectl run mysql-client --rm -it --restart='Never' --image mysql:5.7 --command -- env LANG=C.UTF-8 mysql - -uusername -ppassword

With any of these approaches, character_set_client, character_set_connection, and character_set_results are all set to utf8, which matches the character set of the table column, so the text is no longer garbled.

mysql> show variables like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

Character set processing in MySQL client and server communication

How the client sends statements to the server

  • If the --default-character-set parameter is not given on the mysql command line, the client uses the operating system's character set for the statements it sends to the server; otherwise it uses the character set named by --default-character-set.
  • The server sets character_set_client, character_set_connection, and character_set_results to the client's character set.
  • After receiving a statement from the client, the server decodes it using the character_set_client character set.
  • The server then converts the decoded statement to the character_set_connection character set before processing it.

The server converts statements to the character_set_connection character set before processing them. Only the connection character set has an associated collation (collation_connection); character_set_client and character_set_results have none:
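The steps above can be sketched as a small Python function. This is a simplified model rather than MySQL's implementation: the function name store and its parameters are hypothetical, the character_set_connection step is left out because it equals character_set_client in the sessions shown in this article, and cp1252 again stands in for MySQL's latin1.

```python
def store(payload: bytes, character_set_client: str, column_charset: str) -> bytes:
    """Decode the client's bytes with character_set_client, then encode for the column."""
    return payload.decode(character_set_client).encode(column_charset)


# What a UTF-8 encoded SQL file puts on the wire for the character 好.
payload = "好".encode("utf-8")

# Session declared as latin1 while the bytes are UTF-8: double encoding.
print(store(payload, "cp1252", "utf-8").hex().upper())  # C3A5C2A5C2BD

# Session declared as utf8: the stored bytes match the original.
print(store(payload, "utf-8", "utf-8").hex().upper())   # E5A5BD
```

The only difference between the two calls is the declared client character set, which is what the fix changes.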

mysql> show variables like 'collation_%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | utf8_general_ci   |
| collation_server     | utf8_general_ci   |
+----------------------+-------------------+
3 rows in set (0.00 sec)

How the server sends results to the client

  • The server reads the column content from the table.
  • The content is first decoded using the column's character set, then encoded using the character_set_results character set and sent to the client.
  • The client decodes the result using the operating system's character set. When you log in to the remote host from a terminal emulator on a local desktop, the result is also forwarded to that emulator, which decodes and displays it using its own character set.
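The result path can be modeled in the same simplified way. The function name fetch and its parameters are hypothetical, cp1252 stands in for MySQL's latin1, and Python's errors="replace" is used only as an approximation of how a character that cannot be represented in the target character set ends up as a question mark.

```python
def fetch(stored: bytes, column_charset: str,
          character_set_results: str, terminal_charset: str) -> str:
    """Model column -> character_set_results -> terminal for one value."""
    text = stored.decode(column_charset)
    wire = text.encode(character_set_results, errors="replace")
    return wire.decode(terminal_charset, errors="replace")


good = bytes.fromhex("E5A5BD")  # 好 stored correctly in a utf8 column

# Consistent utf8 session and UTF-8 terminal: the character survives.
print(fetch(good, "utf-8", "utf-8", "utf-8"))   # 好

# latin1 session: 好 has no latin1 representation, so it is replaced.
print(fetch(good, "utf-8", "cp1252", "utf-8"))  # ?
```

Once the data is stored correctly, a session that still uses latin1 can no longer display it, so every client should connect with the same character set.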
