Analyzing and fixing garbled Chinese text when reading files in C

Introduction

File handling is a routine task in C programming. When reading text files that contain Chinese, however, developers often see output such as "烫烫烫" or other incorrectly displayed characters. These problems usually stem from three causes: an uninitialized buffer, a file encoding that does not match what the program expects, or a terminal that displays text in a different encoding.

This article analyzes the root causes of these problems and provides solutions, including code examples, ways to adjust encodings, and suggestions for cross-platform compatibility.

1. The symptom: why does "烫烫烫" appear?

1.1 The source of "烫"

In Visual Studio's Debug mode, uninitialized stack memory is filled with 0xCC. When these bytes are interpreted as GBK, each pair 0xCC 0xCC decodes to the Chinese character "烫" (tàng, "scalding hot"), so an uninitialized char array is displayed as "烫烫烫...".

Sample code (reproducing the problem):

#include <stdio.h>

int main() {
    char buffer[100];  // Not initialized
    printf("%s\n", buffer);  // Undefined behavior; may output "烫烫烫..."
    return 0;
}

Cause analysis:

buffer is uninitialized, so its contents are indeterminate (in Debug mode they are filled with 0xCC).
When printf outputs it as a string (%s), it keeps reading until it reaches a \0, and the 0xCC bytes are decoded by GBK as "烫".
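
To make the byte arithmetic concrete, the following sketch imitates the Debug-build fill pattern by writing 0xCC into a buffer explicitly (so reading it is well defined) and then counts how many 0xCC 0xCC pairs it contains. The function names are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Fill a buffer with 0xCC, imitating the Debug-build stack fill described
   above. The bytes are written explicitly, so reading them is well defined. */
void fill_like_debug_build(unsigned char *buf, size_t size) {
    memset(buf, 0xCC, size);
}

/* Count the consecutive 0xCC 0xCC pairs at the start of the buffer.
   Each such pair is one "烫" when the bytes are decoded as GBK. */
size_t count_leading_cc_pairs(const unsigned char *buf, size_t size) {
    size_t pairs = 0;
    while (2 * pairs + 1 < size &&
           buf[2 * pairs] == 0xCC && buf[2 * pairs + 1] == 0xCC) {
        pairs++;
    }
    return pairs;
}
```

For a 100-byte buffer this yields 50 pairs, which is why a GBK console shows a long run of "烫" until printf happens to reach a zero byte.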

1.2 Solution: Initialize the buffer

char buffer[100] = {0};  // Initialize every byte to 0

// Or use memset (requires <string.h>)
memset(buffer, 0, sizeof(buffer));

This zeroes the buffer, so printing it produces an empty string instead of undefined content.

2. Garbled Chinese text: analysis and solutions

Even after the "烫" problem is fixed, Chinese text read from a file may still be displayed incorrectly. The main causes are:

2.1 File encoding does not match terminal encoding

UTF-8: recommended on modern operating systems; most Chinese characters occupy 3 bytes.
GBK: the default ANSI code page (936) on Simplified Chinese Windows; a Chinese character occupies 2 bytes.

If the file is UTF-8 but the console defaults to GBK, the output is garbled.

Example (garbled output after reading a UTF-8 file):

File content (UTF-8): "你好"
Console output (GBK): "浣犲ソ"
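
A quick way to see why the two encodings disagree is to compare the byte length of a string with its character count. The sketch below counts UTF-8 characters by skipping continuation bytes; it assumes the input is valid UTF-8, and the function name is hypothetical.

```c
#include <stddef.h>

/* Count the characters (code points) in a NUL-terminated UTF-8 string.
   Continuation bytes have the bit pattern 10xxxxxx and are not counted.
   Assumes the input is valid UTF-8; no validation is performed. */
size_t utf8_count_chars(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

The UTF-8 form of "你好" is the six bytes E4 BD A0 E5 A5 BD, which this function counts as 2 characters. A GBK console instead groups the same six bytes into three 2-byte characters, which is where the three-character garbled output comes from.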

2.2 Solution: use one encoding throughout

(1) Method 1: Make the console support UTF-8 (Windows)

#include <windows.h>

int main() {
    SetConsoleOutputCP(65001);  // Set the console output code page to UTF-8
    // Subsequent file reading and printing logic...
    return 0;
}

(2) Method 2: Use the correct file reading method

fgets is recommended over fscanf because it limits the read to the buffer size and reads a whole line, including the trailing newline, which can then be removed explicitly.

FILE *file = fopen("", "r");
if (file == NULL) {
    printf("File opening failed\n");
    return 1;
}

char buffer[100] = {0};
if (fgets(buffer, sizeof(buffer), file) != NULL) {
    // Remove the line break at the end (if any)
    buffer[strcspn(buffer, "\n")] = '\0';
    printf("The content read is: %s\n", buffer);
} else {
    printf("File is empty or read failed\n");
}
fclose(file);
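
One case the snippet above does not cover: a file saved with Windows line endings ("\r\n") and read on Linux or macOS, or opened in binary mode, leaves a stray '\r' at the end of each line. A small helper along these lines handles both endings; the name strip_line_ending is hypothetical.

```c
#include <string.h>

/* Cut the string at the first '\r' or '\n', so that both Unix ("\n") and
   Windows ("\r\n") line endings are removed. Returns the new length. */
size_t strip_line_ending(char *line) {
    size_t len = strcspn(line, "\r\n");
    line[len] = '\0';
    return len;
}
```

It can be called in place of the strcspn line above, for example strip_line_ending(buffer); right after a successful fgets.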

(3) Method 3: Check file encoding

Check the file's encoding in Notepad++ or VS Code.
If the file is UTF-8 with a BOM, the first 3 bytes (the BOM) may need to be skipped:

// Skip the BOM (if it exists)
if (fgetc(file) == 0xEF && fgetc(file) == 0xBB && fgetc(file) == 0xBF) {
    // BOM has been skipped
} else {
    rewind(file);  // Not a UTF-8 BOM, so go back to the beginning of the file
}
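
The same check can be wrapped in a reusable function. The sketch below is one possible variant, not the only way to do it: it reads the three bytes with a single fread and seeks back to the original position, rather than to the start of the file, when no BOM is found. The function name and return convention are assumptions made for this example.

```c
#include <stdio.h>

/* Skip a UTF-8 BOM (EF BB BF) at the current position of fp, if present.
   Returns 1 if a BOM was skipped, 0 if there was none (the position is
   restored), or -1 if the stream position could not be saved or restored. */
int skip_utf8_bom(FILE *fp) {
    unsigned char bom[3];
    long start = ftell(fp);
    if (start < 0) {
        return -1;
    }
    size_t n = fread(bom, 1, sizeof bom, fp);
    if (n == 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF) {
        return 1;
    }
    /* Not a BOM (or the file is shorter than 3 bytes): go back. */
    if (fseek(fp, start, SEEK_SET) != 0) {
        return -1;
    }
    return 0;
}
```

Call it once immediately after a successful fopen and before the first fgets.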

3. Complete code example (cross-platform compatibility)

3.1 Complete code that supports UTF-8 under Windows

#include <stdio.h>
#include <string.h>
#include <windows.h>  // Only required on Windows

int main() {
    // Set the console output to UTF-8 (Windows only)
    SetConsoleOutputCP(65001);

    char buffer[100] = {0};  // Initialize the buffer
    FILE *file = fopen("", "r");
    if (file == NULL) {
        printf("File opening failed\n");
        return 1;
    }

    // Check for and skip the UTF-8 BOM (optional)
    if (fgetc(file) == 0xEF && fgetc(file) == 0xBB && fgetc(file) == 0xBF) {
        printf("UTF-8 BOM detected, skipped\n");
    } else {
        rewind(file);  // Not a BOM, so go back to the beginning of the file
    }

    // Read the file content
    if (fgets(buffer, sizeof(buffer), file) != NULL) {
        buffer[strcspn(buffer, "\n")] = '\0';  // Remove the line break
        printf("The content read is: %s\n", buffer);
    } else {
        printf("File is empty or read failed\n");
    }

    fclose(file);
    return 0;
}

3.2 Compatible code under Linux/macOS

Linux and macOS terminals usually default to UTF-8, so no additional setup is required:

#include <stdio.h>
#include <string.h>

int main() {
    char buffer[100] = {0};
    FILE *file = fopen("", "r");
    if (file == NULL) {
        printf("File opening failed\n");
        return 1;
    }

    if (fgets(buffer, sizeof(buffer), file) != NULL) {
        buffer[strcspn(buffer, "\n")] = '\0';
        printf("The content read is: %s\n", buffer);
    } else {
        printf("File is empty or read failed\n");
    }

    fclose(file);
    return 0;
}
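
The two programs above differ only in the console setup, so they can share one source file if the Windows-specific call is guarded by the preprocessor. The sketch below shows one way to do that; setup_utf8_console is a hypothetical helper name, and the non-Windows branch simply assumes the terminal already uses UTF-8.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Switch the console to UTF-8 output where that is necessary.
   Returns 1 on success (or when nothing has to be done), 0 on failure. */
int setup_utf8_console(void) {
#ifdef _WIN32
    /* CP_UTF8 is the Windows constant for code page 65001. */
    return SetConsoleOutputCP(CP_UTF8) ? 1 : 0;
#else
    /* Assumption: Linux/macOS terminals are already UTF-8. */
    return 1;
#endif
}
```

Calling setup_utf8_console() at the top of main lets the rest of the file-reading code stay identical on every platform.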

4. FAQ

Q1: Why do errors occur when reading Chinese with fscanf?

fscanf reads according to a format string. With %s it stops at the first whitespace character, so a line that contains spaces is truncated, and without a field width it can write past the end of the buffer. fgets is safer and better suited to reading whole lines of text.

Q2: How to ensure that the file is UTF-8 encoded?

In Notepad++, open the file and choose Encoding → UTF-8 (no BOM).
In VS Code, click the encoding indicator in the lower-right corner and select the encoding.

Q3: What should I do if the file is GBK encoded?

If the console is GBK (the default on Simplified Chinese Windows), just read and print the file directly. On Linux, you may need to convert the text:

#include <iconv.h>  // Additional library support is required
// Or use a third-party library (such as libiconv) for encoding conversion
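
As a rough illustration of such a conversion, the sketch below uses the POSIX iconv interface to turn a GBK string into UTF-8. It assumes the platform's iconv recognizes the encoding name "GBK" (glibc and libiconv do); on some systems, such as macOS, linking with -liconv is required. The function name and return convention are hypothetical.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated GBK string to UTF-8 using POSIX iconv.
   Returns the number of bytes written (excluding the terminating NUL),
   or -1 if the converter is unavailable, the input is invalid, or the
   output buffer is too small. */
long gbk_to_utf8(const char *gbk, char *out, size_t out_size) {
    if (out_size == 0) {
        return -1;
    }
    iconv_t cd = iconv_open("UTF-8", "GBK");  /* arguments: to, from */
    if (cd == (iconv_t)-1) {
        return -1;
    }

    char *in_ptr = (char *)gbk;        /* iconv takes char **; input is not modified */
    size_t in_left = strlen(gbk);
    char *out_ptr = out;
    size_t out_left = out_size - 1;    /* reserve room for the NUL */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        return -1;
    }

    *out_ptr = '\0';
    return (long)(out_ptr - out);
}
```

A line read from a GBK file with fgets can be passed through this function before printing it on a UTF-8 terminal; the output buffer should be at least 1.5 times the input size plus one byte, since a 2-byte GBK character becomes 3 bytes in UTF-8.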

5. Summary

Problem	Cause	Solution
"烫" garbage output	Uninitialized `char` array	`char buffer[100] = {0};`
Garbled Chinese display	File encoding (UTF-8) does not match terminal encoding (GBK)	`SetConsoleOutputCP(65001)` (Windows)
Read failure	Wrong file path or insufficient permissions	Check the return value of `fopen`
Trailing newline	`fgets` keeps the `\n`	`buffer[strcspn(buffer, "\n")] = '\0';`

With the steps in this article, you can resolve the most common causes of garbled Chinese text when reading files in C.
