Introduction
File operations are among the most common tasks in C programming. When reading text files that contain Chinese, however, developers often run into garbled output such as "烫烫烫" or incorrectly displayed Chinese characters. These problems usually stem from an uninitialized buffer, a mismatch between the file's encoding and the encoding the program assumes, or a terminal whose display encoding differs from the file's.
This article analyzes the root causes of these problems and provides complete solutions, including code examples, encoding adjustments, and cross-platform compatibility suggestions.
1. Symptom: why does the "烫" garbled output appear?
1.1 The source of "烫"
In Visual Studio's Debug mode, uninitialized stack memory is filled with 0xCC. When these bytes are interpreted as GBK, the byte pair 0xCC 0xCC corresponds to the Chinese character "烫" (tàng, "hot"), so an uninitialized char array is displayed as "烫烫烫...".
Sample code (reproducing the problem):
#include <stdio.h>

int main() {
    char buffer[100];        // Not initialized
    printf("%s\n", buffer);  // May print "烫烫烫..." (undefined behavior)
    return 0;
}
Cause analysis:
- buffer is uninitialized, so its contents are indeterminate (in Debug mode it is filled with 0xCC).
- When printf prints it as a string (%s), it keeps reading until it reaches a \0 terminator, and every 0xCC 0xCC pair along the way is decoded by GBK as "烫".
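To make the mechanism concrete without relying on undefined behavior, the sketch below fills a buffer with 0xCC by hand (imitating the Debug-mode fill described above), terminates it explicitly, and counts the 0xCC 0xCC pairs that a GBK console would render as "烫". The function names fill_like_debug_stack and count_cc_pairs are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Imitate the Debug-mode fill: every byte is 0xCC, followed by an explicit
 * terminator so that reading the buffer as a string is well defined. */
void fill_like_debug_stack(unsigned char *buf, size_t size)
{
    if (size == 0) {
        return;
    }
    memset(buf, 0xCC, size - 1);
    buf[size - 1] = '\0';
}

/* Count consecutive 0xCC 0xCC pairs from the start of the string.
 * Each pair is one GBK character, shown as "烫" on a GBK console. */
size_t count_cc_pairs(const unsigned char *s)
{
    size_t pairs = 0;
    while (s[0] == 0xCC && s[1] == 0xCC) {
        pairs++;
        s += 2;
    }
    return pairs;
}
```

With a 100-byte buffer this yields 49 pairs, which is why the real bug tends to print a long run of the same character rather than a single one.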
1.2 Solution: Initialize the buffer
char buffer[100] = {0};  // Initialize to all zeros

// Or use memset (declared in <string.h>)
memset(buffer, 0, sizeof(buffer));
Either way, the buffer starts out as an empty string, so printing it no longer outputs indeterminate memory.
2. Garbled Chinese text: analysis and solutions
Even after the "烫" problem is solved, Chinese text read from a file may still be garbled. The main reasons are:
2.1 File encoding does not match terminal encoding
- UTF-8: recommended on modern operating systems; most Chinese characters take 3 bytes.
- GBK: the default encoding on Simplified Chinese Windows; a Chinese character takes 2 bytes.
If the file is UTF-8 but the console uses GBK by default, the output will be garbled.
Example (a UTF-8 file printed on a GBK console):
File content (UTF-8): "你好"
Console output (GBK): "浣犲ソ"
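When output looks garbled, inspecting the raw bytes usually settles which encoding the file actually uses. The sketch below is a minimal illustration with two hypothetical helpers: hex_dump renders a string's bytes as hex, and utf8_char_count counts characters by skipping UTF-8 continuation bytes (it assumes the input is already valid UTF-8). For "你好" in UTF-8 the bytes are E4 BD A0 E5 A5 BD: six bytes for two characters, which a GBK console instead pairs up into three unrelated characters.

```c
#include <stdio.h>
#include <string.h>

/* Write the bytes of s into out as space-separated uppercase hex.
 * Returns 0 on success, -1 if out is too small. */
int hex_dump(const char *s, char *out, size_t out_size)
{
    size_t len = strlen(s);
    size_t needed = (len == 0) ? 1 : len * 3; /* "XX " per byte, last space replaced by NUL */

    if (out_size < needed) {
        return -1;
    }
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        snprintf(out + i * 3, out_size - i * 3,
                 (i + 1 < len) ? "%02X " : "%02X", (unsigned char)s[i]);
    }
    return 0;
}

/* Count characters in a UTF-8 string by counting every byte that is not a
 * continuation byte (10xxxxxx). Assumes the input is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

If the dump of a line of Chinese text shows three bytes per character starting with E4 to E9, the file is most likely UTF-8; two bytes per character suggests GBK.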
2.2 Solution: use one encoding throughout
(1) Method 1: Make the console output UTF-8 (Windows)
#include <windows.h>

int main() {
    SetConsoleOutputCP(65001);  // Set the console output code page to UTF-8
    // Subsequent file reading and printing logic...
}
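Changing the code page affects the console, not just the program, so it can be polite to restore the previous value before exiting. The following is a minimal sketch of that idea, assuming the Windows console API (GetConsoleOutputCP and SetConsoleOutputCP); the wrapper names enable_utf8_console and restore_console are hypothetical. The calls are guarded with _WIN32 so the same file also compiles on other platforms, where the wrappers do nothing.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Switch console output to UTF-8 and return the previous code page so it can
 * be restored later. Returns 0 when nothing was changed (non-Windows builds,
 * or when the current code page could not be queried). */
unsigned int enable_utf8_console(void)
{
#ifdef _WIN32
    UINT previous = GetConsoleOutputCP();
    SetConsoleOutputCP(CP_UTF8); /* CP_UTF8 is 65001 */
    return (unsigned int)previous;
#else
    return 0;
#endif
}

/* Restore a code page previously returned by enable_utf8_console. */
void restore_console(unsigned int previous)
{
#ifdef _WIN32
    if (previous != 0) {
        SetConsoleOutputCP((UINT)previous);
    }
#else
    (void)previous; /* nothing to restore on other platforms */
#endif
}
```

A typical use would be to call enable_utf8_console at the start of main, keep the returned value, and pass it to restore_console just before returning.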
(2) Method 2: Use the correct file reading method
fgets is recommended over fscanf because it is safer (the read is bounded by the buffer size) and it handles line endings predictably: it reads up to and including the newline.
// Requires <stdio.h> and <string.h>
FILE *file = fopen("", "r");  // Fill in the path to your text file
if (file == NULL) {
    printf("File opening failed\n");
    return 1;
}

char buffer[100] = {0};
if (fgets(buffer, sizeof(buffer), file) != NULL) {
    // Remove the line break at the end (if any)
    buffer[strcspn(buffer, "\n")] = '\0';
    printf("The content read is: %s\n", buffer);
} else {
    printf("File is empty or read failed\n");
}
fclose(file);
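The snippet above reads only the first line. A minimal sketch of reading the whole file with the same fgets call is shown below; copy_lines is a hypothetical helper name. It also illustrates one detail that matters for Chinese text: when a line is longer than the buffer, fgets returns it in several chunks, and a chunk boundary can fall in the middle of a multi-byte character. Writing the chunks out back to back, as done here, keeps the bytes intact.

```c
#include <stdio.h>
#include <string.h>

/* Copy every line of in to out, one fgets call at a time.
 * Returns the number of lines seen (a final line without '\n' counts). */
size_t copy_lines(FILE *in, FILE *out)
{
    char buffer[100];
    size_t lines = 0;
    int at_line_start = 1;

    while (fgets(buffer, sizeof(buffer), in) != NULL) {
        size_t len = strlen(buffer);

        if (at_line_start) {
            lines++;
        }
        fputs(buffer, out);

        /* A chunk without a trailing '\n' means the line continues in the
         * next chunk (or the file ended without a final newline). */
        at_line_start = (len > 0 && buffer[len - 1] == '\n');
    }
    return lines;
}
```

Calling copy_lines(file, stdout) after a successful fopen prints the entire file.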
(3) Method 3: Check file encoding
- View file encoding with Notepad++ or VS Code.
- If the file is UTF-8 with BOM, the first 3 bytes (BOM header) may need to be skipped:
// Skip the BOM (if present)
if (fgetc(file) == 0xEF && fgetc(file) == 0xBB && fgetc(file) == 0xBF) {
    // BOM has been skipped
} else {
    rewind(file);  // Not a UTF-8 BOM: go back to the beginning of the file
}
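The same check can be wrapped in a small reusable function. The sketch below is one possible way to do it, reading the three bytes with a single fread; skip_utf8_bom is a hypothetical name, and the function assumes it is called on a freshly opened file positioned at the start.

```c
#include <stdio.h>

/* If the stream starts with the UTF-8 BOM (EF BB BF), leave the position just
 * past it and return 1. Otherwise rewind to the start and return 0.
 * Intended to be called right after fopen. */
int skip_utf8_bom(FILE *fp)
{
    unsigned char head[3];
    size_t n = fread(head, 1, sizeof(head), fp);

    if (n == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF) {
        return 1;
    }
    rewind(fp); /* also clears the end-of-file flag for very short files */
    return 0;
}
```

After the call, fgets can be used as usual; the first line will no longer begin with the three BOM bytes.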
3. Complete code example (cross-platform compatibility)
3.1 Complete code that supports UTF-8 under Windows
#include <stdio.h>
#include <string.h>
#include <windows.h>  // Only required on Windows

int main() {
    // Set the console output to UTF-8 (Windows only)
    SetConsoleOutputCP(65001);

    char buffer[100] = {0};  // Initialize the buffer
    FILE *file = fopen("", "r");  // Fill in the path to your text file
    if (file == NULL) {
        printf("File opening failed\n");
        return 1;
    }

    // Check for and skip a UTF-8 BOM (optional)
    if (fgetc(file) == 0xEF && fgetc(file) == 0xBB && fgetc(file) == 0xBF) {
        printf("UTF-8 BOM detected, skipped\n");
    } else {
        rewind(file);  // Not a BOM: go back to the beginning of the file
    }

    // Read the file content
    if (fgets(buffer, sizeof(buffer), file) != NULL) {
        buffer[strcspn(buffer, "\n")] = '\0';  // Remove the line break
        printf("The content read is: %s\n", buffer);
    } else {
        printf("File is empty or read failed\n");
    }

    fclose(file);
    return 0;
}
3.2 Compatible code under Linux/macOS
Linux and macOS terminals usually use UTF-8 by default, so no additional setup is required:
#include <stdio.h>
#include <string.h>

int main() {
    char buffer[100] = {0};
    FILE *file = fopen("", "r");  // Fill in the path to your text file
    if (file == NULL) {
        printf("File opening failed\n");
        return 1;
    }

    if (fgets(buffer, sizeof(buffer), file) != NULL) {
        buffer[strcspn(buffer, "\n")] = '\0';
        printf("The content read is: %s\n", buffer);
    } else {
        printf("File is empty or read failed\n");
    }

    fclose(file);
    return 0;
}
4. FAQ
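Q1: Why do errors occur when reading Chinese with fscanf?
fscanf reads according to a format string. With %s it stops at the first whitespace character and, unless a field width is given, does not limit how many bytes it writes, so a line can be cut short or overflow the buffer. If the file and terminal encodings also differ, the truncated text is garbled as well. fgets is safer and better suited to reading whole lines of text.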
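The difference is easy to see side by side. In the sketch below, read_token and read_line are hypothetical helpers wrapping fscanf and fgets respectively; the buffer size of 100 and the matching %99s field width are illustrative choices.

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 100

/* Read one whitespace-delimited token with fscanf. The field width (99)
 * leaves room for the terminating NUL. Returns 1 on success, 0 otherwise. */
int read_token(FILE *fp, char out[BUF_SIZE])
{
    return fscanf(fp, "%99s", out) == 1;
}

/* Read one whole line with fgets and strip the trailing newline.
 * Returns 1 on success, 0 on end of file or error. */
int read_line(FILE *fp, char out[BUF_SIZE])
{
    if (fgets(out, BUF_SIZE, fp) == NULL) {
        return 0;
    }
    out[strcspn(out, "\n")] = '\0';
    return 1;
}
```

For a file whose first line contains two words separated by a space, read_token returns only the first word, while read_line returns the whole line.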
Q2: How to ensure that the file is UTF-8 encoded?
- In Notepad++, open the file and check the Encoding menu; choose UTF-8 (without BOM).
- In VS Code, click the encoding shown in the lower-right corner of the status bar to view or change it.
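The check can also be done from code. The sketch below is a minimal validator, not a full encoding detector: is_valid_utf8 (a hypothetical name) returns 1 only if every byte sequence is well-formed UTF-8. GBK-encoded Chinese text almost always fails this test, so a failure is a strong hint that the file is not UTF-8. Note that pure ASCII passes, since ASCII is valid in both encodings.

```c
#include <stddef.h>

/* Return 1 if data[0..len) is well-formed UTF-8, 0 otherwise.
 * Rejects truncated sequences, overlong encodings, surrogates,
 * and code points above U+10FFFF. */
int is_valid_utf8(const unsigned char *data, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = data[i];
        size_t extra;        /* number of continuation bytes expected */
        unsigned long cp;    /* decoded code point */
        unsigned long min;   /* smallest code point allowed for this length */

        if (b < 0x80) {
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            extra = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            extra = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            extra = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0; /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= extra) {
            return 0; /* sequence is cut off at the end of the data */
        }
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = data[i + k];
            if ((c & 0xC0) != 0x80) {
                return 0;
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return 0;
        }
        i += extra + 1;
    }
    return 1;
}
```

One way to use it is to read the file in binary mode into a buffer and pass the buffer and its length; when a file is read in fixed-size chunks, a character split across two chunks would be reported as invalid, so whole-file or whole-line input is the simpler choice.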
Q3: What should I do if the file is GBK-encoded?
If the console is GBK (the Windows default), just read and print the file directly. On Linux, you may need to convert it:
#include <iconv.h>  // Additional library support is required
// Or use a third-party library (such as libiconv) for encoding conversion
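As a rough illustration of such a conversion, the sketch below uses the POSIX iconv interface (iconv_open, iconv, iconv_close) to convert a GBK string to UTF-8. The wrapper name gbk_to_utf8 is hypothetical, and the sketch assumes the system's iconv implementation knows the encoding name "GBK"; on some platforms the program must also be linked with -liconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated GBK string to UTF-8.
 * Returns 0 on success, -1 on failure (encoding not supported, invalid
 * input, or output buffer too small). */
int gbk_to_utf8(const char *gbk, char *out, size_t out_size)
{
    iconv_t cd;
    char *in_ptr = (char *)gbk; /* iconv takes char **; the input is not modified */
    size_t in_left = strlen(gbk);
    char *out_ptr = out;
    size_t out_left;
    size_t rc;

    if (out_size == 0) {
        return -1;
    }
    out_left = out_size - 1; /* keep one byte for the terminating NUL */

    cd = iconv_open("UTF-8", "GBK"); /* arguments: target encoding, source encoding */
    if (cd == (iconv_t)-1) {
        return -1;
    }

    rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        return -1;
    }

    *out_ptr = '\0';
    return 0;
}
```

Each line read with fgets from a GBK file could be passed through such a function before printing. Since a Chinese character grows from 2 bytes to 3, an output buffer of about twice the input size is a comfortable margin.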
5. Summary
Problem | Cause | Solution |
---|---|---|
"烫" garbled output | Uninitialized char array | char buffer[100] = {0}; |
Garbled Chinese display | File encoding (UTF-8) does not match terminal encoding (GBK) | SetConsoleOutputCP(65001) (Windows) |
Read failed | Wrong file path or permission issue | Check the return value of fopen |
Trailing newline | fgets keeps the \n | buffer[strcspn(buffer, "\n")] = '\0'; |
With these techniques, you can resolve the most common causes of garbled Chinese text when reading files in C.