SoFunction
Updated on 2025-05-08

Solving garbled text when Python and Java interact

In modern software development, integrating systems written in different languages is part of daily work. When Python and Java exchange data, encoding problems are one of the most common causes of corrupted data, garbled text, and hard-to-debug failures.

Have you ever run into this: a Python script writes correct data to standard output, but the Java service that reads it shows garbled text? Or the reverse: text printed by Java is not displayed correctly in Python?

The root cause is usually a mismatch between the character encoding one side uses to write the bytes and the encoding the other side uses to read them. Making both sides use UTF-8 removes that mismatch. This post walks through a few simple steps that fix the inconsistency so that data flows between Python and Java correctly.

Background: Why does the text get garbled?

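Python and Java choose their default encodings independently. When a Python script's standard output is a pipe (as it is when Java launches the script), Python encodes the text with the locale's preferred encoding, which may be UTF-8, GBK, or something else depending on the system. On the Java side, an InputStreamReader created without an explicit charset decodes with the JVM's default charset, which before JDK 18 is also the platform encoding. If the encoding Python writes with differs from the one Java reads with, the same bytes are interpreted as different characters and the text comes out garbled.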
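To see the mechanism in isolation, the following Python sketch encodes one string with one codec and decodes it with another, which is what happens when the writer and the reader disagree. GBK is used only as an example of a non-UTF-8 locale encoding.

```python
# Reproduce garbled text: bytes written with one encoding, read back with another.
text = "你好"

utf8_bytes = text.encode("utf-8")   # what a UTF-8 writer emits
gbk_bytes = text.encode("gbk")      # what a GBK writer emits

# Writer and reader agree: the text survives the round trip.
assert utf8_bytes.decode("utf-8") == text

# UTF-8 bytes read as GBK: every byte pair happens to be valid GBK,
# so there is no error, just the wrong characters.
garbled_as_gbk = utf8_bytes.decode("gbk")
print(garbled_as_gbk)  # 浣犲ソ

# GBK bytes read as UTF-8: invalid sequences, replaced with U+FFFD.
garbled_as_utf8 = gbk_bytes.decode("utf-8", errors="replace")
print(garbled_as_utf8)
```

The first failure is the more dangerous one: nothing raises, so the wrong text can travel far before anyone notices.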

Problem scenario

Suppose a Python script fetches data from an API and prints the result. A Java service runs the script through ProcessBuilder and reads the result from the script's standard output. If neither side specifies an encoding explicitly, each falls back to its platform default, and the output can arrive garbled.
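Before changing anything, it can help to confirm which encoding the child process actually picks. The Python sketch below (the helper child_stdout_encoding is hypothetical) starts a child interpreter with its stdout connected to a pipe, as ProcessBuilder does, and asks it to report sys.stdout.encoding. The result depends on the machine's locale, so no particular value is assumed.

```python
import codecs
import locale
import os
import subprocess
import sys


def child_stdout_encoding(extra_env=None):
    """Start a child Python whose stdout is a pipe and return the
    normalized codec name it uses for that stdout."""
    env = dict(os.environ)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    # An encoding name is plain ASCII, so decoding it as ASCII is safe.
    name = result.stdout.decode("ascii").strip()
    # Normalize aliases, e.g. "UTF8" -> "utf-8", "cp936" -> "gbk".
    return codecs.lookup(name).name


if __name__ == "__main__":
    print("locale preferred encoding:", locale.getpreferredencoding(False))
    print("child stdout encoding (pipe):", child_stdout_encoding())
```

If the child reports something other than utf-8 on the machine where the Java service runs, the scenario above applies.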

Solution: Ensure unified UTF-8 encoding

The following steps make Python and Java agree on UTF-8 and eliminate the garbled output.

Step 1: Modify the Python script and explicitly specify the encoding

First, make sure the Python script explicitly treats the API response it outputs as UTF-8.

Modify the Python script:

In the script, set response.encoding = 'utf-8' on the requests response object. This tells requests to decode the HTTP response body as UTF-8 when it builds response.text, so the API's reply becomes the correct characters before the script prints it. Note that this does not change the encoding Python uses for standard output; that is handled in Step 2.
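In requests, Response.text is produced by decoding the raw bytes in Response.content with Response.encoding. If the server does not declare a charset, requests may fall back to a default such as ISO-8859-1 (used for text/* types) or to a guessed encoding, depending on the content type and library version. The stdlib-only sketch below imitates that decoding step without calling requests; the helper decode_body and the sample body are illustrative.

```python
def decode_body(raw: bytes, encoding: str) -> str:
    """Roughly what requests does for Response.text once Response.encoding
    is set: decode the body bytes, replacing anything undecodable."""
    return raw.decode(encoding, errors="replace")


# A UTF-8 JSON body such as an API might return (illustrative data).
body = '{"result": "你好"}'.encode("utf-8")

wrong = decode_body(body, "ISO-8859-1")  # a fallback used for text/* without a charset
right = decode_body(body, "utf-8")       # what response.encoding = 'utf-8' requests

print(wrong)
print(right)  # {"result": "你好"}
```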

import sys
import requests
import json

def get_access_token():
    # Omit the logic of getting the token
    return "your_access_token"

def main():
    # Prepend the API host to this path
    url = "/rpc/2.0/ai_custom/v1/wenxinworkshop/chat/ernie_speed?access_token=" + get_access_token()
    content = sys.argv[1]  # Get input from the first command-line argument
    payload = json.dumps({"messages": [{"role": "user", "content": content}]})
    headers = {'Content-Type': 'application/json'}

    response = requests.post(url, headers=headers, data=payload)
    response.encoding = 'utf-8'  # decode the HTTP response body as UTF-8
    print(response.text)  # Output the response content

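With response.encoding = 'utf-8', requests decodes the response body as UTF-8, so content containing Chinese or other non-ASCII characters becomes the correct text instead of being misread under a default or guessed encoding. How that text is then encoded on standard output is a separate setting, covered in the next step.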
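Because response.encoding does not touch standard output, a script that must emit UTF-8 no matter how it is launched can also reconfigure sys.stdout itself (available since Python 3.7). The sketch below runs a tiny stand-in child script, not the original script, that does this, and checks the raw bytes the parent receives. The child deliberately starts with PYTHONIOENCODING=ascii to show that the in-script setting takes precedence.

```python
import os
import subprocess
import sys

# Stand-in child script: switch stdout to UTF-8 before printing.
# The \u escapes keep the command line itself pure ASCII.
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.reconfigure(encoding='utf-8')\n"
    "print('\\u4f60\\u597d')\n"
)


def run_child_raw() -> bytes:
    # Launch with an unhelpful stdio encoding to show reconfigure() overrides it.
    env = dict(os.environ, PYTHONIOENCODING="ascii")
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout  # raw bytes, as a Java reader would receive them


raw = run_child_raw()
print(raw)                          # UTF-8 bytes for 你好 plus a line ending
print(raw.decode("utf-8").strip())  # 你好
```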

Step 2: Set Python's encoding environment variable in Java

When Java runs a Python script with ProcessBuilder, the script's standard output is a pipe, and Python encodes it with the platform's default encoding, which may not be UTF-8. To force Python to write UTF-8, set the environment variable PYTHONIOENCODING in the ProcessBuilder.

Modify the Java service layer code:

In Java, when executing a Python script with ProcessBuilder, we can make the Python process use UTF-8 by calling processBuilder.environment().put("PYTHONIOENCODING", "utf-8") before starting it.
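For readers who want to try the idea without a Java toolchain, here is a Python analog of the ProcessBuilder setup: copy the parent environment, add PYTHONIOENCODING=utf-8, merge stderr into stdout, and decode the child's output as UTF-8. subprocess.run plays the role of ProcessBuilder.start(); the helper name run_python_utf8 is hypothetical.

```python
import os
import subprocess
import sys


def run_python_utf8(code: str, *args: str) -> str:
    """Hypothetical helper: run `python -c code args...` the way the Java
    service runs its script, with UTF-8 stdio and UTF-8 decoding."""
    env = dict(os.environ)             # like processBuilder.environment()
    env["PYTHONIOENCODING"] = "utf-8"  # like .put("PYTHONIOENCODING", "utf-8")
    result = subprocess.run(
        [sys.executable, "-c", code, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,      # like redirectErrorStream(true)
        env=env,
        encoding="utf-8",              # decode the pipe as UTF-8
        check=True,
    )
    return result.stdout


# The child prints a non-ASCII string; \u escapes keep the command line ASCII.
out = run_python_utf8("print('\\u4f60\\u597d')")
print(out.strip())  # 你好
```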

import java.io.*;
import java.nio.charset.StandardCharsets;

public class PythonExecutorServiceImpl {
    private static final String PYTHON_EXECUTABLE = "python";
    private static final String PYTHON_SCRIPT_PATH = "/path/to/your/";

    public String executeScript(String content) throws IOException {
        // Create a ProcessBuilder that runs the Python script with the input as an argument
        ProcessBuilder processBuilder = new ProcessBuilder(
                PYTHON_EXECUTABLE,
                PYTHON_SCRIPT_PATH,
                content
        );

        // Set an environment variable so that Python writes its output in UTF-8
        processBuilder.environment().put("PYTHONIOENCODING", "utf-8");
        // Merge stderr into stdout so error messages are read the same way
        processBuilder.redirectErrorStream(true);

        // Start the process and read its output stream as UTF-8
        Process process = processBuilder.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                output.append(line).append("\n");
            }
        }
        return output.toString();
    }
}

Setting PYTHONIOENCODING makes the Python interpreter use UTF-8 for its standard input, output, and error streams, so the bytes it writes are UTF-8 and Java can decode them correctly, provided Java also reads them as UTF-8.
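Setting the variable only helps if the reader agrees with it. This Python sketch, with illustrative values, makes the child write GBK while the parent decodes UTF-8, reproducing the garbling described above; setting the child side to utf-8 fixes it. errors="replace" mirrors a lenient reader such as Java's InputStreamReader, which substitutes malformed input rather than throwing.

```python
import os
import subprocess
import sys


def child_output(child_encoding: str, reader_encoding: str) -> str:
    """Run a child that prints 你好 with `child_encoding` on stdout,
    then decode the bytes it wrote with `reader_encoding`."""
    env = dict(os.environ, PYTHONIOENCODING=child_encoding)
    raw = subprocess.run(
        [sys.executable, "-c", "print('\\u4f60\\u597d')"],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    ).stdout
    # Bad byte sequences become U+FFFD instead of raising an error.
    return raw.decode(reader_encoding, errors="replace").strip()


print(child_output("gbk", "utf-8"))    # mismatch: replacement characters
print(child_output("utf-8", "utf-8"))  # match: 你好
```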

Step 3: Ensure Java uses UTF-8 when reading streams

In Java, when we use InputStreamReader to read the output stream of a process, we also need to specify the encoding explicitly. Without a charset argument, the reader falls back to the JVM's default charset. With new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8), we make sure Java reads Python's output as UTF-8.
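The Java loop wraps the raw InputStream in an InputStreamReader with an explicit charset and reads it line by line. A rough Python equivalent, for comparison only, wraps the child's binary stdout in io.TextIOWrapper with encoding="utf-8". This is a sketch of the same pattern, not a port of the service; the helper read_lines_utf8 is hypothetical.

```python
import io
import os
import subprocess
import sys


def read_lines_utf8(cmd):
    """Start `cmd`, merge stderr into stdout, and read its output line by line as UTF-8."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env
    )
    lines = []
    # TextIOWrapper(binary_stream, encoding=...) plays the role of
    # new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8).
    with io.TextIOWrapper(proc.stdout, encoding="utf-8") as reader:
        for line in reader:  # like calling bufferedReader.readLine() in a loop
            lines.append(line.rstrip("\n"))
    proc.wait()  # reap the child once its output has been drained
    return lines


child = [sys.executable, "-c", "print('\\u4f60\\u597d'); print('line 2')"]
print(read_lines_utf8(child))  # ['你好', 'line 2']
```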

Complete code example

Python script

import sys
import requests
import json

def get_access_token():
    # Placeholder: obtain the access token here
    return "your_access_token"

def main():
    # Prepend the API host to this path
    url = "/rpc/2.0/ai_custom/v1/wenxinworkshop/chat/ernie_speed?access_token=" + get_access_token()
    content = sys.argv[1]  # input passed as the first command-line argument

    payload = json.dumps({"messages": [{"role": "user", "content": content}]})
    headers = {'Content-Type': 'application/json'}

    response = requests.post(url, headers=headers, data=payload)
    response.encoding = 'utf-8'  # decode the HTTP response body as UTF-8
    print(response.text)

if __name__ == '__main__':
    main()

Java service layer

import java.io.*;
import java.nio.charset.StandardCharsets;

public class PythonExecutorServiceImpl {
    private static final String PYTHON_EXECUTABLE = "python";
    private static final String PYTHON_SCRIPT_PATH = "/path/to/your/";

    public String executeScript(String content) throws IOException {
        ProcessBuilder processBuilder = new ProcessBuilder(
                PYTHON_EXECUTABLE,
                PYTHON_SCRIPT_PATH,
                content
        );

        // Set an environment variable so that Python outputs UTF-8
        processBuilder.environment().put("PYTHONIOENCODING", "utf-8");
        // Merge stderr into stdout
        processBuilder.redirectErrorStream(true);

        Process process = processBuilder.start();
        StringBuilder output = new StringBuilder();
        // Decode the process output explicitly as UTF-8
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                output.append(line).append("\n");
            }
        }
        return output.toString();
    }
}

Summary

With these simple steps, the Python script and the Java service use the same encoding, UTF-8, when passing data, which prevents garbled text. The same principle applies to data exchange between other languages: agree on one encoding and state it explicitly on both the writing and the reading side. Consistent character encoding is a small detail in cross-language integration, but it avoids many potential problems and makes the system more stable and reliable.

Handling character encoding carefully during development saves a lot of trouble, especially when integrating different languages. I hope this post helps you quickly fix garbled text between Python and Java and makes your cross-language development more efficient.
