In modern software development, the integration of cross-language systems has become part of daily work. Especially when interacting between Python and Java, encoding problems often become one of the main reasons for data transmission errors, garbled codes and difficulty in debugging.
Have you ever encountered this situation: Python scripts return the correct data through standard output, but Java service displays garbled code when reading? Or, on the contrary, the data printed in Java cannot be displayed correctly in Python?
The root cause of the problem is usually the inconsistency between Python and Java in character encoding processing, especially UTF-8 encoding. This blog will analyze in detail how to solve the coding inconsistency between Python and Java in a few simple steps, ensuring that data flows correctly and seamlessly.
Background: Why are there garbled codes?
Python and Java are different in how character encoding is processed. When a Python script generates output, it uses the system's encoding method by default, which may be UTF-8, GBK, etc., while Java usually expects to read the standard output stream in the UTF-8 way. If the encoding method of Python is inconsistent with the encoding when reading Java, it will lead to garbled code problems.
Problem scenario
Suppose we have a Python script that fetches data from some API and returns it. The Java service executes Python scripts through ProcessBuilder and reads the return result from the standard output stream. However, if the encoding is not specified explicitly, Java may cause garbled code due to the default use of platform encoding.
Solution: Ensure unified UTF-8 encoding
We can ensure coding consistency between Python and Java through several steps to avoid garbled code problems.
Step 1: Modify the Python script and explicitly specify the encoding
First, we need to make sure that the Python script is explicitly set to use UTF-8 encoding when outputting responses.
Modify Python scripts:
In Python scripts, we can explicitly set the encoding format of the response by setting = 'utf-8'. This step ensures that the output generated by the Python script is always encoded using UTF-8.
import sys import requests import json def get_access_token(): # Omit the logic of getting token return "your_access_token" def main(): url = "/rpc/2.0/ai_custom/v1/wenxinworkshop/chat/ernie_speed?access_token=" + get_access_token() content = [1] # Get input from command line parameters payload = ({"messages": [{"role": "user", "content": content}]}) headers = {'Content-Type': 'application/json'} response = (url, headers=headers, data=payload) = 'utf-8' # explicitly set encoding print() # Output response content
With = 'utf-8', we explicitly tell Python to use UTF-8 encoding to handle responses so that even content containing special characters can be encoded correctly.
Step 2: Set Python's encoding environment variables in Java
When Java uses ProcessBuilder to execute Python scripts, the default encoding may not be UTF-8. In order to force Python output to be encoded using UTF-8, we need to set the environment variable PYTHONIOENCODING in ProcessBuilder.
Modify the Java service layer code:
In Java, when executing Python scripts using ProcessBuilder, we can ensure that the Python environment uses UTF-8 encoding through().put("PYTHONIOENCODING", "utf-8").
import .*; import ; public class PythonExecutorServiceImpl { private static final String PYTHON_EXECUTABLE = "python"; private static final String PYTHON_SCRIPT_PATH = "/path/to/your/"; public String executeScript(String content) throws IOException { // Create ProcessBuilder and execute Python scripts ProcessBuilder processBuilder = new ProcessBuilder( PYTHON_EXECUTABLE, PYTHON_SCRIPT_PATH, content ); // Set environment variables to ensure that Python output uses UTF-8 ().put("PYTHONIOENCODING", "utf-8"); (true); // Start the process and read the output stream Process process = (); InputStreamReader reader = new InputStreamReader((), StandardCharsets.UTF_8); BufferedReader bufferedReader = new BufferedReader(reader); StringBuilder output = new StringBuilder(); String line; while ((line = ()) != null) { (line).append("\n"); } (); return (); } }
By setting the environment variable PYTHONIOENCODING, we make sure that Python always uses UTF-8 encoding when executing, so that Java can correctly read Python's standard output stream.
Step 3: Ensure Java uses UTF-8 when reading streams
In Java, when we use InputStreamReader to read the output stream of a process, we also need to specify the encoding format explicitly. With new InputStreamReader((), StandardCharsets.UTF_8), we make sure Java reads Python's output in UTF-8 encoding.
Complete code example
Python script()
import sys import requests import json def get_access_token(): # Simulate to get token return "your_access_token" def main(): url = "/rpc/2.0/ai_custom/v1/wenxinworkshop/chat/ernie_speed?access_token=" + get_access_token() content = [1] payload = ({"messages": [{"role": "user", "content": content}]}) headers = {'Content-Type': 'application/json'} response = (url, headers=headers, data=payload) = 'utf-8' # explicitly set encoding print() if __name__ == '__main__': main()
Java Service Layer ()
import .*; import ; public class PythonExecutorServiceImpl { private static final String PYTHON_EXECUTABLE = "python"; private static final String PYTHON_SCRIPT_PATH = "/path/to/your/"; public String executeScript(String content) throws IOException { ProcessBuilder processBuilder = new ProcessBuilder( PYTHON_EXECUTABLE, PYTHON_SCRIPT_PATH, content ); // Set environment variables to ensure that Python outputs UTF-8 ().put("PYTHONIOENCODING", "utf-8"); (true); Process process = (); InputStreamReader reader = new InputStreamReader((), StandardCharsets.UTF_8); BufferedReader bufferedReader = new BufferedReader(reader); StringBuilder output = new StringBuilder(); String line; while ((line = ()) != null) { (line).append("\n"); } (); return (); } }
Summarize
With these simple steps, we can ensure that Python scripts and Java services use the same UTF-8 encoding when data transfer, thereby avoiding garbled code problems. This method is not only suitable for the interaction between Python and Java, but also for data transmission problems between other languages. Maintaining unified character encoding is a small detail when integrating across languages, but it can effectively avoid many potential problems and make the system more stable and reliable.
During the development process, careful handling of character encoding issues is the key to avoiding trouble, especially when it comes to integration of different languages. I hope that through this blog, it can help you quickly solve the garbled problem in the interaction between Python and Java and improve the efficiency of cross-language development!
The above is the detailed content of solving the problem of garbled code when interacting with Python and Java. For more information on the solution to garbled code when interacting with Python and Java, please follow my other related articles!