I ran into an interesting problem today: I needed to turn a string that contains escaped Unicode sequences into the actual Unicode characters, as follows:
Convert \\u9500\\u552e to \u9500\u552e, which displays as 销售 ("sales").
At first glance it seems pretty simple: just use the re library to remove the leading backslash. But the following error is thrown as soon as the substitution is attempted:
Traceback (most recent call last):
File "<pyshell#15>", line 1, in <module>
re.sub(r"(\)\u", r'', t)
File "D:\Python36\lib\", line 191, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "D:\Python36\lib\", line 301, in _compile
p = sre_compile.compile(pattern, flags)
File "D:\Python36\lib\sre_compile.py", line 562, in compile
p = sre_parse.parse(p, flags)
File "D:\Python36\lib\sre_parse.py", line 855, in parse
p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
File "D:\Python36\lib\sre_parse.py", line 416, in _parse_sub
not nested and not items))
File "D:\Python36\lib\sre_parse.py", line 765, in _parse
p = _parse_sub(source, state, sub_verbose, nested + 1)
File "D:\Python36\lib\sre_parse.py", line 416, in _parse_sub
not nested and not items))
File "D:\Python36\lib\sre_parse.py", line 502, in _parse
code = _escape(source, this, state)
File "D:\Python36\lib\sre_parse.py", line 362, in _escape
raise ("incomplete escape %s" % escape, len(escape))
sre_constants.error: incomplete escape \u at position 3
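It is tempting to read this as saying that the \u left over after removing the leading backslash no longer forms a complete character. In fact the pattern itself is invalid: inside a regular expression, \u must be followed by exactly four hex digits, and \) is an escaped parenthesis rather than the end of the group, so the backslash is never matched literally.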
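The failure can be reproduced in isolation. The following is only a rough sketch (the variable names are made up for illustration): it shows that the exception comes from compiling the pattern, and that a pattern which does compile still does not give the desired result, because deleting the backslash leaves plain letters and digits behind.

```python
import re

# The 12 characters \u9500\u552e, not two Chinese characters.
text = '\\u9500\\u552e'

# The pattern from the traceback is rejected while it is being compiled.
try:
    re.compile(r"(\)\u")
    compile_error = None
except re.error as exc:
    compile_error = str(exc)
print(compile_error)

# A literal backslash has to be written as \\ in the pattern. This
# compiles, but stripping the backslash does not decode anything.
stripped = re.sub(r"\\u", "u", text)
print(stripped)  # u9500u552e
```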
At this point the problem looks a bit insurmountable. Should we give up?
Of course not. A quick Google search shows that other people have run into this problem too, and the solution is pretty ingenious.
The idea is to use the loads method of the json library ...
The solution is as follows:
import json
s = '\\u9500\\u552e'
print(json.loads(f'"{s}"'))
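As far as I can tell, this works because wrapping the text in double quotes turns it into a JSON string literal, and the JSON parser decodes \uXXXX escapes while reading it. Here is a minimal sketch of that idea; decode_json_escapes is a hypothetical helper name, not something provided by the json library.

```python
import json


def decode_json_escapes(s):
    # Hypothetical helper: wrap the text in double quotes so that it
    # becomes a JSON string literal, then let json.loads decode it.
    return json.loads(f'"{s}"')


result = decode_json_escapes('\\u9500\\u552e')
print(result)

# Text that is not valid inside a JSON string (here, an unescaped
# double quote) is rejected instead of being decoded.
try:
    decode_json_escapes('say "\\u9500\\u552e"')
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)  # True
```

So the trick is only safe when the input is known to contain nothing that would be illegal inside a JSON string, such as bare double quotes or raw line breaks.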
PS: converting a Unicode-escaped string to Chinese in Python 3
A note on a frequently encountered problem:
The resulting text is printed as "\uxxxx". In Python 3, calling .decode('unicode_escape') on the string raises an error: 'str' object has no attribute 'decode'.
The correct approach is:
s.encode('utf-8').decode('unicode_escape')
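To make the difference concrete, here is a small sketch of the failing call next to the working one, with made-up variable names. It also illustrates one caveat worth keeping in mind: the unicode_escape codec reads the bytes as Latin-1, so characters that are already non-ASCII do not survive the UTF-8 round trip.

```python
escaped = '\\u9500\\u552e'

# Python 3 str objects have no decode() method, hence the AttributeError.
try:
    escaped.decode('unicode_escape')
    has_decode = True
except AttributeError:
    has_decode = False
print(has_decode)  # False

# Encode to bytes first, then let the codec decode the escapes.
decoded = escaped.encode('utf-8').decode('unicode_escape')
print(decoded)

# Caveat: the e-acute below is already a real character. Its two UTF-8
# bytes are read back as two Latin-1 characters, so it comes out mangled.
mixed = 'caf\u00e9 \\u9500\\u552e'
mangled = mixed.encode('utf-8').decode('unicode_escape')
print(mangled == 'caf\u00e9 \u9500\u552e')  # False
```

For input that is pure ASCII apart from the \uXXXX escapes, as in the example from this article, the round trip behaves as expected.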
This is the whole content of this article.