SoFunction
Updated on 2024-11-18

Convert \\uxxxx to a Unicode string in Python

I ran into an interesting problem today: I needed to convert a string containing escaped Unicode sequences (a literal backslash followed by uXXXX) into the actual Unicode string, as follows:

Converting \\u9500\\u552e to \u9500\u552e, that is, 销售 ("sales").
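To make the difference between the two forms concrete, here is a minimal sketch. The variable names escaped and target are only illustrative; they do not come from the original problem.

```python
# The escaped form: a literal backslash, 'u', and four hex digits, twice over.
escaped = '\\u9500\\u552e'
# The target form: two actual Chinese characters.
target = '\u9500\u552e'

print(len(escaped))      # 12 characters
print(len(target))       # 2 characters
print(target == '销售')  # True
```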

At first glance it seems pretty simple: just use the re library to remove the extra leading backslash. But the replacement throws the following error:

Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    re.sub(r"(\)\u", r'', t)
  File "D:\Python36\lib\", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "D:\Python36\lib\", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "D:\Python36\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "D:\Python36\lib\sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "D:\Python36\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "D:\Python36\lib\sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "D:\Python36\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "D:\Python36\lib\sre_parse.py", line 502, in _parse
    code = _escape(source, this, state)
  File "D:\Python36\lib\sre_parse.py", line 362, in _escape
    raise ("incomplete escape %s" % escape, len(escape))
sre_constants.error: incomplete escape \u at position 3

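This error actually comes from the pattern itself rather than from the replacement result: in a Python 3 regular expression, \u must be followed by four hexadecimal digits, so the bare \u at position 3 is an incomplete escape. (The \) before it is also read as an escaped parenthesis, not as a backslash inside a group.)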
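For reference, a regex-based version can be made to work if the backslash is escaped in the pattern and the four hex digits are captured. The following is only a sketch under that assumption, not the approach the article settles on; the names pattern, compiled and result are illustrative. Note that it converts each \uXXXX unit on its own, so it does not combine surrogate pairs.

```python
import re

t = '\\u9500\\u552e'

# The original pattern does not compile: "\)" is an escaped parenthesis and
# "\u" is not followed by four hex digits.
try:
    re.compile(r"(\)\u")
    compiled = True
except re.error:
    compiled = False

# "\\" matches one literal backslash; the group captures the four hex digits,
# and each match is replaced by the character with that code point.
pattern = re.compile(r'\\u([0-9a-fA-F]{4})')
result = pattern.sub(lambda m: chr(int(m.group(1), 16)), t)

print(compiled)  # False
print(result)    # 销售
```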

At this point the problem seems a bit insurmountable. Should we give up?

Of course not. A quick Google search reveals that other people have run into this problem too, and the solution is pretty ingenious.

It turns out you can use the loads method of the json library ...

The solution is as follows:

import json
s = '\\u9500\\u552e'
# Wrap the text in double quotes so it parses as a JSON string literal
print(json.loads(f'"{s}"'))
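The json trick relies on the text being a valid JSON string body. The sketch below wraps it in a helper (the name unescape_with_json is hypothetical) and shows one input where it would likely surprise you: a string that already contains an unescaped double quote.

```python
import json

def unescape_with_json(s):
    """Decode escaped uXXXX sequences by parsing s as the body of a JSON string."""
    # Assumption: s contains no unescaped double quotes or raw control
    # characters; otherwise json.loads raises json.JSONDecodeError.
    return json.loads(f'"{s}"')

decoded = unescape_with_json('\\u9500\\u552e')
print(decoded)  # 销售

# An embedded double quote ends the JSON string early, so parsing fails.
try:
    unescape_with_json('say "hi" \\u9500')
    failed = False
except json.JSONDecodeError:
    failed = True
print(failed)  # True
```

If the input may contain quotes, it has to be escaped or handled some other way before being passed to json.loads.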

PS: Converting a Unicode-escaped string to Chinese in Python 3

A note on a frequently encountered problem:

The text you get back prints as literal "\uxxxx" sequences, and in Python 3 calling .decode('unicode_escape') on it raises an error: 'str' object has no attribute 'decode'.

The correct approach is:

s.encode('utf-8').decode("unicode_escape")
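One caveat is worth a small sketch: unicode_escape reads bytes as Latin-1, so the approach above works for pure-ASCII input but garbles any real non-ASCII characters already present in the string. A commonly used alternative, shown here as an assumption and not as part of the original article, is to encode with latin-1 and the backslashreplace error handler. The variable names are illustrative.

```python
s = '\\u9500\\u552e'
mixed = '销售: \\u9500\\u552e'  # real Chinese characters plus escaped ones

# Pure-ASCII input: the encode/decode round trip works as expected.
plain = s.encode('utf-8').decode('unicode_escape')

# Mixed input: the UTF-8 bytes of the real characters are read back as
# Latin-1, so the leading characters come out garbled.
garbled = mixed.encode('utf-8').decode('unicode_escape')

# Alternative: characters outside Latin-1 are first turned into \uXXXX
# escapes by backslashreplace, then everything is decoded together.
fixed = mixed.encode('latin-1', 'backslashreplace').decode('unicode_escape')

print(plain)                    # 销售
print(garbled == '销售: 销售')  # False
print(fixed == '销售: 销售')    # True
```

Both variants also interpret other backslash escapes such as \n in the input, which may or may not be what you want.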

This is the whole content of this article.