Repeatedly concatenating immutable sequences in Python can be inefficient: each operation creates a new object, and the interpreter has to copy the elements of the original object into it before appending the new ones.
However, CPython optimizes this case for strings, because += on str is so common. A str is allocated with some spare room, so an incremental operation can often be performed without the copy-and-append step.
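One rough way to observe the difference is to watch id() across repeated += steps. The sketch below uses a hypothetical helper, count_same_id, introduced only for illustration. Concatenating a non-empty tuple always produces a new object, while whether a str keeps its identity is a CPython implementation detail, so the second number is an observation and not a guarantee.

```python
def count_same_id(value, extra, rounds=5):
    """Count how many of `rounds` += steps left id(value) unchanged."""
    same = 0
    for _ in range(rounds):
        before = id(value)
        value += extra  # may rebind `value` to a brand-new object
        if id(value) == before:
            same += 1
    return same


# Tuples are immutable: every step builds a new tuple, so this is 0.
print("tuple:", count_same_id((1, 2), (3,)))
# For str the result depends on the interpreter and its version.
print("str:  ", count_same_id("a" * 100, "b"))
```

An unchanged id() only hints that the object was extended in place. It says nothing about how much memory was copied internally.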
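Let's examine the process through bytecode. The output below comes from an older CPython (3.5 or earlier). Since 3.6 every instruction occupies two bytes, and since 3.11 INPLACE_ADD has been replaced by BINARY_OP, so newer interpreters print different bytes.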
>>> s_code = 'a += "b"'
>>> c = compile(s_code, '', 'exec')
>>> c.co_code
b'e\x00\x00d\x00\x007Z\x00\x00d\x01\x00S'
>>> c.co_names
('a',)
>>> c.co_consts
('b', None)
The bytecode you get back is of type bytes. A short digression on the bytes type follows.
Bytes type
In b'e\x00\x00d\x00\x007Z\x00\x00d\x01\x00S', the b prefix marks the value as bytes. A bytes object stores data as a sequence of bytes (8 bits each). In the printed form, a byte that corresponds to a printable ASCII character is shown as that character. For example, the e above stands for binary 0110 0101 (hexadecimal 65). A partial ASCII code comparison table is shown below.
However, not every byte is displayable, and some bytes have no ASCII equivalent at all, because ASCII defines only 128 characters while a byte can take 256 values. For example, 0000 0000 is the ASCII NUL control character, which cannot be displayed, and 1000 0000 has no corresponding ASCII code.
To represent these undisplayable bytes, the \x escape is used: it indicates that the next two characters are hexadecimal digits giving the byte's value. For example, \x00 means hexadecimal 00, or 0000 0000 in binary.
At this point, all bytes can be represented.
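The points about the bytes type can be checked directly in the interpreter. The following is a minimal sketch; the sample values are chosen here for illustration and are not taken from the bytecode above, apart from the e byte.

```python
raw = b'e\x00'

# Indexing a bytes object yields an int in range(256).
first = raw[0]
print(first, hex(first), format(first, '08b'))

# Printable ASCII bytes are shown as characters, others as \x escapes.
print(bytes([0x65, 0x00, 0x7f, 0x80]))

# Bytes above 0x7f cannot be decoded as ASCII.
try:
    bytes([0x80]).decode('ascii')
except UnicodeDecodeError as exc:
    print('not ASCII:', exc.reason)
```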
Bytecode analysis
Now back to the bytecode from the beginning. To make it easier to read, convert b'e\x00\x00d\x00\x007Z\x00\x00d\x01\x00S' to hexadecimal.
>>> c.co_code.hex()
'650000640000375a000064010053'
The opcode.opname list can be used to look up the instruction name that corresponds to an opcode:
>>> import opcode
>>> opcode.opname[0x65]
'LOAD_NAME'
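Opcode numbers are not stable across CPython releases, so a hard-coded value such as 0x65 may name a different instruction on another interpreter. A more portable sketch goes through the dis module's opmap and opname tables; name_of and number_of are hypothetical helper names used only here.

```python
import dis


def name_of(op):
    """Instruction name for a numeric opcode on the running interpreter."""
    return dis.opname[op]


def number_of(name):
    """Numeric opcode for an instruction name on the running interpreter."""
    return dis.opmap[name]


load_name = number_of('LOAD_NAME')
print(hex(load_name), name_of(load_name))
```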
Thus, the complete bytecode can be interpreted as follows (TOS is the top-of-stack element, TOS1 the element just below it):
Bytes   Position   Function
65      0          LOAD_NAME
0000    argument   push the value of co_names[0], i.e. the value of a, onto the stack
64      3          LOAD_CONST
0000    argument   push co_consts[0], i.e. 'b', onto the stack
37      6          INPLACE_ADD, TOS = TOS1 + TOS
5a      7          STORE_NAME
0000    argument   co_names[0] = TOS, i.e. assign the top of the stack to a
64      10         LOAD_CONST
0100    argument   push co_consts[1], i.e. None, onto the stack (the two argument bytes are little-endian)
53      13         RETURN_VALUE, returns with TOS to the caller of the function
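The manual decoding above can be sketched in a few lines. This assumes the old pre-3.6 layout, where an opcode of 90 or more is followed by a two-byte little-endian argument and other opcodes stand alone. The opcode table is restricted to the five instructions that appear in this example, and decode_legacy is a hypothetical name, not a standard-library function.

```python
# Opcode numbers as they appear in the bytecode shown in this article.
LEGACY_OPNAMES = {
    0x65: "LOAD_NAME",
    0x64: "LOAD_CONST",
    0x37: "INPLACE_ADD",
    0x5A: "STORE_NAME",
    0x53: "RETURN_VALUE",
}
HAVE_ARGUMENT = 90  # opcodes >= 90 carry a 2-byte argument in this layout


def decode_legacy(code):
    """Return (offset, name, arg) tuples for old-format bytecode."""
    result = []
    i = 0
    while i < len(code):
        op = code[i]
        name = LEGACY_OPNAMES.get(op, "<unknown {:#04x}>".format(op))
        if op >= HAVE_ARGUMENT:
            arg = code[i + 1] | (code[i + 2] << 8)  # little-endian
            result.append((i, name, arg))
            i += 3
        else:
            result.append((i, name, None))
            i += 1
    return result


raw = bytes.fromhex("650000640000375a000064010053")
for offset, name, arg in decode_legacy(raw):
    print(offset, name, "" if arg is None else arg)
```

The offsets it prints are 0, 3, 6, 7, 10 and 13, the same positions listed in the table.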
It is actually possible to obtain readable bytecode directly with the dis.dis function from the dis module:
>>> import dis
>>> dis.dis(s_code)
  1           0 LOAD_NAME                0 (a)
              3 LOAD_CONST               0 ('b')
              6 INPLACE_ADD
              7 STORE_NAME               0 (a)
             10 LOAD_CONST               1 (None)
             13 RETURN_VALUE
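When the disassembly needs to be processed by a program instead of read by eye, dis.get_instructions yields one Instruction object per instruction. The sketch below collects a few of its fields; list_instructions is a hypothetical helper name, and the exact instructions printed depend on the CPython version in use.

```python
import dis


def list_instructions(source):
    """Return (offset, opname, argrepr) for each instruction of `source`."""
    return [(ins.offset, ins.opname, ins.argrepr)
            for ins in dis.get_instructions(source)]


for offset, opname, argrepr in list_instructions('a += "b"'):
    print(offset, opname, argrepr)
```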
Full Code:
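import dis

s_code = 'a += "b"'
c = compile(s_code, '', 'exec')
print(c.co_code)        # raw bytecode as bytes
print(c.co_names)       # names referenced by the code
print(c.co_consts)      # constants referenced by the code
print(c.co_code.hex())  # the same bytecode in hexadecimal
dis.dis(s_code)         # human-readable disassembly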
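Unfortunately, the experiment falls short: comparing the += bytecode for a string with that for a tuple shows no difference, so the string optimization is not visible at this level. The optimization lives in how the interpreter executes the add instruction, not in the compiled bytecode.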
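The comparison can be reproduced with a short sketch. The tuple operand ("b",) and the helper name opnames are choices made here for illustration. On CPython the two instruction lists are expected to match, since the compiler emits the same add instruction whatever the operand type.

```python
import dis


def opnames(source):
    """Instruction names, in order, for the compiled `source`."""
    return [ins.opname for ins in dis.get_instructions(source)]


str_ops = opnames('a += "b"')
tuple_ops = opnames('a += ("b",)')
print(str_ops)
print(tuple_ops)
print(str_ops == tuple_ops)
```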
That covers the main points about Python bytecode for this post. Thank you for your support.