SoFunction
Updated on 2024-11-10

Learning Python every day: the bytes type

In Python, a bytes object is written as a literal of the form b'xxx'. Each x is either a printable ASCII character or a hexadecimal escape \xnn, where nn ranges from 00 to ff, for a total of 256 possible byte values.
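The two spellings of a byte can be compared directly. This is a minimal sketch using only built-ins; the variable names are illustrative and not part of the original article.

```python
# A printable ASCII byte can be written as a character or as a \xnn escape.
same = b"a" == b"\x61"            # 0x61 is the ASCII code of "a"

# Every value from 0x00 to 0xff fits in a bytes object: 256 values in total.
all_bytes = bytes(range(256))

# repr() shows printable ASCII as characters and everything else as escapes.
mixed = bytes([0x41, 0x00, 0xff])

print(same, len(all_bytes), mixed)
```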

I. Basic operations

Here are the basic operations on bytes; as you can see, they closely resemble those on strings:

In[40]: b = b"abcd\x64"
In[41]: b
Out[41]: b'abcdd'
In[42]: type(b)
Out[42]: bytes
In[43]: len(b)
Out[43]: 5
In[44]: b[4]
Out[44]: 100 # 100 is 0x64 in hexadecimal, i.e. \x64
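One difference from strings is worth seeing in code: indexing a bytes object yields an int, while slicing yields bytes. The sketch below reuses the value from the session above; the other names are illustrative.

```python
b = b"abcd\x64"

first = b[0]              # indexing yields an int: 97, the code of "a"
head = b[:2]              # slicing yields a bytes object
upper = b.upper()         # many str-like methods also exist on bytes
count_d = b.count(b"d")   # the argument is bytes, not str
as_hex = b.hex()          # hexadecimal dump of every byte

print(first, head, upper, count_d, as_hex)
```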

A bytes object is immutable, so you cannot modify one of its bytes in place. Convert it to a bytearray first and modify that:

In[46]: barr = bytearray(b)
In[47]: type(barr)
Out[47]: bytearray
In[48]: barr[0] = 110
In[49]: barr
Out[49]: bytearray(b'nbcdd')
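The whole round trip can be sketched as a script: the failed direct assignment, the mutable copy, and the conversion back to bytes. The names `error` and `result` are illustrative.

```python
b = b"abcdd"

try:
    b[0] = 110                # bytes does not support item assignment
except TypeError as exc:
    error = str(exc)

barr = bytearray(b)           # mutable copy of the same bytes
barr[0] = 110                 # 110 is the code of "n"
barr.extend(b"!")             # a bytearray can also grow in place
result = bytes(barr)          # convert back to an immutable bytes object

print(error)
print(result)
```

The original `b` is unchanged, because `bytearray(b)` copies the data.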

II. Relationship between bytes and characters

As mentioned above, bytes and strings are very similar, and they can in fact be converted into each other. The link between them is an encoding. A string is converted to bytes by calling its encode() method with the name of an encoding, and bytes are converted back to a string by calling decode() with the name of an encoding:

In[50]: s = "Life is short, I use Python."
In[51]: b = ('utf-8')
In[52]: b
Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python'
In[53]: c = ('gb18030')
In[54]: c
Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python'
In[55]: ('utf-8')
Out[55]: 'Life is short, I use Python'
In[56]: ('gb18030')
Out[56]: 'Life is short, I use Python'
In[57]: ('utf-8')
Traceback (most recent call last):
 exec(code_obj, self.user_global_ns, self.user_ns)
 File "<ipython-input-57-8b50aa70bce9>", line 1, in <module>
 ('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
In[58]: ('gb18030')
Out[58]: 'Yeeseng Yun Yun Transportation Velvet Velvet Jacket'

As the example shows, the same characters map to completely different bytes under different encodings. If you encode with one encoding and decode with another, you get garbled text or an outright failure. Each encoding has its own rules about which byte sequences are valid: in UTF-8, \xc8 announces a two-byte sequence and must be followed by a continuation byte in the range \x80 to \xbf, but here the next byte is \xcb, so decoding fails.

III. Applications

Here is a simple example. Suppose I want to crawl the content of a web page, in this case the result page Baidu returns for a search on "Python". Baidu uses UTF-8 encoding. If you do not decode the response, you get one very long byte string. If you decode it correctly, you get a normal HTML page.

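import urllib.request

# The scheme and host of the Baidu URL are missing in the source; prepend them before running.
url = "/s?ie=utf-8&wd=python"
page = urllib.request.urlopen(url)   # open the page
mybytes = page.read()                # raw response body as bytes
encoding = "utf-8"
print(mybytes.decode(encoding))      # decode the bytes into text
page.close()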
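Hard-coding the encoding works when it is known in advance. As a hedged alternative, the sketch below reads the charset from a Content-Type header value and falls back to UTF-8. The helper names `charset_from_content_type` and `decode_body` are hypothetical, and the demonstration uses hand-made bytes so that it runs without a network connection.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or the default."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(raw, content_type=None, default="utf-8"):
    """Decode raw response bytes using the declared charset."""
    charset = charset_from_content_type(content_type, default)
    # errors="replace" keeps going on bad bytes instead of raising.
    return raw.decode(charset, errors="replace")


# Offline demonstration with hand-made bytes instead of a live request.
sample = "人生苦短".encode("gb18030")
print(decode_body(sample, "text/html; charset=GB18030"))
print(decode_body(b"<html>ok</html>"))
```

With a live response from urllib.request.urlopen, the header value would presumably come from `page.headers.get("Content-Type")`. A charset name that Python does not recognize would still raise LookupError, which this sketch does not handle.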

That is all for this article. I hope it helps you learn Python programming.