A bytes object in Python is written as a literal of the form b'xxx'. Each x is either a printable ASCII character or a hexadecimal escape \xnn, where nn ranges from 00 to ff, so a single byte can take one of 256 possible values.
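As a small illustration of the literal syntax, the sketch below compares the two spellings of one byte and builds an object that holds every possible byte value. The variable names are chosen only for this example.

```python
# Two spellings of the same byte: the printable character and its hex escape.
same = b"a" == b"\x61"

# A bytes object holding every possible byte value, 0x00 through 0xff.
all_bytes = bytes(range(256))

print(same)            # True
print(len(all_bytes))  # 256
print(all_bytes[:4])   # b'\x00\x01\x02\x03'
```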
I. Basic operations
The basic operations on bytes are listed below; as you can see, they are very similar to those on strings:
In[40]: b = b"abcd\x64"
In[41]: b
Out[41]: b'abcdd'
In[42]: type(b)
Out[42]: bytes
In[43]: len(b)
Out[43]: 5
In[44]: b[4]
Out[44]: 100  # 100 in hexadecimal is \x64
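One place where bytes differ from strings is indexing: a single index gives an integer, while a slice gives bytes. The following sketch reuses the same value of b to show this, along with a couple of string-like methods; the other names are illustrative.

```python
b = b"abcd\x64"

last_value = b[4]    # indexing returns an int
last_slice = b[4:5]  # slicing returns a bytes object
values = list(b)     # iterating yields ints

print(last_value, hex(last_value))  # 100 0x64
print(last_slice)                   # b'd'
print(values)                       # [97, 98, 99, 100, 100]

# Many str methods have bytes counterparts that take bytes arguments.
print(b.upper())            # b'ABCDD'
print(b.startswith(b"ab"))  # True
```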
A byte inside a bytes object cannot be modified in place. To change one, convert the object to a bytearray first and modify that:
In[46]: barr = bytearray(b)
In[47]: type(barr)
Out[47]: bytearray
In[48]: barr[0] = 110
In[49]: barr
Out[49]: bytearray(b'nbcdd')
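To make the immutability concrete, here is a minimal sketch that first attempts the direct assignment, then goes through a bytearray and converts the result back to bytes. The names data, buf and modified are made up for this example.

```python
data = b"abcdd"

# bytes is immutable: item assignment raises TypeError.
try:
    data[0] = 110
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Copy into a mutable bytearray, change one byte, then freeze it again.
buf = bytearray(data)
buf[0] = ord("n")  # ord("n") is 110
modified = bytes(buf)

print(assignment_failed)  # True
print(modified)           # b'nbcdd'
print(data)               # b'abcdd' (the original is unchanged)
```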
II. Relationship between bytes and characters
As mentioned above, bytes and strings are very similar, and in fact they can be converted into each other. A string is turned into bytes by passing an encoding name to its encode() method, and bytes are turned back into a string by passing the same encoding name to the decode() method:
In[50]: s = "Life is short, I use Python." In[51]: b = ('utf-8') In[52]: b Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python' In[53]: c = ('gb18030') In[54]: c Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python' In[55]: ('utf-8') Out[55]: 'Life is short, I use Python' In[56]: ('gb18030') Out[56]: 'Life is short, I use Python' In[57]: ('utf-8') Traceback (most recent call last): exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-57-8b50aa70bce9>", line 1, in <module> ('utf-8') UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte In[58]: ('gb18030') Out[58]: 'Yeeseng Yun Yun Transportation Velvet Velvet Jacket'
As we can see, the same characters map to completely different bytes under different encodings. If bytes are decoded with an encoding other than the one used to encode them, the result is garbled text or an outright failure. The failure above occurs because each encoding has its own rules for which byte sequences are valid: in utf-8, \xc8 starts a two-byte sequence and must be followed by a byte in the range \x80 to \xbf, but the next byte here is \xcb.
III. Applications
As a simple example, suppose I want to crawl the content of a web page, say the page Baidu returns for a search on Python. Baidu uses the utf-8 encoding. If you do not decode the returned result, it is just a very long byte string; if you decode it correctly, you get a normal HTML page.
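# The module name is missing from the source; urllib.request is assumed here.
import urllib.request

# The host part of the URL is missing from the source; prepend the site's address before running.
url = "/s?ie=utf-8&wd=python"
page = urllib.request.urlopen(url)  # open the page
mybytes = page.read()               # raw response body, as bytes
encoding = "utf-8"
print(mybytes.decode(encoding))     # decode the bytes into text
page.close()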
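The example above hard-codes the encoding as utf-8. Servers often announce the encoding in the Content-Type response header, so one possible refinement is to read it from there and fall back to a default. The sketch below works offline on a sample header string; charset_from_content_type and the sample values are hypothetical names introduced for illustration, not part of the original example.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the charset named in a Content-Type value, or a default if none is given."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None when absent.
    return msg.get_content_charset() or default


# Sample data standing in for a real response body and header.
raw = "<title>Python</title>".encode("gb18030")
encoding = charset_from_content_type("text/html; charset=GB18030")
decoded = raw.decode(encoding)

print(encoding)                                # gb18030
print(decoded)                                 # <title>Python</title>
print(charset_from_content_type("text/html"))  # utf-8
```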
That is all for this article. I hope it helps you learn Python programming.