SoFunction
Updated on 2024-11-10

How Python Reads and Writes Binary Array Data

Problem

You want to read or write data encoded as a binary array of uniform structures, converting it to and from Python tuples.

Solution

You can use the struct module to work with binary data. Here is sample code that writes a list of Python tuples to a binary file, using struct to encode each tuple as a structure.

from struct import Struct
def write_records(records, format, f):
  '''
  Write a sequence of tuples to a binary file of structures.
  '''
  record_struct = Struct(format)
  for r in records:
    f.write(record_struct.pack(*r))

# Example
if __name__ == '__main__':
  records = [ (1, 2.3, 4.5),
        (6, 7.8, 9.0),
        (12, 13.4, 56.7) ]
  with open('', 'wb') as f:
    write_records(records, '<idd', f)

There are several ways to read this file back into a list of tuples. First, if you plan to read the file incrementally in chunks, you can write code like this:

from struct import Struct

def read_records(format, f):
  record_struct = Struct(format)
  chunks = iter(lambda: f.read(record_struct.size), b'')
  return (record_struct.unpack(chunk) for chunk in chunks)

# Example
if __name__ == '__main__':
  with open('','rb') as f:
    for rec in read_records('<idd', f):
      # Process rec
      ...

If you would rather read the whole file into a byte string in one go and then unpack it piece by piece, you can do this:

from struct import Struct

def unpack_records(format, data):
  record_struct = Struct(format)
  return (record_struct.unpack_from(data, offset)
      for offset in range(0, len(data), record_struct.size))

# Example
if __name__ == '__main__':
  with open('', 'rb') as f:
    data = f.read()
  for rec in unpack_records('<idd', data):
    # Process rec
    ...

In both cases, the result is an iterable that produces the tuples originally used to create the file.
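As a quick check of that round trip, here is a minimal sketch that runs entirely in memory. It assumes io.BytesIO as a stand-in for the file on disk, and it restates compact versions of write_records() and read_records() so that it can run on its own.

```python
import io
from struct import Struct

def write_records(records, format, f):
    """Pack each tuple with the given format and write it to f."""
    record_struct = Struct(format)
    for r in records:
        f.write(record_struct.pack(*r))

def read_records(format, f):
    """Return a generator of tuples, one per fixed-size chunk read from f."""
    record_struct = Struct(format)
    chunks = iter(lambda: f.read(record_struct.size), b'')
    return (record_struct.unpack(chunk) for chunk in chunks)

records = [(1, 2.3, 4.5), (6, 7.8, 9.0), (12, 13.4, 56.7)]

# Assumption: io.BytesIO plays the role of the binary file.
buf = io.BytesIO()
write_records(records, '<idd', buf)
buf.seek(0)  # rewind before reading back

round_trip = list(read_records('<idd', buf))
print(round_trip == records)
```

The comparison holds because Python floats are already 64-bit doubles, so the d format code stores them without loss.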

Discussion

For programs that need to encode and decode binary data, it is common to use the struct module. To declare a new structure, simply create a Struct instance:

# Little endian 32-bit integer, two double precision floats
record_struct = Struct('<idd')

Structures are always described with a set of format codes such as i, d, f, and so on (see the Python documentation). Each code corresponds to a specific binary data type, such as a 32-bit integer, a 64-bit float, or a 32-bit float. The first character, <, specifies the byte order. In this example, it means little-endian. Change this character to > for big-endian, or ! for network byte order.
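To make the effect of the byte-order prefix concrete, the following small sketch packs the same integer under each prefix and checks the sizes of a few format codes. It only uses struct.pack() and struct.calcsize() from the standard library; the variable names are chosen for illustration.

```python
import struct

# The same 32-bit integer packed under each byte-order prefix.
little = struct.pack('<i', 1)
big = struct.pack('>i', 1)
network = struct.pack('!i', 1)

print(little)          # b'\x01\x00\x00\x00'
print(big)             # b'\x00\x00\x00\x01'
print(network == big)  # True: network byte order is big-endian

# With an explicit prefix, sizes are standard and no padding is inserted.
print(struct.calcsize('<i'), struct.calcsize('<d'), struct.calcsize('<f'))  # 4 8 4
print(struct.calcsize('<idd'))  # 20
```

Without a prefix, struct uses the platform's native sizes and alignment, which may add padding between fields, so an explicit prefix is the safer choice for file formats.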

The resulting Struct instance has various attributes and methods for manipulating structures of that type. The size attribute contains the size of the structure in bytes, which is useful during I/O operations. The pack() and unpack() methods are used to pack and unpack data. For example:

>>> from struct import Struct
>>> record_struct = Struct('<idd')
>>> record_struct.size
20
>>> record_struct.pack(1, 2.0, 3.0)
b'\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x08@'
>>> record_struct.unpack(_)
(1, 2.0, 3.0)
>>>

Sometimes you will see the pack() and unpack() operations called as module-level functions, as in the following:

>>> import struct
>>> struct.pack('<idd', 1, 2.0, 3.0)
b'\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x08@'
>>> struct.unpack('<idd', _)
(1, 2.0, 3.0)
>>>

This works, but it feels less elegant than creating a single Struct instance, especially if the same structure appears in multiple places in your code. With a Struct instance, the format code is specified only once and all of the operations are grouped together. This makes the code easier to maintain, since you only need to change it in one place.

The code that reads the binary structures uses a few interesting and elegant programming idioms. In the read_records() function, iter() is used to create an iterator that returns fixed-size chunks of data. This iterator repeatedly calls a user-supplied callable (here, lambda: f.read(record_struct.size)) until it returns a specified sentinel value (here, b''), at which point iteration stops. For example:

>>> f = open('', 'rb')
>>> chunks = iter(lambda: f.read(20), b'')
>>> chunks
<callable_iterator object at 0x10069e6d0>
>>> for chk in chunks:
...   print(chk)
...
b'\x01\x00\x00\x00ffffff\x02@\x00\x00\x00\x00\x00\x00\x12@'
b'\x06\x00\x00\x00333333\x1f@\x00\x00\x00\x00\x00\x00"@'
b'\x0c\x00\x00\x00\xcd\xcc\xcc\xcc\xcc\xcc*@\x9a\x99\x99\x99\x99YL@'
>>>
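The two-argument form of iter() can also be tried without a file on disk. The sketch below is only an illustration: it assumes an in-memory io.BytesIO stream with made-up contents, and it uses functools.partial in place of the lambda.

```python
import io
from functools import partial

# Hypothetical in-memory stream standing in for an open binary file.
f = io.BytesIO(b'abcdefgh')

# iter(callable, sentinel) keeps calling f.read(3) until it returns b''.
chunks = list(iter(partial(f.read, 3), b''))
print(chunks)  # [b'abc', b'def', b'gh']
```

Note that the last chunk may be shorter than requested. In read_records(), a trailing partial record like this would make unpack() raise struct.error, so the technique assumes the file holds a whole number of records.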

As you can see, one reason for creating an iterable is that it allows the records to be produced with a generator expression, as shown in the solution. Without this technique, the code might look like this:

def read_records(format, f):
  record_struct = Struct(format)
  while True:
    chk = f.read(record_struct.size)
    if chk == b'':
      break
    yield record_struct.unpack(chk)

The unpack_records() function takes a different approach, using the unpack_from() method. unpack_from() is useful for extracting binary data from a larger binary array, because it does so without creating temporary objects or copying memory. You just give it a byte string (or any array) along with a byte offset, and it unpacks the fields directly from that location.

If you used unpack() instead of unpack_from(), you would need to modify the code to create many small slices and perform the offset calculations yourself. For example:

def unpack_records(format, data):
  record_struct = Struct(format)
  return (record_struct.unpack(data[offset:offset + record_struct.size])
      for offset in range(0, len(data), record_struct.size))

Besides being harder to read, this version has to do a lot of extra work, because it performs various offset calculations, copies data, and creates many small slice objects. If you are going to unpack a large number of structures from a large byte string that you have already read, unpack_from() will generally perform better.
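The following sketch illustrates unpacking in place at a byte offset. The 4-byte header and the sample values are hypothetical and exist only to give the offsets something to skip. It also shows Struct.iter_unpack(), a standard-library alternative that is not used in the code above, applied to a memoryview so that skipping the header does not copy the data.

```python
from struct import Struct

record_struct = Struct('<idd')
header = b'HDR0'  # hypothetical 4-byte header, for illustration only
data = (header
        + record_struct.pack(1, 2.0, 3.0)
        + record_struct.pack(4, 5.0, 6.0))

# unpack_from() reads fields in place at a byte offset; no slice is created.
first = record_struct.unpack_from(data, len(header))
second = record_struct.unpack_from(data, len(header) + record_struct.size)
print(first, second)  # (1, 2.0, 3.0) (4, 5.0, 6.0)

# A memoryview skips the header without copying the underlying bytes,
# and iter_unpack() then walks the remaining fixed-size records.
body = memoryview(data)[len(header):]
print(list(record_struct.iter_unpack(body)))  # [(1, 2.0, 3.0), (4, 5.0, 6.0)]
```

iter_unpack() requires the buffer length to be an exact multiple of the structure size, which is why the header is removed before it is called.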

When unpacking records, you may want to use the namedtuple object from the collections module. It lets you assign attribute names to the returned tuples. For example:

from collections import namedtuple

Record = namedtuple('Record', ['kind','x','y'])

with open('', 'rb') as f:
  records = (Record(*r) for r in read_records('<idd', f))

for r in records:
  print(r.kind, r.x, r.y)
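For a version that can be run without an existing file, here is a small sketch that builds the sample data in memory. The in-memory bytes and the use of Record._make() are assumptions made for illustration; the field names are the ones used above.

```python
from collections import namedtuple
from struct import Struct

Record = namedtuple('Record', ['kind', 'x', 'y'])
record_struct = Struct('<idd')

# In-memory sample data, assumed here in place of a file on disk.
data = record_struct.pack(1, 2.3, 4.5) + record_struct.pack(6, 7.8, 9.0)

# Record._make() builds a Record from any iterable, such as an unpacked tuple.
records = [Record._make(record_struct.unpack_from(data, offset))
           for offset in range(0, len(data), record_struct.size)]

for r in records:
    print(r.kind, r.x, r.y)

# A named tuple is still a tuple, so it can be packed again with *r.
repacked = b''.join(record_struct.pack(*r) for r in records)
print(repacked == data)  # True
```

Because named tuples behave like ordinary tuples, they can be passed straight back to pack() when writing.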

If your program needs to work with a large amount of binary data, you may be better off using a library such as numpy. For example, instead of reading the binary data into a list of tuples, you can read it into a structured array, like this:

>>> import numpy as np
>>> f = open('', 'rb')
>>> records = np.fromfile(f, dtype='<i,<d,<d')
>>> records
array([(1, 2.3, 4.5), (6, 7.8, 9.0), (12, 13.4, 56.7)],
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8')])
>>> records[0]
(1, 2.3, 4.5)
>>> records[1]
(6, 7.8, 9.0)
>>>

Finally, if you need to read binary data from a known file format (e.g., an image format, a graphics file, HDF5, etc.), check whether a Python module for it already exists. There is no point in reinventing the wheel if you don't have to.
