SoFunction
Updated on 2024-11-18

Python: saving a large .mat data file fails with an error about exceeding the I/O limit

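There is a limit to the size of .mat files that Python (scipy.io.savemat) can save; in practice it seems to be somewhere under 5 GB. savemat only writes the older v4/v5 MAT formats, not the HDF5-based v7.3 format that MATLAB itself uses for variables over 2 GB. If you need to save dozens of gigabytes of data, use another format instead.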

For example, you can write an HDF5 (.h5) file with h5py:

import h5py

def h5_data_write(train_data, train_label, test_data, test_label, shuffled_flag):
    print("The h5py file is being written to disk...")
    # shuffled_flag is a string tag appended to the file name
    save_path = "../save_test/" + "train_test_split_data_label_" + shuffled_flag + ".h5"
    # h5py.File opens the file; the with-block closes it after writing
    with h5py.File(save_path, 'w') as f:
        f.create_dataset('train_data', data=train_data)
        f.create_dataset('train_label', data=train_label)
        f.create_dataset('test_data', data=test_data)
        f.create_dataset('test_label', data=test_label)
    print("h5py file saved successfully!")

def h5_data_read(filename):
    """
        f.keys()      : names of all groups and datasets at this level of the file
        f['key_name'] : the corresponding object (dataset or group)
    """
    with h5py.File(filename, 'r') as f:
        # [:] reads each whole dataset into a NumPy array before the file closes
        train_data = f['train_data'][:]
        train_label = f['train_label'][:]
        test_data = f['test_data'][:]
        test_label = f['test_label'][:]
    return train_data, train_label, test_data, test_label
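Since the point of switching to HDF5 is data that is too large for one .mat file, it can also help not to hold the whole array in memory while writing. A minimal sketch, assuming the data arrives in 2-D batches with the same number of columns (the helper name `h5_append_batches` is hypothetical): it creates a chunked, resizable dataset with `maxshape=(None, ...)` and grows it with `Dataset.resize()` as each batch arrives.

```python
import h5py
import numpy as np


def h5_append_batches(path, batches, name="train_data", compression="gzip"):
    """Hypothetical helper: append 2-D batches to one growable HDF5 dataset.

    Every batch must have the same number of columns as the first one.
    Returns the total number of rows written.
    """
    with h5py.File(path, "w") as f:
        dset = None
        for batch in batches:
            batch = np.asarray(batch)
            if dset is None:
                # The first batch fixes column count and dtype; rows are unlimited.
                dset = f.create_dataset(
                    name,
                    shape=(0, batch.shape[1]),
                    maxshape=(None, batch.shape[1]),
                    dtype=batch.dtype,
                    chunks=True,
                    compression=compression,
                )
            start = dset.shape[0]
            dset.resize(start + batch.shape[0], axis=0)  # grow along the row axis
            dset[start:] = batch  # write only the newly added rows
        return 0 if dset is None else dset.shape[0]
```

Because the batches can come from a generator, only one batch needs to be in memory at a time; the dataset is then read back exactly like in `h5_data_read`.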

Addendum: reading MATLAB data files (*.mat) in Python

Background

When doing deep learning with the Caffe framework, I generally use MATLAB to process the images (MATLAB is relatively simple and efficient for image processing) and Python to generate the required LMDB files and to run tests that produce the results.

So the label information produced by MATLAB's image processing is saved as .mat files for Python to read, and the results produced by Python also need further processing in MATLAB. (You could of course use .txt files instead, if you don't mind the trouble of handling the data's structure yourself.)

Introduction

Data transfer between MATLAB and Python is generally based on MATLAB's own file format, .mat. NumPy and SciPy provide functions that read, write and process the data in .mat files well.

NumPy provides the array type that corresponds to MATLAB's matrices, while SciPy provides two functions, loadmat and savemat (in scipy.io), to read and write .mat files.
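A small round trip shows this mapping: savemat takes a dict of NumPy arrays and writes one MATLAB variable per key, and loadmat returns a dict again. Since MATLAB has no 1-D arrays, a 1-D NumPy array comes back as a (1, N) row matrix by default. The file name `roundtrip.mat` is only for illustration.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.mat")
    # One MATLAB variable per dict key; MAT v5 is savemat's default format.
    sio.savemat(path, {"vec": np.array([1, 2, 3, 4]), "mat": np.eye(2)})
    loaded = sio.loadmat(path)

vec = loaded["vec"]  # the 1-D input comes back as a (1, 4) row matrix
mat = loaded["mat"]  # 2-D arrays keep their shape
print(vec.shape, mat.shape)  # (1, 4) (2, 2)
```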

Here is a simple test program (see each function's help for details on its usage):

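import scipy.io as sio
import matplotlib.pyplot as plt
import numpy as np

# MATLAB file name
matfn = u'E:/python/test program/162250671_162251656_1244.mat'
data = sio.loadmat(matfn)  # dict: MATLAB variable name -> NumPy array

plt.close('all')
xi = data['xi']
yi = data['yi']
ui = data['ui']
vi = data['vi']
plt.figure(1)
# vector field, using every 5th point in each direction
plt.quiver(xi[::5, ::5], yi[::5, ::5], ui[::5, ::5], vi[::5, ::5])
plt.figure(2)
# filled contour plot of ui
plt.contourf(xi, yi, ui)
plt.show()
# write the variables back to a .mat file; fill in your own output file name
sio.savemat('', {'xi': xi, 'yi': yi, 'ui': ui, 'vi': vi})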

Example 2

import scipy.io as sio
import numpy as np

#### How Python reads a .mat file and what to do with the result #####
load_fn = ''  # path of the .mat file to read; fill in your own
load_data = sio.loadmat(load_fn)
load_matrix = load_data['matrix']  # assumes the file contains a variable named matrix, saved in MATLAB with e.g. save(load_fn, 'matrix'); several variables can also be saved: save(load_fn, 'matrix_x', 'matrix_y', ...)
load_matrix_row = load_matrix[0]  # first row of the MATLAB matrix; MATLAB rows map to the first index of the NumPy array

#### How Python saves a .mat file for use in a MATLAB program #####
save_fn = ''  # path of the .mat file to write; fill in your own
save_array = np.array([1, 2, 3, 4])
sio.savemat(save_fn, {'array': save_array})  # MATLAB sees a variable named array, a 1x4 row vector

save_array_x = np.array([1, 2, 3, 4])
save_array_y = np.array([5, 6, 7, 8])
sio.savemat(save_fn, {'array_x': save_array_x, 'array_y': save_array_y})  # likewise, one MATLAB variable per dict key
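Besides your variables, the dict returned by loadmat also holds metadata entries named `__header__`, `__version__` and `__globals__`. A minimal sketch (the helper name `user_variables` is hypothetical) that keeps only the MATLAB variables and passes `squeeze_me=True` so the extra row dimension of 1-D data is dropped:

```python
import os
import tempfile

import numpy as np
import scipy.io as sio


def user_variables(mat_path, squeeze=True):
    """Hypothetical helper: load a .mat file without loadmat's metadata keys."""
    data = sio.loadmat(mat_path, squeeze_me=squeeze)
    # Metadata entries look like '__header__'; MATLAB variable names cannot start with '_'.
    return {k: v for k, v in data.items() if not k.startswith("__")}


with tempfile.TemporaryDirectory() as tmp:
    fn = os.path.join(tmp, "example.mat")
    sio.savemat(fn, {"array_x": np.array([1, 2, 3, 4]),
                     "array_y": np.array([5, 6, 7, 8])})
    variables = user_variables(fn)

print(sorted(variables))           # ['array_x', 'array_y']
print(variables["array_x"].shape)  # (4,) because squeeze_me removed the leading 1
```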

Since the later goal is mainly to use existing MATLAB data (.mat or .txt), the main concern is importing MATLAB data into Python. The following code reads .mat files in Python.

It is mainly a matter of using scipy.io, which provides two very convenient functions, loadmat and savemat.

# adapted from /rumswell/article/details/8545087
import scipy.io as sio
import matplotlib.pyplot as plt
import numpy as np

matfn = 'E:\\Pythonrun\\myuse\\'   # the path of the .mat data (append the file name)
data = sio.loadmat(matfn)
xx = data['matdata']
plt.figure(1)
plt.plot(xx)
plt.show()
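For a large .mat file it can be worth inspecting it before loading everything: scipy.io.whosmat lists each variable's name, shape and MATLAB class without reading the data, and loadmat's `variable_names` argument loads only the variables you ask for. A sketch, with a temporary file and the extra variable `unused` standing in for a real file that contains `matdata`:

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

with tempfile.TemporaryDirectory() as tmp:
    fn = os.path.join(tmp, "sample.mat")
    sio.savemat(fn, {"matdata": np.arange(10.0), "unused": np.zeros((3, 3))})

    # (name, shape, class) for each stored variable, without loading the arrays.
    info = sio.whosmat(fn)

    # Load only 'matdata'; 'unused' is skipped.
    data = sio.loadmat(fn, variable_names=["matdata"])

print(info)
xx = data["matdata"]
```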

The following code reads .txt data and converts it into an array. The method is rather clumsy; a more efficient approach remains to be explored.

import numpy as np

def file2list(filename):

    with open(filename) as fr:
        array = fr.readlines()  # a list with one element for each line in the file
    num = len(array)
    returnMat = np.zeros((num, 3))  # num x 3 matrix of zeros: one row per line, three columns
    index = 0

    for line in array:
        line = line.strip()  # remove the trailing newline
        linelist = line.split(' ')  # split the line into fields on the separator (a single space)
        returnMat[index, :] = [float(x) for x in linelist[0:3]]  # fill one matrix row; note this assignment is clumsy
        index += 1
    return returnMat

fname = 'E:\\Pythonrun\\myuse\\num_data.txt'
data = file2list(fname)
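One more concise alternative to the line-by-line loop is numpy.loadtxt, which parses whitespace-separated numeric text straight into an array; `usecols=(0, 1, 2)` keeps the first three columns, like `linelist[0:3]`, and `ndmin=2` keeps the result 2-D even for a one-line file. A sketch (the name `file2array` is hypothetical), with an in-memory string standing in for num_data.txt:

```python
import io

import numpy as np


def file2array(source):
    """Hypothetical helper: read whitespace-separated numbers, keep 3 columns."""
    # source may be a path (e.g. num_data.txt) or any text file-like object.
    return np.loadtxt(source, usecols=(0, 1, 2), ndmin=2)


sample = io.StringIO("1 2 3 4\n5 6 7 8\n")
result = file2array(sample)
print(result)
```

Unlike the loop above, loadtxt also tolerates runs of several spaces or tabs between values.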

Supplementary: Python Reads and Writes Matlab Mat Format Data

1. Reading/writing MATLAB files other than v7.3

import scipy.io as sio
import numpy
# Read a .mat file
matFile = ''  # path of the .mat file; fill in your own
datas = sio.loadmat(matFile)
# Load the data in matFile
# Assume the variable stored in the .mat file is named matlabdata
matlabdata = datas['matlabdata']
# Write a .mat file
save_matFile = 'save_matlabdata.mat'
save_matlabdata = numpy.array([1, 2, 3, 4, 5])
sio.savemat(save_matFile, {'array': save_matlabdata})
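savemat also takes keyword arguments that matter for MATLAB interop: `oned_as` chooses whether a 1-D array becomes a row (the default) or a column vector, and `do_compression=True` compresses the stored data. A sketch writing the same 5-element array both ways, with temporary files in place of save_matlabdata.mat:

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

arr = np.array([1, 2, 3, 4, 5])

with tempfile.TemporaryDirectory() as tmp:
    row_fn = os.path.join(tmp, "row.mat")
    col_fn = os.path.join(tmp, "col.mat")
    # Default oned_as='row': MATLAB sees a 1x5 vector.
    sio.savemat(row_fn, {"array": arr})
    # Column vector (5x1 in MATLAB), stored compressed.
    sio.savemat(col_fn, {"array": arr}, oned_as="column", do_compression=True)
    row_shape = sio.loadmat(row_fn)["array"].shape
    col_shape = sio.loadmat(col_fn)["array"].shape

print(row_shape, col_shape)  # (1, 5) (5, 1)
```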

2. Reading MATLAB v7.3 files

If MATLAB saved the data with '-v7.3', loading it with scipy.io.loadmat fails with this error:

File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/", line 64, in mat_reader_factory
    raise NotImplementedError('Please use HDF reader for matlab v7.3 files')
NotImplementedError: Please use HDF reader for matlab v7.3 files

Since v7.3 .mat files are HDF5 files, read them with h5py instead:

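import h5py
with h5py.File('', 'r') as f:  # path of the v7.3 .mat file; fill in your own
    print(list(f.keys()))  # variable names in the file
    # .value was removed in h5py 3.0; [()] reads the whole dataset as a NumPy array
    # MATLAB stores arrays column-major, so the result is transposed relative to MATLAB
    datas = f['matlabdata'][()]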
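The two cases can be combined. A minimal sketch (the function name `load_any_mat` is hypothetical): try scipy.io.loadmat first and fall back to h5py only on the NotImplementedError shown above. In the v7.3 branch it reads only top-level numeric datasets, transposes them back to MATLAB's orientation, and skips groups (structs, cell arrays, MATLAB's internal `#refs#`), which would need more handling.

```python
import numpy as np
import scipy.io as sio


def load_any_mat(path):
    """Hypothetical helper: {variable_name: array} for v4/v5 and v7.3 .mat files."""
    try:
        data = sio.loadmat(path)
        return {k: v for k, v in data.items() if not k.startswith("__")}
    except NotImplementedError:
        # v7.3 files are HDF5; import h5py only when it is actually needed.
        import h5py

        out = {}
        with h5py.File(path, "r") as f:
            for name, obj in f.items():
                # Groups hold structs/cells/references; only plain datasets are read.
                if isinstance(obj, h5py.Dataset):
                    # MATLAB is column-major, so h5py sees the dimensions reversed.
                    out[name] = np.array(obj[()]).T
        return out
```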

The above is based on my personal experience; I hope it serves as a useful reference. If there are any mistakes or anything I have not fully considered, please feel free to point them out.