There is a limit on the size of .mat files that Python can save; it appears to be about 5 GB. If you need to save tens of gigabytes of data, use a different format.
For example, an HDF5 (.h5) file written with h5py:
import h5py


def h5_data_write(train_data, train_label, test_data, test_label, shuffled_flag):
    print("The h5py file is being written to disk...")
    save_path = "../save_test/" + "train_test_split_data_label_" + shuffled_flag + ".h5"
    # 'w' creates the file, overwriting any existing file with the same name
    with h5py.File(save_path, 'w') as f:
        f.create_dataset('train_data', data=train_data)
        f.create_dataset('train_label', data=train_label)
        f.create_dataset('test_data', data=test_data)
        f.create_dataset('test_label', data=test_label)
    print("h5py file saved successfully!")


def h5_data_read(filename):
    """
    keys() : Get the names of all datasets and groups (like files and folders) at this level.
    f['key_name'] : Get the corresponding object
    """
    with h5py.File(filename, 'r') as file:
        # [:] copies each dataset from disk into a NumPy array
        train_data = file['train_data'][:]
        train_label = file['train_label'][:]
        test_data = file['test_data'][:]
        test_label = file['test_label'][:]
    return train_data, train_label, test_data, test_label
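When the data is too large to hold in memory at once, it may help to write it batch by batch and read back only the rows that are needed. The following is a minimal sketch of that idea, not part of the original code: the function names append_batches and read_rows, the dataset name and the demo file name are all made up for illustration, and it assumes every batch is a 2-D array with the same number of columns.

```python
import os
import tempfile

import h5py
import numpy as np


def append_batches(path, batches, name="train_data"):
    """Write an iterable of 2-D arrays into one resizable HDF5 dataset."""
    with h5py.File(path, "w") as f:
        dset = None
        for batch in batches:
            batch = np.asarray(batch)
            if dset is None:
                # The first batch fixes dtype and column count; rows can grow.
                dset = f.create_dataset(
                    name,
                    shape=(0, batch.shape[1]),
                    maxshape=(None, batch.shape[1]),
                    dtype=batch.dtype,
                    chunks=True,
                )
            start = dset.shape[0]
            dset.resize(start + batch.shape[0], axis=0)
            dset[start:] = batch


def read_rows(path, start, stop, name="train_data"):
    """Read only rows [start, stop) from disk instead of the whole dataset."""
    with h5py.File(path, "r") as f:
        return f[name][start:stop]


# Small demo in a temporary directory (hypothetical file name).
demo_path = os.path.join(tempfile.mkdtemp(), "batched_demo.h5")
append_batches(demo_path, [np.ones((2, 3)), np.zeros((3, 3))])
rows = read_rows(demo_path, 1, 3)
print(rows.shape)
```

Slicing the dataset object (rather than calling [:] on it) is what keeps memory use small: only the requested rows are read from disk.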
Addendum: Reading MATLAB data files (*.mat) with Python
Background
When doing deep learning with the Caffe framework, MATLAB is generally used to process the images (MATLAB is relatively simple and efficient for image processing), while Python is used to generate the required LMDB files and to run tests and produce results.
As a result, label information produced by image processing in MATLAB is saved to a .mat file for Python to read, and results produced by Python sometimes need further processing in MATLAB. (You can also use .txt files, if you don't mind handling the structure of the data yourself.)
Introduction
Data transfer between MATLAB and Python is generally based on MATLAB's .mat file format. NumPy and SciPy provide functions that read, write, and process the data in .mat files well.
NumPy's role is to provide the array type that MATLAB matrices map to, while SciPy (scipy.io) provides two functions, loadmat and savemat, for reading and writing .mat files.
Here is a simple test program.
See the help documentation for details on each function:
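import scipy.io as sio
import matplotlib.pyplot as plt
import numpy as np

# MATLAB file name
matfn = u'E:/python/test program/162250671_162251656_1244.mat'
data = sio.loadmat(matfn)  # dict mapping variable names to arrays

plt.close('all')
xi = data['xi']
yi = data['yi']
ui = data['ui']
vi = data['vi']
plt.figure(1)
# Vector field, keeping every 5th point in each direction
plt.quiver(xi[::5, ::5], yi[::5, ::5], ui[::5, ::5], vi[::5, ::5])
plt.figure(2)
plt.contourf(xi, yi, ui)  # filled contour plot of ui
plt.show()

# Save the variables back to a .mat file ('' stands for the output file name)
sio.savemat('', {'xi': xi, 'yi': yi, 'ui': ui, 'vi': vi})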
Example 2
import scipy.io as sio
import numpy as np

#### How Python reads a .mat file and what to do with the result ####
load_fn = ''
load_data = sio.loadmat(load_fn)
# Assume the file contains a variable named matrix, e.g. saved in MATLAB with
# save(load_fn, 'matrix'); several variables can be saved at once with
# save(load_fn, 'matrix_x', 'matrix_y', ...)
load_matrix = load_data['matrix']
# First row of the MATLAB matrix; NumPy arrays are indexed by row first
load_matrix_row = load_matrix[0]

#### How Python saves a .mat file for use in a MATLAB program ####
save_fn = ''
save_array = np.array([1, 2, 3, 4])
# As above, the data ends up in the first row of the variable array
sio.savemat(save_fn, {'array': save_array})

save_array_x = np.array([1, 2, 3, 4])
save_array_y = np.array([5, 6, 7, 8])
# Likewise, several variables can be saved in one call
sio.savemat(save_fn, {'array_x': save_array_x, 'array_y': save_array_y})
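To see the "first row" behaviour concretely, here is a small round-trip sketch. It is only an illustration: the temporary file name is made up, and it relies on savemat's default of storing 1-D arrays as row vectors. It also shows that loadmat returns a few metadata keys (starting with a double underscore) alongside the saved variables.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

# Hypothetical file name inside a temporary directory.
demo_fn = os.path.join(tempfile.mkdtemp(), "roundtrip_demo.mat")

sio.savemat(demo_fn, {"array_x": np.array([1, 2, 3, 4]),
                      "array_y": np.array([5, 6, 7, 8])})

loaded = sio.loadmat(demo_fn)
# Skip metadata entries such as __header__ and __version__.
var_names = sorted(k for k in loaded if not k.startswith("__"))
print(var_names)

# The 1-D input comes back as a 2-D array with a single row.
print(loaded["array_x"].shape)
first_row = loaded["array_x"][0]

# squeeze_me=True drops the length-1 dimension on load.
flat = sio.loadmat(demo_fn, squeeze_me=True)["array_x"]
print(flat.shape)
```

Whether to index with [0] or pass squeeze_me=True is a matter of taste; the second form is convenient when most variables are vectors or scalars.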
Since the goal is mainly to make use of existing MATLAB data (.mat or .txt), the main concern is importing MATLAB data into Python. The following code reads .mat files in Python.
scipy.io is the main module needed.
It provides two functions, loadmat and savemat, which are very convenient.
# adapted from /rumswell/article/details/8545087
import scipy.io as sio
import matplotlib.pyplot as plt
import numpy as np

matfn = 'E:\\Pythonrun\\myuse\\'  # the path of .mat data
data = sio.loadmat(matfn)
xx = data['matdata']
plt.figure(1)
plt.plot(xx)
plt.show()
The following code reads .txt data and converts it into an array. The method is rather clumsy; a more efficient approach remains to be worked out.
import numpy as np


def file2list(filename):
    with open(filename) as fr:
        lines = fr.readlines()  # a list with one element per line of the file
    num = len(lines)
    # Matrix of zeros: one row per line of the file, 3 columns
    returnMat = np.zeros((num, 3))
    for index, line in enumerate(lines):
        line = line.strip()  # remove the newline at the end of the line
        linelist = line.split(' ')  # split the line into elements on the separator
        # Fill one row of the matrix; this element-wise assignment is clumsy
        returnMat[index, :] = [float(x) for x in linelist[0:3]]
    return returnMat


fname = 'E:\\Pythonrun\\myuse\\num_data.txt'
data = file2list(fname)
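One candidate for the more efficient method is numpy.loadtxt, which parses whitespace-separated numeric text directly into an array. This is a suggestion rather than the original author's approach, and the sketch below assumes a file with three numeric columns; the demo file name and its contents are invented so the example can run on its own.

```python
import os
import tempfile

import numpy as np

# Write a tiny demo file (hypothetical name and contents).
demo_txt = os.path.join(tempfile.mkdtemp(), "num_data_demo.txt")
with open(demo_txt, "w") as fh:
    fh.write("1 2 3\n4 5 6\n")

# usecols keeps the first three columns; ndmin=2 guarantees a 2-D result
# even if the file has only one line.
data = np.loadtxt(demo_txt, usecols=(0, 1, 2), ndmin=2)
print(data.shape)
```

For files with another separator, loadtxt accepts a delimiter argument, for example delimiter=',' for comma-separated values.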
Supplement: Reading and Writing MATLAB .mat Data in Python
1. Reading and writing MATLAB files that are not v7.3
import scipy.io as sio
import numpy

# Read a .mat file
matFile = ''
datas = sio.loadmat(matFile)  # load the data in matFile
# Assume the variable stored in the .mat file is named matlabdata
matlabdata = datas['matlabdata']

# Write a .mat file
save_matFile = 'save_matlabdata.mat'
save_matlabdata = numpy.array([1, 2, 3, 4, 5])
sio.savemat(save_matFile, {'array': save_matlabdata})
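If you do not know which variables a .mat file contains, scipy.io.whosmat can list them without loading the data. The sketch below is only an illustration: it writes its own small file under a made-up name in a temporary directory, then inspects it.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

# Hypothetical demo file so the example is self-contained.
demo_file = os.path.join(tempfile.mkdtemp(), "save_matlabdata_demo.mat")
sio.savemat(demo_file, {"array": np.array([1, 2, 3, 4, 5])})

# whosmat returns one (name, shape, class) tuple per stored variable.
contents = sio.whosmat(demo_file)
for name, shape, mat_class in contents:
    print(name, shape, mat_class)
```

This is handy for large files, since only the variable headers are read.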
2. Reading MATLAB v7.3 files
If MATLAB saved the data with '-v7.3', scipy.io.loadmat fails with an error when loading it:
File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/", line 64, in mat_reader_factory
raise NotImplementedError('Please use HDF reader for matlab v7.3 files')
NotImplementedError: Please use HDF reader for matlab v7.3 files
In that case, h5py can be used instead, because v7.3 files are stored in HDF5 format:
import h5py

with h5py.File('', 'r') as f:
    print(list(f.keys()))  # variable names in the file
    # .value was removed in h5py 3.0; [()] reads the whole dataset
    datas = f['matlabdata'][()]
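The two cases can be combined into one helper that tries scipy first and falls back to h5py only when the v7.3 error appears. This is a rough sketch under several assumptions: load_mat_any is a made-up name, only top-level numeric datasets are handled in the h5py branch (structs, cell arrays and strings need extra work), and the transpose is there because h5py reports MATLAB arrays with their dimensions reversed. The demo at the end only exercises the non-v7.3 path.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio


def load_mat_any(path):
    """Return {variable name: ndarray} for a .mat file in either format."""
    try:
        raw = sio.loadmat(path)
        # Drop metadata keys such as __header__.
        return {k: v for k, v in raw.items() if not k.startswith("__")}
    except NotImplementedError:
        # loadmat raises this for v7.3 files, which are HDF5 containers.
        import h5py
        with h5py.File(path, "r") as f:
            return {
                k: np.asarray(f[k][()]).T  # reverse axes to MATLAB order
                for k in f.keys()
                if isinstance(f[k], h5py.Dataset)  # skip groups
            }


# Demo with an ordinary (non-v7.3) file under a hypothetical name.
demo_mat = os.path.join(tempfile.mkdtemp(), "any_format_demo.mat")
sio.savemat(demo_mat, {"matlabdata": np.array([[1, 2], [3, 4]])})
result = load_mat_any(demo_mat)
print(sorted(result))
```

Importing h5py inside the except branch means the helper still works for ordinary .mat files on machines where h5py is not installed.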