Python Sparse Matrices: Sparse Storage and Conversion

Sparse matrices: scipy.sparse

from scipy import sparse

Storage formats for sparse matrices

Solving linear models in science and engineering often produces large matrices in which most elements are zero; these are known as sparse matrices. Storing such a matrix in a NumPy ndarray wastes memory. Because the matrix is sparse, memory can be saved by storing only its non-zero elements. In addition, arithmetic routines written specifically for this structure can make matrix operations faster.
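The following minimal sketch shows the memory saving. It compares a 1000 x 1000 identity matrix stored as a dense ndarray with the same matrix in CSR format. CSR is another scipy.sparse format, used here only for illustration, and the size 1000 is arbitrary. The sparse byte count is the sum of the three arrays CSR keeps (data, indices, indptr). The exact figure depends on the index dtype SciPy chooses.

```python
import numpy as np
from scipy import sparse

n = 1000  # arbitrary size for the comparison

# Dense identity: all n*n float64 entries are stored, zeros included.
dense = np.eye(n)
dense_bytes = dense.nbytes

# CSR identity: only the n non-zero values plus their index arrays are stored.
csr = sparse.identity(n, format="csr")
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print("dense bytes: ", dense_bytes)
print("sparse bytes:", sparse_bytes)
```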

scipy.sparse provides several formats for representing sparse matrices, each suited to different purposes. dok_matrix and lil_matrix are well suited to adding elements incrementally.
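The following small sketch builds the same data in several formats side by side (the 3 x 3 matrix is made up for illustration). Every scipy.sparse object reports its format name through the `.format` attribute. You can convert between formats with `.asformat()` or with the `tocsr()`/`tocoo()`-style methods.

```python
import numpy as np
from scipy import sparse

# Example data: three non-zero entries in a 3 x 3 matrix.
dense = np.array([[0, 0, 1],
                  [2, 0, 0],
                  [0, 3, 0]], dtype=float)

# Build the same matrix in several storage formats.
formats = {
    "dok": sparse.dok_matrix(dense),
    "lil": sparse.lil_matrix(dense),
    "coo": sparse.coo_matrix(dense),
    "csr": sparse.csr_matrix(dense),
    "csc": sparse.csc_matrix(dense),
}
for name, m in formats.items():
    # .format is the short format name; .nnz is the number of stored values.
    print(name, m.format, m.nnz)

# Convert, e.g. from the incremental-friendly LIL format to CSR.
csr = formats["lil"].asformat("csr")
print(np.array_equal(csr.toarray(), dense))
```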

dok_matrix stores the non-zero elements of the matrix in a dictionary. Each key is a (row, column) tuple, and the corresponding value is the element at that position. This makes the DOK format well suited to adding, deleting and accessing individual elements. It is typically used to add non-zero elements incrementally. The matrix is then converted to another format that supports fast arithmetic.

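a = sparse.dok_matrix((10, 5))
a[2:5, 3] = 1.0, 2.0, 3.0  # set rows 2-4 of column 3
print(list(a.keys()))  # (row, column) keys of the stored elements
print([float(v) for v in a.values()])  # the stored values, as plain floats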

[(2, 3), (3, 3), (4, 3)]
[1.0, 2.0, 3.0]
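The DOK workflow described above is to fill the matrix element by element and then convert it. The following sketch shows one way to do this. The tridiagonal pattern is only an example. CSR (`tocsr()`) is chosen here as one of the formats SciPy offers for fast arithmetic.

```python
import numpy as np
from scipy import sparse

n = 5
a = sparse.dok_matrix((n, n))

# Fill a tridiagonal pattern one element at a time; dict-style writes are cheap.
for i in range(n):
    a[i, i] = 2.0
    if i > 0:
        a[i, i - 1] = -1.0
    if i < n - 1:
        a[i, i + 1] = -1.0

# Convert once filling is finished, then do arithmetic in CSR.
m = a.tocsr()
v = m @ np.ones(n)  # matrix-vector product = row sums here

print(m.nnz)  # 13: 5 diagonal + 2 * 4 off-diagonal entries
print(v)
```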

lil_matrix stores the non-zero elements in two arrays of lists. data holds the non-zero values of each row, and rows holds the column indices of those values. This format is also well suited to adding elements one at a time and to retrieving the contents of a row quickly.

b = sparse.lil_matrix((10, 5))
b[2, 3] = 1.0
b[3, 4] = 2.0
b[3, 2] = 3.0
print(b.data)  # non-zero values, row by row
print(b.rows)  # column indices of those values

[[] [] [1.0] [3.0, 2.0] [] [] [] [] [] []]
[[] [] [3] [2, 4] [] [] [] [] [] []]
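Each row is kept as its own pair of lists, so reading a row is cheap. The following minimal sketch reuses the matrix b from above, rebuilt here so the snippet runs on its own. For each row, the `rows` and `data` entries are parallel lists sorted by column. Slicing out one row gives a 1 x 5 sparse matrix.

```python
from scipy import sparse

b = sparse.lil_matrix((10, 5))
b[2, 3] = 1.0
b[3, 4] = 2.0
b[3, 2] = 3.0

# Row 3 is stored as two parallel lists, kept sorted by column index.
cols = b.rows[3]
vals = b.data[3]
print(cols, [float(v) for v in vals])

# Slicing a single row returns a 1 x 5 sparse matrix.
row3 = b[3, :]
print(row3.toarray())

# Convert to CSR before doing heavy arithmetic.
c = b.tocsr()
```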

coo_matrix stores the non-zero elements in three arrays of equal length: row holds each element's row index, col holds its column index, and data holds its value. coo_matrix does not support accessing, adding or deleting individual elements. Once it is built, it is mainly useful for conversion to other formats, such as CSR or CSC, that support fast arithmetic.

coo_matrix allows duplicate entries: the same (row, column) coordinates can appear more than once. When the matrix is converted to another format, all values at the same coordinates are summed. In the following example, (2, 3) appears twice, with values 1 and 10. These are summed when the matrix is converted to an ndarray, so the value at (2, 3) in the result is 11.

Many sparse matrices are stored in files in this format. For example, a CSV file might have three columns: "user ID, item ID, rating". After the data are read with pandas.read_csv, coo_matrix can quickly turn them into a sparse matrix, provided the IDs are (or are first mapped to) 0-based integer indices. In that matrix, each row corresponds to a user, each column to an item, and each element is the user's rating of that item.
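The following sketch shows the CSV-to-matrix conversion under a few assumptions. The file contents and the column names user_id, item_id and rating are hypothetical. The data are read with the standard csv module instead of pandas.read_csv to keep the snippet dependency-light. With pandas, the three columns would come from the DataFrame instead, but the coo_matrix step is the same. `np.unique(..., return_inverse=True)` maps arbitrary IDs to the 0-based indices that coo_matrix needs.

```python
import csv
import io

import numpy as np
from scipy import sparse

# Hypothetical ratings file with columns user_id, item_id, rating.
text = """user_id,item_id,rating
u1,i1,5
u1,i3,3
u2,i2,4
u3,i1,1
"""

records = list(csv.DictReader(io.StringIO(text)))
users = [rec["user_id"] for rec in records]
items = [rec["item_id"] for rec in records]
ratings = np.array([float(rec["rating"]) for rec in records])

# Map arbitrary IDs to 0-based row/column indices (sorted unique IDs).
user_ids, user_idx = np.unique(users, return_inverse=True)
item_ids, item_idx = np.unique(items, return_inverse=True)

# One row per user, one column per item, ratings as values.
ratings_matrix = sparse.coo_matrix(
    (ratings, (user_idx, item_idx)),
    shape=(len(user_ids), len(item_ids)),
)
print(ratings_matrix.toarray())
```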

row = [2, 3, 3, 2]
col = [3, 4, 2, 3]
data = [1, 2, 3, 10]
c = sparse.coo_matrix((data, (row, col)), shape=(5, 6))
print(c.col, c.row, c.data)  # the three parallel arrays
print(c.toarray())  # duplicates at (2, 3) are summed

[3 4 2 3] [2 3 3 2] [ 1 2 3 10]
[[ 0 0 0 0 0 0]
 [ 0 0 0 0 0 0]
 [ 0 0 0 11 0 0]
 [ 0 0 3 0 2 0]
 [ 0 0 0 0 0 0]]
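Beyond toarray(), the usual next step is converting to CSR. The following sketch rebuilds c from the example above and shows three things. COO keeps the duplicate (2, 3) entries as separate stored values. tocsr() sums them. The CSR result supports arithmetic such as a matrix-vector product (the all-ones vector is just an example).

```python
import numpy as np
from scipy import sparse

row = [2, 3, 3, 2]
col = [3, 4, 2, 3]
data = [1, 2, 3, 10]
c = sparse.coo_matrix((data, (row, col)), shape=(5, 6))

print(c.nnz)    # 4: COO stores the two (2, 3) entries separately
m = c.tocsr()   # conversion to CSR sums duplicate entries
print(m.nnz)    # 3
print(m[2, 3])  # 11

# CSR supports fast arithmetic, e.g. a matrix-vector product (row sums here).
y = m @ np.ones(6)
print(y)
```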

In my own work I chose coo_matrix because the task involved sparse matrix operations. Without a sparse storage format, the time and space costs were far too high: a 1000 * 1000 matrix took about 2 hours, and the process had to be killed. With no other option, I turned to the triple format that the Pajek software uses for input data.

So the idea is to convert your data into similar triples.

That is: "matrix" -> "triples" -> "sparseMatrix2tuple" -> ""
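The following minimal sketch shows a matrix -> triples -> sparse matrix pipeline. The helpers matrix_to_triples and triples_to_coo are hypothetical. They are not the author's sparseMatrix2tuple, whose code is not shown. For data too large to hold densely, you would write the (row, column, value) triples directly, for example while reading the input, instead of extracting them from a dense array as this small demo does.

```python
import numpy as np
from scipy import sparse


def matrix_to_triples(dense):
    """Hypothetical helper: (row, col, value) triples for the non-zero entries."""
    rows, cols = np.nonzero(dense)  # row-major order
    return list(zip(rows.tolist(), cols.tolist(), dense[rows, cols].tolist()))


def triples_to_coo(triples, shape):
    """Hypothetical helper: build a coo_matrix from (row, col, value) triples."""
    if not triples:
        return sparse.coo_matrix(shape)  # empty matrix of the given shape
    rows, cols, vals = zip(*triples)
    return sparse.coo_matrix((vals, (rows, cols)), shape=shape)


dense = np.array([[0, 0, 7],
                  [0, 0, 0],
                  [4, 0, 0]])
triples = matrix_to_triples(dense)
print(triples)  # [(0, 2, 7), (2, 0, 4)]

m = triples_to_coo(triples, dense.shape)
print(np.array_equal(m.toarray(), dense))  # True
```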
