Prologue
There is a tensordot function in numpy that is especially useful when doing machine learning. When people see the name, they may think of TensorFlow; indeed, TensorFlow also has a tensordot function. What this function does is allow two arrays with different shapes to be multiplied. Let's take an example:
```python
import numpy as np

a = np.random.randint(0, 9, (3, 4))
b = np.random.randint(0, 9, (4, 5))
try:
    print(a * b)
except Exception as e:
    print(e)  # operands could not be broadcast together with shapes (3,4) (4,5)
# Clearly a and b don't have the same shape, so there is no way to
# multiply them elementwise.

# But:
print(np.tensordot(a, b, 1))
"""
[[32 32 28 28 52]
 [10 25 40 38 78]
 [56  7 28  0 42]]
"""
# With tensordot it works fine
```
Let's see how this function is used
function prototype
```python
@array_function_dispatch(_tensordot_dispatcher)
def tensordot(a, b, axes=2):
```
We see that the function takes three parameters: the first two are numpy arrays, and the last specifies the axes of contraction. It accepts an integer, a list, or a list of nested lists; the examples below show what each form means.
Understanding axes
axes is an integer
If axes receives an integer n, then the last n axes of array a and the first n axes of array b are contracted: the corresponding elements are multiplied and the products summed (an inner product over those axes).
```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))
# These two arrays cannot be multiplied directly, but the last two axes
# of a and the first two axes of b are both (4, 5), so they can be
# contracted. The resulting shape is (3, 8)
print(np.tensordot(a, b, 2).shape)  # (3, 8)
```
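Conceptually, an integer axes boils down to a reshape followed by an ordinary 2-D matrix product: the contracted axes are flattened into one long axis. A minimal sanity check of that equivalence (my own illustration, not numpy's literal code path):

```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))

# Flatten the contracted (4, 5) axes into a single axis of length 20,
# then an ordinary matrix product gives the same result as axes=2
manual = np.dot(a.reshape(3, 20), b.reshape(20, 8))
print(np.array_equal(np.tensordot(a, b, 2), manual))  # True
```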
And axes defaults to 2, so tensordot is generally used with arrays of three or more dimensions.
To build intuition, we will later work through concrete one- and two-dimensional examples. First, let's see what results different values of axes produce, to understand what axes means.
```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))
try:
    print(np.tensordot(a, b, 1).shape)
except Exception as e:
    print(e)  # shape-mismatch for sum
# The error is easy to understand: axes=1 means the last axis of a is
# contracted with the first axis of b, but one has length 5 and the
# other length 4, so we get a shape-mismatch error.

# Change the shape of b so that the last axis of a and the first axis
# of b match, both being 5:
a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((5, 4, 8))
print(np.tensordot(a, b, 1).shape)  # (3, 4, 4, 8)
"""
Now it works. Contracting the specified axes via the inner product
yields a single value, so (3, 4, 5) and (5, 4, 8) turn into (3, 4, 4, 8).
In the previous example, (3, 4, 5) and (4, 5, 8) with axes=2 contracted
the last two axes of a with the first two axes of b into one value,
so the final shape was (3, 8).
"""
```
What happens if axes is 0?
```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))
print(np.tensordot(a, b, 0).shape)  # (3, 4, 5, 4, 5, 8)
print(np.tensordot(b, a, 0).shape)  # (4, 5, 8, 3, 4, 5)
# tensordot(a, b, 0) is equivalent to multiplying every element of a
# by b, and replacing each element of a with the resulting array
```
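In other words, axes=0 is just the outer product. A quick check against the standard np.multiply.outer, which computes exactly this generalized outer product:

```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))

# axes=0 contracts nothing: every element of a is scaled by all of b
print(np.array_equal(np.tensordot(a, b, 0), np.multiply.outer(a, b)))  # True
```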
The above operation can also be expressed with Einstein summation (np.einsum).
axes=0
```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))

c1 = np.tensordot(a, b, 0)
c2 = np.einsum("ijk,xyz->ijkxyz", a, b)
print(c1.shape, c2.shape)  # (3, 4, 5, 4, 5, 8) (3, 4, 5, 4, 5, 8)
print(np.all(c1 == c2))  # True
# c1 and c2 are identical

c3 = np.tensordot(b, a, 0)
c4 = np.einsum("ijk,xyz->xyzijk", a, b)
print(c3.shape, c4.shape)  # (4, 5, 8, 3, 4, 5) (4, 5, 8, 3, 4, 5)
print(np.all(c3 == c4))  # True
# c3 and c4 are identical
```
So how do the two compare in terms of efficiency? Let's time them in Jupyter:
```
>>> %timeit c1 = np.tensordot(a, b, 0)
50.5 µs ± 206 ns per loop
>>> %timeit c2 = np.einsum("ijk,xyz->ijkxyz", a, b)
7.29 µs ± 242 ns per loop
```
You can see that the Einstein summation is much faster here.
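Note that such micro-benchmarks depend on the array sizes and the numpy version. einsum also accepts an optimize flag, which can matter for larger contractions; a small illustration (timings omitted since they vary from machine to machine):

```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))

# optimize=True lets einsum search for a cheaper contraction order;
# for a plain outer product it changes nothing but the code path
c = np.einsum("ijk,xyz->ijkxyz", a, b, optimize=True)
print(c.shape)  # (3, 4, 5, 4, 5, 8)
```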
axes=1
```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((5, 4, 8))
c1 = np.tensordot(a, b, 1)
c2 = np.einsum("ijk,kyz->ijyz", a, b)
print(c1.shape, c2.shape)  # (3, 4, 4, 8) (3, 4, 4, 8)
print(np.all(c1 == c2))  # True
```
axes=2
```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))
c1 = np.tensordot(a, b, 2)
c2 = np.einsum("ijk,jkz->iz", a, b)
print(c1.shape, c2.shape)  # (3, 8) (3, 8)
print(np.all(c1 == c2))  # True
```
axes is a list
If axes receives a list [m, n], then the axis of a at index m and the axis of b at index n are contracted. The biggest advantage of the list form is that you can specify axes at any position.
```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))

# The second axis of a and the first axis of b both have length 4,
# so they can be contracted
c1 = np.tensordot(a, b, [1, 0])
# Since the contraction collapses those axes, (3, 4, 5) and (4, 5, 8)
# become (3, 5, 5, 8): the 4s are dropped and the remaining axes combined
print(c1.shape)  # (3, 5, 5, 8)

# Similarly, the last axis of a and the second axis of b can be
# contracted. The last axis can also be written as -1; the indices in
# the list follow normal indexing rules
c2 = np.tensordot(a, b, [-1, 1])
print(c2.shape)  # (3, 4, 4, 8)
```
The above operation can also be realized using the Einstein summation
```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))

c1 = np.tensordot(a, b, [1, 0])
c2 = np.einsum("ijk,jyz->ikyz", a, b)
print(c1.shape, c2.shape)  # (3, 5, 5, 8) (3, 5, 5, 8)
print(np.all(c1 == c2))  # True

c3 = np.tensordot(a, b, [-1, 1])
c4 = np.einsum("ijk,akz->ijaz", a, b)
print(c3.shape, c4.shape)  # (3, 4, 4, 8) (3, 4, 4, 8)
print(np.all(c3 == c4))  # True
```
axes is a list of nested lists
If axes receives a list of nested lists, e.g. [[m1, m2], [n1, n2]], then more than one axis can be selected on each side.
```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))

# We want to contract the last two axes of a with the first two axes of b
c1 = np.tensordot(a, b, axes=2)
c2 = np.tensordot(a, b, [[1, 2], [0, 1]])
print(c1.shape, c2.shape)  # (3, 8) (3, 8)
print(np.all(c1 == c2))  # True
```
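As a side note, the flat list [m, n] from the previous section is just shorthand for singleton nested lists; a quick check:

```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))
b = np.arange(160).reshape((4, 5, 8))

# [1, 0] and [[1], [0]] select the same single pair of axes
c1 = np.tensordot(a, b, [1, 0])
c2 = np.tensordot(a, b, [[1], [0]])
print(np.array_equal(c1, c2))  # True
```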
But the nested-list form has an added benefit: the contracted axes can sit at arbitrary, even non-adjacent, positions.
```python
import numpy as np

a = np.arange(60).reshape((4, 3, 5))
b = np.arange(160).reshape((4, 5, 8))
# Passing an integer to axes won't work here, because the axes to
# contract are not in matching positions
c3 = np.tensordot(a, b, [[0, 2], [0, 1]])
print(c3.shape)  # (3, 8)
```
In addition, another powerful feature of the nested-list form is that the axes can be paired in reverse order.
```python
import numpy as np

a = np.arange(60).reshape((4, 5, 3))
b = np.arange(160).reshape((5, 4, 8))
# This time we choose the first two axes, but one pair is (4, 5) and
# the other (5, 4), so they can't be paired in order.
# Pair them crosswise instead:
#   [[0, 1], [1, 0]] -> (4, 5) with (4, 5)
#   or [[1, 0], [0, 1]] -> (5, 4) with (5, 4)
c3 = np.tensordot(a, b, [[0, 1], [1, 0]])
print(c3.shape)  # (3, 8)
```
Finally, let's see how the same operations can be implemented with Einstein summation.
```python
import numpy as np

a = np.arange(60).reshape((4, 5, 3))
b = np.arange(160).reshape((4, 5, 8))
c1 = np.tensordot(a, b, [[0, 1], [0, 1]])
c2 = np.einsum("ijk,ijz->kz", a, b)
print(c1.shape, c2.shape)  # (3, 8) (3, 8)
print(np.all(c1 == c2))  # True

a = np.arange(60).reshape((4, 5, 3))
b = np.arange(160).reshape((5, 4, 8))
c1 = np.tensordot(a, b, [[0, 1], [1, 0]])
c2 = np.einsum("ijk,jiz->kz", a, b)
print(c1.shape, c2.shape)  # (3, 8) (3, 8)
print(np.all(c1 == c2))  # True

a = np.arange(60).reshape((4, 3, 5))
b = np.arange(160).reshape((5, 4, 8))
c1 = np.tensordot(a, b, [[0, 2], [1, 0]])
c2 = np.einsum("ijk,kiz->jz", a, b)
print(c1.shape, c2.shape)  # (3, 8) (3, 8)
print(np.all(c1 == c2))  # True
```
Take two one-dimensional arrays as an example
Let's get a feel for tensordot by printing concrete arrays.
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
print(np.tensordot(a, b, axes=0))
"""
[[ 2  3  4]
 [ 4  6  8]
 [ 6  9 12]]
"""
print(np.einsum("i,j->ij", a, b))
"""
[[ 2  3  4]
 [ 4  6  8]
 [ 6  9 12]]
"""
# With axes=0, each element of a is multiplied by b, and the result
# replaces the corresponding element of a.
# So 1, 2, 3 in a times b give [2 3 4], [4 6 8], [6 9 12],
# which replace 1, 2, 3 respectively.
# Hence the result is [[2 3 4] [4 6 8] [6 9 12]]
```
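For 1-D inputs this coincides with the classic outer product, so the same result can be sanity-checked against np.outer:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

# For two vectors, axes=0 is exactly the outer product
print(np.array_equal(np.tensordot(a, b, axes=0), np.outer(a, b)))  # True
```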
What if axes=1?
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
print(np.tensordot(a, b, axes=1))  # 20
# The last axis of a is contracted with the first axis of b;
# a and b each have only one axis, so the result is a scalar
print(np.einsum("i,i->", a, b))  # 20
```
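So for two vectors, axes=1 is just the familiar dot product; the equivalent spellings below serve as a quick sanity check:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

# All three compute 1*2 + 2*3 + 3*4
print(np.tensordot(a, b, axes=1))  # 20
print(np.dot(a, b))                # 20
print(np.inner(a, b))              # 20
```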
What if axes=2? Recall that an integer n selects the last n axes of a and the first n axes of b, whereas a one-dimensional array has only one axis.
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
try:
    print(np.tensordot(a, b, axes=2))
except Exception as e:
    print(e)  # tuple index out of range
```
Obviously the indexing goes out of range.
Take a one-dimensional array and a two-dimensional array for example
Let's run tensordot on a one-dimensional and a two-dimensional array to get a feel for it!
axes=0
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[2, 3, 4]])
print(np.tensordot(a, b, 0))
"""
[[[ 2  3  4]]

 [[ 4  6  8]]

 [[ 6  9 12]]]
"""
print(np.einsum("i,jk->ijk", a, b))  # same output as above
# Easy to understand: 1, 2, 3 are each multiplied by [[2, 3, 4]],
# and the products replace 1, 2, 3 respectively
print(np.tensordot(a, b, 0).shape)  # (3, 1, 3)

##########################

print(np.tensordot(b, a, 0))
"""
[[[ 2  4  6]
  [ 3  6  9]
  [ 4  8 12]]]
"""
print(np.einsum("i,jk->jki", a, b))  # same output as above
# Likewise: 2, 3, 4 are each multiplied by [1 2 3],
# and the products replace 2, 3, 4 respectively
print(np.tensordot(b, a, 0).shape)  # (1, 3, 3)
```
What if axes=1?
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[2, 3, 4], [4, 5, 6]])
try:
    print(np.tensordot(a, b, 1))
except Exception as e:
    print(e)  # shape-mismatch for sum
# The error occurs because axes=1 takes the last axis of a and the
# first axis of b. a's shape is (3,), so its only axis (last and first
# alike) has length 3, but the first axis of b has length 2,
# which doesn't match.

print(np.tensordot(b, a, 1))  # [20 32]
# This one is fine, because the last axis of b has length 3, a match.
# [2 3 4] and [4 5 6] of that axis are each inner-producted with
# [1 2 3], giving two scalars.

try:
    print(np.einsum("i,ij->ij", a, b))
except Exception as e:
    print(e)
    # operands could not be broadcast together with remapped shapes
    # [original->remapped]: (3,)->(3,newaxis) (2,3)->(2,3)
# The Einstein summation can't be written that way; we need to change
# the subscript order:
print(np.einsum("i,ji->j", a, b))  # [20 32]
# Or
print(np.einsum("j,ij->i", a, b))  # [20 32]
```
What if axes=2?
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[2, 3, 4], [4, 5, 6]])
try:
    print(np.tensordot(a, b, 2))
except Exception as e:
    print(e)  # tuple index out of range
# The error occurs because axes=2 takes the last two axes of a and the
# first two axes of b, and a has only one axis in total.

try:
    print(np.tensordot(b, a, 2))
except Exception as e:
    print(e)  # shape-mismatch for sum
# Still an error, but not an index error this time: the shape
# comparison fails on the first pair of axes (2 vs 3) before any
# out-of-range index is reached.
# Here b is (2, 3) and a is (3,), a mismatch. One might expect
# broadcasting to kick in, but tensordot does not broadcast.
```
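Since tensordot never broadcasts, the way out of such mismatches is to name the matching axes explicitly; contracting a against the last axis of b reproduces the earlier result:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[2, 3, 4], [4, 5, 6]])

# Contract axis 1 of b (length 3) against axis 0 of a (length 3)
print(np.tensordot(b, a, [[1], [0]]))  # [20 32]
```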
Take two two-dimensional arrays as an example
Let's run tensordot on two two-dimensional arrays to get a feel for it.
axes=0
```python
import numpy as np

a = np.array([[1, 2, 3]])
b = np.array([[2, 3, 4], [4, 5, 6]])
# a.shape: (1, 3), b.shape: (2, 3)
print(np.tensordot(a, b, 0))
"""
[[[[ 2  3  4]
   [ 4  5  6]]

  [[ 4  6  8]
   [ 8 10 12]]

  [[ 6  9 12]
   [12 15 18]]]]
"""
print(np.einsum("ij,xy->ijxy", a, b))  # same output as above
print(np.tensordot(a, b, 0).shape)  # (1, 3, 2, 3)

#############

print(np.tensordot(b, a, 0))
"""
[[[[ 2  4  6]]

  [[ 3  6  9]]

  [[ 4  8 12]]]


 [[[ 4  8 12]]

  [[ 5 10 15]]

  [[ 6 12 18]]]]
"""
print(np.einsum("ij,xy->xyij", a, b))  # same output as above
print(np.tensordot(b, a, 0).shape)  # (2, 3, 1, 3)
```
axes=1
```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 3, 4], [4, 5, 6]])
# a.shape: (2, 2), b.shape: (2, 3)
print(np.tensordot(a, b, 1))
"""
[[10 13 16]
 [22 29 36]]
"""
print(np.einsum("ij,jk->ik", a, b))
"""
[[10 13 16]
 [22 29 36]]
"""
# As you may have noticed, for two 2-D arrays with axes=1
# this is exactly the matrix product
print(a @ b)
"""
[[10 13 16]
 [22 29 36]]
"""
```
axes=2
```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 3, 4], [4, 5, 6]])
# a.shape: (2, 2), b.shape: (2, 3)
# Taking the last two axes obviously doesn't work,
# because (2, 2) and (2, 3) don't match
try:
    print(np.tensordot(a, b, 2))
except Exception as e:
    print(e)  # shape-mismatch for sum

a = np.array([[1, 2, 3], [2, 2, 2]])
b = np.array([[2, 3, 4], [4, 5, 6]])
print(np.tensordot(a, b, 2))  # 50
print(np.einsum("ij,ij->", a, b))  # 50
```
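Put differently, for two 2-D arrays axes=2 is the Frobenius inner product: multiply elementwise and sum everything. A one-line check:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 2, 2]])
b = np.array([[2, 3, 4], [4, 5, 6]])

# Elementwise product summed over all entries
print(np.tensordot(a, b, 2))  # 50
print((a * b).sum())          # 50
```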
Finally, look at a few Einstein summation examples and get a feel for how it differs from our protagonist tensordot. Of course, feel free to skip this if you're not familiar with Einstein summation.
```python
import numpy as np

a = np.random.randint(1, 9, (5, 3, 2, 3))
b = np.random.randint(1, 9, (3, 3, 2))
c1 = a @ b  # for multi-dimensional arrays, @ matrix-multiplies the last two axes
c2 = np.einsum("ijkm,jmn->ijkn", a, b)
print(np.all(c1 == c2))  # True
print(c1.shape)  # (5, 3, 2, 2)
print(np.einsum("...km,...mn->...kn", a, b).shape)  # (5, 3, 2, 2)

# But if it's:
c3 = np.einsum("ijkm,amn->ijkn", a, b)
print(c3.shape)  # (5, 3, 2, 2)
# The two arrays are not the same even though the shapes are,
# because the subscripts pair the axes differently
print(np.all(c3 == c1))  # False

a = np.random.randint(1, 9, (5, 3, 3, 2))
b = np.random.randint(1, 9, (1, 3, 2))
print(np.einsum("ijmk,jmn->ijkn", a, b).shape)  # (5, 3, 2, 2)
print(np.einsum("ijkm,jnm->ijkn", a, b).shape)  # (5, 3, 3, 3)
```
That concludes this article on the use of numpy's tensordot. For more on numpy and tensordot, please see my earlier articles or continue browsing the related articles below. I hope you'll keep supporting me in the future!