Einstein Summation Convention
When working with matrices and vectors, we often encounter different kinds of multiplication that lead to different results. Since it is impossible to invent a special symbol for every kind of multiplication, and it would be very confusing anyway, it is advantageous to write them using summation symbols. An ordinary matrix-matrix product can e.g. be written as $$\sum_{j} A_{ij} B_{jk}$$
With more objects, however, this notation becomes tedious very fast. A matrix element, for example, looks like $$\sum_{i}\sum_{j} u_{i} A_{ij} v_{j}$$
Einstein introduced a short-hand notation for this: he simply leaves out the summation symbol and writes e.g. $$u_{i} A_{ij} v_{j}$$ for a matrix element. We just have to remember that summation is implied over every index that occurs exactly twice. If an index occurs once, it is left unchanged. If an index occurs three times or more, we have made a mistake. Some examples are listed here:
| Product | Notation |
|---|---|
| inner product | \(u_{i} v_{i}\) |
| matrix-vector product | \(A_{ij} v_{j}\) |
| matrix multiplication | \(A_{ij} B_{jk}\) |
| trace | \(A_{ii}\) |
| outer product | \(u_{i} v_{j}\) |
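Written out for three dimensions, the inner product from the table, for example, is just shorthand for $$u_{i} v_{i} = \sum_{i=1}^{3} u_{i} v_{i} = u_{1} v_{1} + u_{2} v_{2} + u_{3} v_{3}$$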
With the help of numpy, we can compute different products between arrays using this convention. A matrix multiplication can be written as
np.einsum('ij,jk', A, B)
The first argument specifies the indices of each array, with the index groups separated by commas. The number of index groups must equal the number of arrays passed after this argument, and the number of indices in each group must equal the dimensionality of the corresponding array.
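The other products from the table above can be written in the same way. A small sketch (the example arrays u, v and A below are made up for illustration):
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
A = np.arange(9.0).reshape(3, 3)

inner = np.einsum('i,i', u, v)    # inner product: sum over i of u_i v_i
matvec = np.einsum('ij,j', A, v)  # matrix-vector product: sum over j of A_ij v_j
trace = np.einsum('ii', A)        # trace: sum over i of A_ii
outer = np.einsum('i,j', u, v)    # outer product: u_i v_j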
This notation really shines when dealing with data structures with multiple
indices, generally called tensors. Scalars, vectors and matrices are zeroth,
first and second order tensors, respectively. The total potential energy
operator represented on a space grid with three indices is thus a third order
tensor. Instead of resorting to slow loops or figuring out how to do it with
np.dot, we can have the speed of np.dot and the convenience of looping,
with the added bonus of being very compact.
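In numpy terms, the order of a tensor is simply the number of axes of the corresponding array (continuing with numpy imported as np):
print(np.array(3.0).ndim)         # 0: a scalar is a zeroth order tensor
print(np.array([1.0, 2.0]).ndim)  # 1: a vector is a first order tensor
print(np.eye(2).ndim)             # 2: a matrix is a second order tensor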
As an example, we shall calculate a batch matrix multiplication: $$C_{nik} = \sum_{j} A_{nij} B_{jk}$$ with
a = np.array([
    [[ 0,  1,  2,  3,  4],
     [ 5,  6,  7,  8,  9]],
    [[10, 11, 12, 13, 14],
     [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29]],
    [[30, 31, 32, 33, 34],
     [35, 36, 37, 38, 39]]
])
b = np.array(
    [[  0,  10,  20],
     [ 30,  40,  50],
     [ 60,  70,  80],
     [ 90, 100, 110],
     [120, 130, 140]]
)
The implementation using loops looks like
len_n, len_i, len_j = a.shape
len_k = b.shape[1]
c = np.zeros((len_n, len_i, len_k))
for n in range(len_n):
    for i in range(len_i):
        for j in range(len_j):
            for k in range(len_k):
                c[n, i, k] += a[n, i, j] * b[j, k]
which is intuitive to write, but really lengthy and inefficient.
Using np.dot, we can implement this product as
len_n, len_i, len_j = a.shape
len_k = b.shape[1]
# fold the n and i axes together, perform one big matrix product, then unfold
c = np.dot(a.reshape(len_n * len_i, len_j), b).reshape(len_n, len_i, len_k)
which is short and fast, but not very readable.
With the help of np.einsum, the code becomes even shorter and more readable:
c = np.einsum('nij,jk', a, b).transpose((2, 0, 1))
The transpose swaps the axes back into the desired order: np.einsum orders the
output indices alphabetically, so the result comes out with axes ordered i, k, n,
because n sorts after i and k. This can be dealt with by extending the notation.
nij,jk is called the implicit mode, since the output indices are not specified;
in explicit mode, we write nij,jk->nik, which fixes the order of the output axes.
Using this notation, the above product can be implemented as
c = np.einsum('nij,jk->nik', a, b)
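As a quick check (a small sketch; the variable names c_implicit and c_explicit are only for illustration), both calls produce the same numbers, just with differently ordered axes:
c_implicit = np.einsum('nij,jk', a, b)       # shape (2, 3, 4), axes ordered i, k, n
c_explicit = np.einsum('nij,jk->nik', a, b)  # shape (4, 2, 3), axes ordered n, i, k
print(np.allclose(c_implicit.transpose((2, 0, 1)), c_explicit))  # True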
The explicit form is even more readable. Using this extended notation, we also
gain more functionality. When applying \(\exp(\hat{V})\) to \(\psi\), the first
index of \(V_{i,s,t}\), which represents the space dependency of \(v_{st}\),
should be multiplied element-wise with the first index of \(\psi_{i,s}\), which
represents the space dependency of \(\psi_{s}\). At every grid point, however,
a matrix-vector product \(\exp(\mathbf{V}_{i}) \vec{\psi}_{i}\) should be
applied. In this case, we can use the explicit mode and write the contraction
string as 'ist,it->is', which multiplies element-wise along the index i and
performs a matrix-vector product st,t->s along the other axes.
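A minimal sketch of this pattern (the array names, shapes and random test data below are illustrative assumptions, not taken from the text): expV holds a small matrix for every grid point and psi holds a vector for every grid point.
rng = np.random.default_rng(0)
n_grid, n_states = 6, 2
expV = rng.normal(size=(n_grid, n_states, n_states))  # one s-by-t matrix per grid point i
psi = rng.normal(size=(n_grid, n_states))             # one vector per grid point i

# element-wise along i, matrix-vector product over (s, t)
phi = np.einsum('ist,it->is', expV, psi)

# the same result with an explicit loop over grid points
phi_loop = np.array([expV[i] @ psi[i] for i in range(n_grid)])
print(np.allclose(phi, phi_loop))  # True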
It takes some time to get used to this notation, but afterwards it becomes
really handy when dealing with high-dimensional arrays. In addition to playing
around with einsum yourself, there is an interactive website,
einsum-explainer,
which helps you understand these contraction strings.