Einstein Summation Convention
When working with matrices and vectors, we often encounter different kinds of multiplication that lead to different results. Since it is impossible to invent a special symbol for every kind of multiplication, and it would be very confusing anyway, it is advantageous to write them using summation symbols. An ordinary matrix-matrix product can e.g. be written as $$\sum_{j} A_{ij} B_{jk}$$
With more objects, however, this notation becomes tedious very fast. A matrix element, for example, looks like $$\sum_{i}\sum_{j} u_{i} A_{ij} v_{j}$$
Einstein introduced a short-hand notation for this: he simply leaves out the summation symbol and writes e.g. $$u_{i} A_{ij} v_{j}$$ for a matrix element. We just have to remember that summation is implied over every index that occurs exactly twice. If an index occurs once, it is left unchanged. If an index occurs three times or more, we have made a mistake. Some examples are listed here:
| Product | Notation |
|---|---|
| inner product | \(u_{i} v_{i}\) |
| matrix-vector product | \(A_{ij} v_{j}\) |
| matrix multiplication | \(A_{ij} B_{jk}\) |
| trace | \(A_{ii}\) |
| outer product | \(u_{i} v_{j}\) |
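Written out for three dimensions, the inner product from the table, for example, is just shorthand for $$u_{i} v_{i} = \sum_{i=1}^{3} u_{i} v_{i} = u_{1} v_{1} + u_{2} v_{2} + u_{3} v_{3}$$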
With the help of numpy, we can compute different products between arrays using this convention. A matrix multiplication can be written as
np.einsum('ij,jk', A, B)
The first argument specifies the indices of each array, with the index groups separated by commas. The number of index groups must equal the number of arrays passed after this argument, and the number of indices in each group must equal the dimensionality of the corresponding array.
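The other products from the table above can be written in the same way. A small sketch (the example arrays u, v and A below are made up for illustration):
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
A = np.arange(9.0).reshape(3, 3)

inner = np.einsum('i,i', u, v)    # inner product: sum over i of u_i v_i
matvec = np.einsum('ij,j', A, v)  # matrix-vector product: sum over j of A_ij v_j
trace = np.einsum('ii', A)        # trace: sum over i of A_ii
outer = np.einsum('i,j', u, v)    # outer product: u_i v_j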
This notation really shines when dealing with data structures with multiple
indices, generally called tensors. Scalars, vectors and matrices are zeroth,
first and second order tensors, respectively. The total potential energy
operator represented on a space grid with three indices is thus a third order
tensor. Instead of resorting to slow loops or figuring out how to do it with
np.dot, we can have the speed of np.dot and the convenience of looping,
with the added bonus of being very compact.
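In numpy terms, the order of a tensor is simply the number of axes of the corresponding array (continuing with numpy imported as np):
print(np.array(3.0).ndim)         # 0: a scalar is a zeroth order tensor
print(np.array([1.0, 2.0]).ndim)  # 1: a vector is a first order tensor
print(np.eye(2).ndim)             # 2: a matrix is a second order tensor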
As an example, we shall calculate a batch matrix multiplication: $$C_{nik} = \sum_{j} A_{nij} B_{jk}$$ with
a = np.array([
    [[ 0,  1,  2,  3,  4],
     [ 5,  6,  7,  8,  9]],
    [[10, 11, 12, 13, 14],
     [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29]],
    [[30, 31, 32, 33, 34],
     [35, 36, 37, 38, 39]]
])
b = np.array(
    [[  0,  10,  20],
     [ 30,  40,  50],
     [ 60,  70,  80],
     [ 90, 100, 110],
     [120, 130, 140]]
)
The implementation using loops looks like
len_n, len_i, len_j = a.shape
len_k = b.shape[1]
c = np.zeros((len_n, len_i, len_k))
for n in range(len_n):
    for i in range(len_i):
        for j in range(len_j):
            for k in range(len_k):
                c[n, i, k] += a[n, i, j] * b[j, k]
which is intuitive to write, but really lengthy and inefficient.
Using np.dot, we can implement this product as
len_n, len_i, len_j = a.shape
len_k = b.shape[1]
# fold the n and i axes together, perform one big matrix product, then unfold
c = np.dot(a.reshape(len_n * len_i, len_j), b).reshape(len_n, len_i, len_k)
which is short and fast, but not very readable.
With the help of np.einsum, the code becomes even shorter and more readable:
c = np.einsum('nij,jk', a, b).transpose((2, 0, 1))
The transpose swaps the axes back into the desired order: np.einsum orders the
output indices alphabetically, so the result comes out with axes ordered i, k, n,
because n sorts after i and k. This can be dealt with by extending the notation.
nij,jk is called the implicit mode, since the output indices are not specified;
in explicit mode, we write nij,jk->nik, which fixes the order of the output axes.
Using this notation, the above product can be implemented as
c = np.einsum('nij,jk->nik', a, b)
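As a quick check (a small sketch; the variable names c_implicit and c_explicit are only for illustration), both calls produce the same numbers, just with differently ordered axes:
c_implicit = np.einsum('nij,jk', a, b)       # shape (2, 3, 4), axes ordered i, k, n
c_explicit = np.einsum('nij,jk->nik', a, b)  # shape (4, 2, 3), axes ordered n, i, k
print(np.allclose(c_implicit.transpose((2, 0, 1)), c_explicit))  # True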
The explicit form is even more readable. Using this extended notation, we also
gain more functionality. When applying \(\exp(\hat{V})\) to \(\psi\), the first
index of \(V_{i,s,t}\), which represents the space dependency of \(v_{st}\),
should be multiplied element-wise with the first index of \(\psi_{i,s}\), which
represents the space dependency of \(\psi_{s}\). At every grid point, however,
a matrix-vector product \(\exp(\mathbf{V}_{i}) \vec{\psi}_{i}\) should be
applied. In this case, we can use the explicit mode and write the contraction
string as 'ist,it->is', which multiplies element-wise along the index i and
performs a matrix-vector product st,t->s along the other axes.
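A minimal sketch of this pattern (the array names, shapes and random test data below are illustrative assumptions, not taken from the text): expV holds a small matrix for every grid point and psi holds a vector for every grid point.
rng = np.random.default_rng(0)
n_grid, n_states = 6, 2
expV = rng.normal(size=(n_grid, n_states, n_states))  # one s-by-t matrix per grid point i
psi = rng.normal(size=(n_grid, n_states))             # one vector per grid point i

# element-wise along i, matrix-vector product over (s, t)
phi = np.einsum('ist,it->is', expV, psi)

# the same result with an explicit loop over grid points
phi_loop = np.array([expV[i] @ psi[i] for i in range(n_grid)])
print(np.allclose(phi, phi_loop))  # True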
It takes some time to get used to this notation, but afterwards it becomes
really handy when dealing with high-dimensional arrays. In addition to playing
around with einsum yourself, there is an interactive website,
einsum-explainer,
which helps you understand these contraction strings.