2.2 Matrix multiplication

We are going to build up the definition of matrix multiplication in several steps. The first is that if \(r= (r_1,\ldots, r_n)\) is a 1✕n row vector and \(c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}\) is an n✕1 column vector, we define \[ rc = r_1c_1 + \cdots + r_n c_n. \] This might remind you of the dot product if you have seen that before.

For example, if \(r=(1\; 2)\) and \(c = \begin{pmatrix} 3 \\ 4\end{pmatrix}\) then \[rc = 1\times 3 + 2 \times 4 = 11.\]
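If you want to experiment with these definitions on a computer, here is a minimal Python sketch of the row-times-column product (the name `row_times_col` is just for illustration):

```python
def row_times_col(r, c):
    """Product of a 1xn row vector r with an nx1 column vector c,
    both given as plain Python lists of numbers."""
    assert len(r) == len(c), "row and column must have the same length"
    return sum(r_i * c_i for r_i, c_i in zip(r, c))

print(row_times_col([1, 2], [3, 4]))  # 1*3 + 2*4 = 11
```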

Next, suppose A is an m✕n matrix and \(c\) is again a height n column vector. Let \(r_1,\ldots, r_m\) be the rows of A; these are 1✕n row vectors. We then define \(Ac\) to be the m✕1 column vector \[ \begin{pmatrix} r_1 c \\ \vdots \\ r_m c \end{pmatrix}.\] For example, if \(A = \begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}\) and \(c = \begin{pmatrix}1\\0\\-1\end{pmatrix}\) then \[ Ac =\begin{pmatrix} 1\times 1 + 2 \times 0 + 3 \times (-1) \\ 4 \times 1 + 5 \times 0 + 6 \times (-1)\end{pmatrix} = \begin{pmatrix} -2 \\ -2 \end{pmatrix}. \]
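In the same spirit, a sketch of the matrix-times-column product simply applies a row-times-column product to each row of A (again, `mat_times_col` is an illustrative name, not standard library code):

```python
def mat_times_col(A, c):
    """Product of an mxn matrix A (a list of rows) with a height-n column c:
    one row-times-column product for each row of A."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]

A = [[1, 2, 3], [4, 5, 6]]
print(mat_times_col(A, [1, 0, -1]))  # [-2, -2]
```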

Finally, here is the most general definition of matrix multiplication. Suppose A is m✕n and B is n✕p (so that the length of a row of A is the same as the height of a column of B). Write \(c_j\) for the jth column of B. Then we define AB to be the m✕p matrix whose jth column is \(Ac_j\): \[AB = (Ac_1 \; Ac_2 \; \cdots \; Ac_p). \] This time if \(A = \begin{pmatrix}1&2\\3&4\end{pmatrix}\) and \(B = \begin{pmatrix} 5&6&7 \\ 1&0&-1 \end{pmatrix}\), the columns of B are \(c_1 = \begin{pmatrix}5\\1\end{pmatrix}\), \(c_2 = \begin{pmatrix} 6\\0\end{pmatrix}\), and \(c_3 = \begin{pmatrix} 7\\-1\end{pmatrix}\) and \[\begin{align*}AB &= (Ac_1\; Ac_2 \; Ac_3) \\ &= \begin{pmatrix} 1\times 5 + 2\times 1 & 1\times 6 + 2 \times 0 & 1\times 7 + 2 \times (-1) \\ 3\times 5 + 4 \times 1 &3\times 6 + 4\times 0 & 3 \times 7 + 4 \times (-1) \end{pmatrix} \\ &= \begin{pmatrix} 7 & 6 & 5 \\ 19 & 18 & 17 \end{pmatrix}.\end{align*}\]
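The columnwise definition translates directly into code. This sketch (a hypothetical `mat_mult`) builds AB one column of B at a time, exactly as described above:

```python
def mat_mult(A, B):
    """Columnwise matrix product: the jth column of AB is A times
    the jth column of B. A and B are lists of rows."""
    m, p = len(A), len(B[0])
    cols = [[row[j] for row in B] for j in range(p)]  # the columns of B
    # each prod_cols[j] is the column vector A c_j
    prod_cols = [[sum(a * x for a, x in zip(r, c)) for r in A] for c in cols]
    # reassemble into rows: the i,j entry of AB is the ith entry of A c_j
    return [[prod_cols[j][i] for j in range(p)] for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [1, 0, -1]]
print(mat_mult(A, B))  # [[7, 6, 5], [19, 18, 17]]
```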

This definition of matrix multiplication is good for hand calculations, and for emphasising that matrix multiplication happens columnwise (that is, A multiplies into B one column of B at a time). Sometimes, especially when proving properties of matrix multiplication, it is more convenient to have a definition with an explicit formula. The definition we have just seen is equivalent to the following:


Definition 2.5 Let \(A=(a_{ij})\) be m✕n and \(B=(b_{ij})\) be n✕p. The matrix product AB is the m✕p matrix whose i,j entry is \[ \sum_{k=1}^n a_{ik}b_{kj}. \]


Notice that we only define the product AB when the number of columns of A is the same as the number of rows of B.
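Definition 2.5 gives an even shorter implementation: compute each entry straight from the formula. On the example above this agrees with the columnwise version, as it must (`mat_mult_formula` is again an illustrative name):

```python
def mat_mult_formula(A, B):
    """Entrywise product from Definition 2.5: the i,j entry of AB
    is the sum over k of A[i][k] * B[k][j]."""
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [1, 0, -1]]
print(mat_mult_formula(A, B))  # [[7, 6, 5], [19, 18, 17]]
```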

Here are some important properties of matrix multiplication:


Proposition 2.1 Let A and A’ be m✕n, B and B’ be n ✕ p, and C be p ✕ q. Let \(\lambda\) be a number.

  • (AB)C=A(BC) (that is, matrix multiplication is associative).
  • (A+A’)B=AB + A’B, A(B+B’)=AB+AB’ (matrix multiplication distributes over addition).
  • \((\lambda A) B = \lambda (AB) = A(\lambda B)\).
  • \(A \mathbf{0}_{n\times p}= \mathbf{0}_{m\times p}\) and \(\mathbf{0}_{p\times m} A = \mathbf{0}_{p\times n}\).
  • \((AB)^T = B^T A^T\).

Proof. Let \(A=(a_{ij}), A'=(a'_{ij}), B=(b_{ij}), B'=(b'_{ij}), C=(c_{ij})\). During this proof we also write \(X_{ij}\) to mean the i, j entry of a matrix X.

  • AB has i,j entry \(\sum_{k=1}^n a_{ik}b_{kj}\), so the i, j entry of (AB)C is \[ \sum_{l=1}^p (AB)_{il} c_{lj} = \sum_{l=1}^p \sum_{k=1}^n a_{ik}b_{kl}c_{lj}.\] On the other hand, the i, j entry of BC is \(\sum_{l=1}^pb_{il}c_{lj}\) so the i, j entry of A(BC) is \[\begin{align*}\sum_{k=1}^n a_{ik}(BC)_{kj} &= \sum_{k=1}^n a_{ik} \sum_{l=1}^p b_{kl}c_{lj}\\ &= \sum_{k=1}^n \sum_{l=1}^p a_{ik}b_{kl}c_{lj}. \end{align*}\] These are the same: it doesn’t matter if we do the k or l summation first, since we just get the same terms in a different order.
  • The i, j entry of \((A+A')B\) is \(\sum_{k=1}^n (a_{ik} + a'_{ik})b_{kj}\), which equals \(\sum_{k=1}^n a_{ik}b_{kj} + \sum_{k=1}^n a'_{ik}b_{kj}\). But this is the sum of the i, j entry of AB and the i, j entry of \(A'B\), proving the first equality. The second is similar.
  • The i,j entry of \(\lambda A\) is \(\lambda a_{ij}\), so the i, j entry of \((\lambda A)B\) is \[ \sum_{k=1}^n(\lambda a_{ik})b_{kj} = \lambda \sum_{k=1}^n a_{ik}b_{kj}= \lambda (AB)_{ij}\] so \((\lambda A)B\) and \(\lambda (AB)\) have the same i, j entry for any i, j, and are therefore equal. The second equality can be proved similarly.
  • This is clear from the definition of matrix multiplication.
  • The i, j entry of AB is \(\sum_{k=1}^n a_{ik}b_{kj}\), so the i, j entry of \((AB)^T\) is \(\sum_{k=1}^n a_{jk}b_{ki}\). If we write the i, j entry of \(B^T\) as \(\beta_{ij}\) and the i, j entry of \(A^T\) as \(\alpha_{ij}\) then the i, j entry of \(B^T A^T\) is \(\sum_{k=1}^n \beta_{ik}\alpha_{kj}\), but \(\beta_{ik}=b_{ki}\) and \(\alpha_{kj} = a_{jk}\) so this is \(\sum_{k=1}^n a_{jk}b_{ki}\), which is the same as the i, j entry of \((AB)^T\).
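These identities can also be sanity-checked numerically (a check, not a proof). Here is a quick sketch using NumPy, whose @ operator performs matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(2, 3))  # m x n
B = rng.integers(-5, 6, size=(3, 4))  # n x p
C = rng.integers(-5, 6, size=(4, 5))  # p x q

print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # associativity: True
print(np.array_equal((A @ B).T, B.T @ A.T))      # transpose rule: True
```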

These properties show some of the ways in which matrix multiplication is like ordinary multiplication of numbers. There are two important ways in which it is different. First, in general \(AB \neq BA\): for example, \[\begin{align*} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} & = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix} \\ \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} &= \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix} \end{align*}\] Second, unlike for multiplying numbers, we can have \(AB=\mathbf{0}\) even when \(A,B\neq \mathbf{0}\): \[ \begin{pmatrix} 0&1\\0&0 \end{pmatrix} \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0&0\\0&0 \end{pmatrix}.\]
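Both phenomena are easy to reproduce in NumPy:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)  # [[19 22], [43 50]]
print(B @ A)  # [[23 34], [31 46]], not the same as A @ B

N = np.array([[0, 1], [0, 0]])
M = np.array([[0, 2], [0, 0]])
print(N @ M)  # the zero matrix, although N and M are both non-zero
```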

There is a matrix which behaves like 1 does under multiplication:


Definition 2.6 The n ✕ n identity matrix \(I_n\) is the matrix whose i, j entry is 1 if i=j and 0 otherwise.

For example, \[ I_2= \begin{pmatrix} 1&0\\0&1\end{pmatrix},\;\; I_3 = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix}. \]


Proposition 2.2 Let A be m ✕ n. Then \(AI_n = A = I_m A\).

Proof. Let \(A=(a_{ij})\), and write \(I_n = (\delta_{ij})\) so that \(\delta_{ij}=0\) if \(i\neq j\) and \(\delta_{ii}=1\) for each i. Using the definition of matrix multiplication, the i, j entry of \(AI_n\) is \[ \sum_{k=1}^n a_{ik}\delta_{kj}. \] Because \(\delta_{ab}\) is zero unless a=b, the only term in this sum which is not zero is the one with \(k=j\). This term is \(a_{ij}\times 1 = a_{ij}\), so the i, j entry of \(AI_n\) is the same as the i, j entry of A, and therefore \(AI_n=A\).

The proof that \(I_m A=A\) is similar.
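A quick NumPy check of Proposition 2.2 (np.eye(n) produces \(I_n\), stored as floats):

```python
import numpy as np

A = np.arange(1, 7).reshape(2, 3)        # an arbitrary 2 x 3 matrix
print(np.array_equal(A @ np.eye(3), A))  # A I_3 = A: True
print(np.array_equal(np.eye(2) @ A, A))  # I_2 A = A: True
```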

Definition 2.7 An n✕n matrix A is called invertible if there is an n✕n matrix B such that \(AB=I_n=BA\).

If A is invertible then there is one and only one matrix B such that \(AB=I_n=BA\); this follows from the next lemma.


Lemma 2.1 Suppose \(AB = BA = I_n\) and \(AC = CA = I_n\). Then B=C.
Proof. \[\begin{align*} B&= I_nB & \text{as } IX=X \text{ for any } X \\ &= (CA)B & \text{as } CA=I_n \\ &= C(AB) & \text{matrix multiplication is associative} \\ &=CI_n & \text{as } AB=I_n \\ &= C. \end{align*}\]

When A is invertible we use the notation \(A^{-1}\) for the matrix such that \(AA^{-1}=A^{-1}A=I_n\).

Not every non-zero square matrix is invertible: you can check directly that \(\begin{pmatrix} 0&1\\0&0 \end{pmatrix}\) and \(\begin{pmatrix}1&1\\1&1\end{pmatrix}\) aren’t invertible, for example. More generally, suppose that A and B are any non-zero matrices such that \(AB=\mathbf{0}_{n\times n}\) (we have already seen examples of this). If A were invertible we could multiply this equation on the left by the inverse of A to get \[\begin{align*} A^{-1} AB &= A^{-1} \mathbf{0}_{n\times n} \\ B &= \mathbf{0}_{n\times n} \end{align*}\] which is not the case. It follows that if there is a non-zero matrix B such that \(AB=\mathbf{0}_{n\times n}\) then A isn’t invertible (and a similar argument, multiplying on the right instead of the left, shows neither is B).
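In NumPy, np.linalg.inv raises a LinAlgError when a matrix has no inverse, which gives a quick way to check (though not prove) non-invertibility of the two matrices above:

```python
import numpy as np

for M in (np.array([[0, 1], [0, 0]]), np.array([[1, 1], [1, 1]])):
    try:
        np.linalg.inv(M)
    except np.linalg.LinAlgError:
        print("no inverse exists for\n", M)
```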

2.2.1 Why define matrix multiplication this way?

The last thing in this section on matrix multiplication is a quick comment on where it comes from.
Write \(\mathbb{R}^p\) for the set of all p✕1 column vectors with real numbers as entries. If A is an m✕n matrix (with real number entries, say) then there is a function \[ T_A: \mathbb{R}^n \to \mathbb{R}^m, \qquad T_A(\mathbf{v}) = A\mathbf{v}. \]

Now suppose B is n✕p, so that there is a similar map \(T_B : \mathbb{R}^p \to \mathbb{R}^n\). The composition \(T_A \circ T_B\) makes sense as a map \(\mathbb{R}^p \to \mathbb{R}^m\), and it turns out \[ T_A \circ T_B = T_{AB}.\] The connection with composition of maps is why matrix multiplication is defined the way it is.
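This identity is easy to test numerically: applying \(T_B\) and then \(T_A\) to a column vector gives the same result as applying \(T_{AB}\) directly.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(2, 3))  # T_A : R^3 -> R^2
B = rng.integers(-3, 4, size=(3, 4))  # T_B : R^4 -> R^3
v = rng.integers(-3, 4, size=(4, 1))  # a column vector in R^4

# T_A(T_B(v)) versus T_{AB}(v)
print(np.array_equal(A @ (B @ v), (A @ B) @ v))  # True
```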

We use a similar notation for complex column vectors: \(\mathbb{C}^p\) denotes the set of all height p column vectors with complex number entries.