## 2.2 Matrix multiplication

We are going to build up the definition of matrix multiplication in several steps. The first is that if $$\mathbf{r}= (r_1,\ldots, r_n)$$ is a 1✕n row vector and $$\mathbf{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$$ is an n✕1 column vector, we define $\mathbf{r}\mathbf{c} = r_1c_1 + \cdots + r_n c_n.$ This might remind you of the dot product, if you have seen that before.

For example, if $$\mathbf{r}=(1\; 2)$$ and $$\mathbf{c} = \begin{pmatrix} 3 \\ 4\end{pmatrix}$$ then $\mathbf{rc} = 1\times 3 + 2 \times 4 = 11.$
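This row-times-column product is straightforward to compute. Here is a minimal pure-Python sketch (the function name `row_times_col` is our own, not standard notation), with vectors represented as lists:

```python
def row_times_col(r, c):
    """Multiply a 1xn row vector by an nx1 column vector, both given as lists."""
    assert len(r) == len(c), "row length must match column height"
    return sum(r_i * c_i for r_i, c_i in zip(r, c))

print(row_times_col([1, 2], [3, 4]))  # 1*3 + 2*4 = 11
```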

Next, suppose A is an m✕n matrix and $$\mathbf{c}$$ is again a height n column vector. Let $$\mathbf{r}_1,\ldots, \mathbf{r}_m$$ be the rows of A; these are 1✕n row vectors. We then define $$A\mathbf{c}$$ to be the m✕1 column vector $\begin{pmatrix} \mathbf{r}_1 \mathbf{c} \\ \vdots \\ \mathbf{r}_m \mathbf{c} \end{pmatrix}.$ For example, if $$A = \begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}$$ and $$\mathbf{c} = \begin{pmatrix}1\\0\\-1\end{pmatrix}$$ then $A\mathbf{c} =\begin{pmatrix} 1\times 1 + 2 \times 0 + 3 \times (-1) \\ 4 \times 1 + 5 \times 0 + 6 \times (-1)\end{pmatrix} = \begin{pmatrix} -2 \\ -2 \end{pmatrix}.$
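Computing $$A\mathbf{c}$$ one row at a time can be sketched in pure Python, with a matrix represented as a list of its rows (the helper names are ours):

```python
def row_times_col(r, c):
    return sum(r_i * c_i for r_i, c_i in zip(r, c))

def mat_times_col(A, c):
    # The ith entry of Ac is (ith row of A) times c.
    return [row_times_col(row, c) for row in A]

A = [[1, 2, 3], [4, 5, 6]]
c = [1, 0, -1]
print(mat_times_col(A, c))  # [-2, -2]
```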

Finally, the most general definition of matrix multiplication. Suppose A is m✕n and B is n✕p (so that the length of a row of A is the same as the height of a column of B). Write $$\mathbf{c}_j$$ for the jth column of B. Then we define AB to be the m✕p matrix whose jth column is $$A\mathbf{c}_j$$: $AB = (A\mathbf{c}_1 \; A\mathbf{c}_2 \; \cdots \; A \mathbf{c}_p).$ This time if $$A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$$ and $$B = \begin{pmatrix} 5&6&7 \\ 1&0&-1 \end{pmatrix}$$, the columns of B are $$\mathbf{c}_1 = \begin{pmatrix}5\\1\end{pmatrix}$$, $$\mathbf{c}_2 = \begin{pmatrix} 6\\0\end{pmatrix}$$, and $$\mathbf{c}_3 = \begin{pmatrix} 7\\-1\end{pmatrix}$$, and \begin{align*}AB &= (A\mathbf{c}_1\; A\mathbf{c}_2 \; A\mathbf{c}_3) \\ &= \begin{pmatrix} 1\times 5 + 2\times 1 & 1\times 6 + 2 \times 0 & 1\times 7 + 2 \times (-1) \\ 3\times 5 + 4 \times 1 &3\times 6 + 4\times 0 & 3 \times 7 + 4 \times (-1) \end{pmatrix} \\ &= \begin{pmatrix} 7 & 6 & 5 \\ 19 & 18 & 17 \end{pmatrix}. \end{align*}
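The columnwise definition translates directly into code: extract the columns of B, multiply each by A, and reassemble the results as the columns of the product. A pure-Python sketch (helper names ours):

```python
def mat_times_col(A, c):
    return [sum(a * c_i for a, c_i in zip(row, c)) for row in A]

def mat_mul(A, B):
    p = len(B[0])                                     # number of columns of B
    cols = [[row[j] for row in B] for j in range(p)]  # columns of B as lists
    prod_cols = [mat_times_col(A, c) for c in cols]   # jth column of AB is A c_j
    # Reassemble the product matrix from its columns.
    return [[prod_cols[j][i] for j in range(p)] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [1, 0, -1]]
print(mat_mul(A, B))  # [[7, 6, 5], [19, 18, 17]]
```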

This definition of matrix multiplication is good for hand calculations, and for emphasising that matrix multiplication happens columnwise (that is, A multiplies into B one column of B at a time). Sometimes, especially when proving properties of matrix multiplication, it is more convenient to have a definition with an explicit formula. The definition we have just seen is equivalent to the following:

Definition 2.5 Let $$A=(a_{ij})$$ be m✕n and $$B=(b_{ij})$$ be n✕p. The matrix product AB is the m✕p matrix whose i, j entry is $\sum_{k=1}^n a_{ik}b_{kj}.$
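The entrywise formula of Definition 2.5 gives an even shorter implementation, and it agrees with the columnwise definition. A pure-Python sketch:

```python
def mat_mul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    # The (i, j) entry is the sum over k of a_ik * b_kj.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [1, 0, -1]]
print(mat_mul(A, B))  # [[7, 6, 5], [19, 18, 17]]
```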

Notice that we only define the product AB when the number of columns of A is the same as the number of rows of B.

Here are some important properties of matrix multiplication:

Proposition 2.1 Let A and A’ be m✕n, B and B’ be n✕p, and C be p✕q. Let l be a number.

• (AB)C=A(BC) (that is, matrix multiplication is associative).
• (A+A’)B=AB + A’B, A(B+B’)=AB+AB’ (matrix multiplication distributes over addition).
• $$(lA) B = l (AB) = A(l B)$$.
• $$A \mathbf{0}_{n\times p}= \mathbf{0}_{m\times p}$$ and $$\mathbf{0}_{p\times m} A = \mathbf{0}_{p\times n}$$.
• $$(AB)^T = B^T A^T$$.
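These identities can be spot-checked numerically on small matrices (a spot check is not a proof, of course; the particular matrices below are arbitrary choices of ours). A pure-Python sketch:

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 0], [0, 2]]

assert mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C))       # (AB)C = A(BC)
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))  # (AB)^T = B^T A^T
print("checks passed")
```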

Proof. Let $$A=(a_{ij}), A'=(a'_{ij}), B=(b_{ij}), B'=(b'_{ij}), C=(c_{ij})$$. During this proof we also write $$X_{ij}$$ to mean the i, j entry of a matrix X.

• AB has i, j entry $$\sum_{k=1}^n a_{ik}b_{kj}$$, so the i, j entry of (AB)C is $\sum_{l=1}^p (AB)_{il} c_{lj} = \sum_{l=1}^p \sum_{k=1}^n a_{ik}b_{kl}c_{lj}.$ On the other hand, the i, j entry of BC is $$\sum_{l=1}^pb_{il}c_{lj}$$ so the i, j entry of A(BC) is \begin{align*}\sum_{k=1}^n a_{ik}(BC)_{kj} &= \sum_{k=1}^n a_{ik} \sum_{l=1}^p b_{kl}c_{lj}\\ &= \sum_{k=1}^n \sum_{l=1}^p a_{ik}b_{kl}c_{lj}. \end{align*} These are the same: it doesn’t matter if we do the k or l summation first, since we just get the same terms in a different order.
• The i, j entry of (A+A’)B is $$\sum_{k=1}^n (a_{ik} + a'_{ik})b_{kj}$$ which equals $$\sum_{k=1}^n a_{ik}b_{kj} + \sum_{k=1}^n a'_{ik}b_{kj}$$, but this is the sum of the i, j entry of AB and the i, j entry of A’B, proving the first equality. The second is similar.
• The i, j entry of $$l A$$ is $$l a_{ij}$$, so the i, j entry of (l A)B is $\sum_{k=1}^n(l a_{ik})b_{kj} = l \sum_{k=1}^n a_{ik}b_{kj}= l (AB)_{ij}$ so $$(l A)B$$ and $$l (AB)$$ have the same i, j entry for any i, j, and are therefore equal. The second equality can be proved similarly.
• This is clear from the definition of matrix multiplication.
• The i, j entry of AB is $$\sum_{k=1}^n a_{ik}b_{kj}$$, so the i, j entry of $$(AB)^T$$ is $$\sum_{k=1}^n a_{jk}b_{ki}$$. If we write the i, j entry of $$B^T$$ as $$\beta_{ij}$$ and the i, j entry of $$A^T$$ as $$\alpha_{ij}$$ then the i, j entry of $$B^T A^T$$ is $$\sum_{k=1}^n \beta_{ik}\alpha_{kj}$$, but $$\beta_{ik}=b_{ki}$$ and $$\alpha_{kj} = a_{jk}$$ so this is $$\sum_{k=1}^n a_{jk}b_{ki}$$, which is the same as the i, j entry of $$(AB)^T$$.

These properties show some of the ways in which matrix multiplication is like ordinary multiplication of numbers. There are two important ways in which it is different: in general, $$AB \neq BA$$, for example \begin{align*} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} & = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix} \\ \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} &= \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix} \end{align*} and, unlike for multiplying numbers, we can have $$AB=\mathbf{0}$$ even when $$A,B\neq \mathbf{0}$$: $\begin{pmatrix} 0&1\\0&0 \end{pmatrix} \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0&0\\0&0 \end{pmatrix}.$
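Both failures are easy to reproduce computationally. A quick pure-Python check of the two products just shown and of the zero-divisor example:

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
print(mat_mul(B, A))  # [[23, 34], [31, 46]] -- AB and BA differ

N1 = [[0, 1], [0, 0]]
N2 = [[0, 2], [0, 0]]
print(mat_mul(N1, N2))  # [[0, 0], [0, 0]], even though N1 and N2 are non-zero
```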

There is a matrix which behaves like 1 does under multiplication:

Definition 2.6 The n✕n identity matrix $$I_n$$ is the matrix whose i, j entry is 1 if i=j and 0 otherwise.

For example, $I_2= \begin{pmatrix} 1&0\\0&1\end{pmatrix},\;\; I_3 = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix}.$

Proposition 2.2 Let A be m✕n. Then $$AI_n = A = I_m A$$.

Proof. Let $$A=(a_{ij})$$, and write $$I_n = (\delta_{ij})$$ so that $$\delta_{ij}=0$$ if $$i\neq j$$ and $$\delta_{ii}=1$$ for each i. Using the definition of matrix multiplication, the i, j entry of $$AI_n$$ is $\sum_{k=1}^n a_{ik}\delta_{kj}.$ Because $$\delta_{kj}$$ is zero unless k=j, the only term in this sum which might not be zero is the one when k=j. This term is $$a_{ij}\times 1 = a_{ij}$$, so the i, j entry of $$AI_n$$ is the same as the i, j entry of A and therefore $$AI_n=A$$.

The proof that $$I_m A=A$$ is similar.
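Proposition 2.2 can also be verified on an example. A pure-Python sketch that builds $$I_n$$ directly from Definition 2.6 (function names ours):

```python
def identity(n):
    # i, j entry is 1 if i == j, and 0 otherwise (Definition 2.6).
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6]]  # 2x3, so A I_3 = A = I_2 A
print(mat_mul(A, identity(3)) == A)  # True
print(mat_mul(identity(2), A) == A)  # True
```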

When we work in $$\mathbb{R}^n$$ or $$\mathbb{C}^n$$, we use the notation $$\mathbf{e}_i$$ to refer to the height n column vector with a 1 in position i and zeroes elsewhere. For example, if n were 3 then

$\mathbf{e}_1 = \begin{pmatrix} 1\\0\\0\end{pmatrix}, \mathbf{e}_2 = \begin{pmatrix} 0\\1\\0\end{pmatrix}, \mathbf{e}_3 = \begin{pmatrix} 0\\0\\1\end{pmatrix}.$

The vector $$\mathbf{e}_i$$ is called the $$i$$th standard basis vector - you will learn what a basis is in the next section. These vectors have an important property with respect to matrix multiplication:

Lemma 2.1 Let A be an m✕n matrix. Then for any $$1\leq x \leq n$$ the product $$A \mathbf{e}_x$$ is equal to the xth column of A.
Proof. Let $$A = (a_{ij})$$, so for any $$1\leq i \leq m$$ the ith entry of the height m column vector $$A \mathbf{e}_x$$ is $$\sum_{k=1}^n a_{ik}e_k$$ where $$e_k$$ is the kth entry of $$\mathbf{e}_x$$. Since $$e_k$$ is zero unless k=x, in which case it is 1, the sum equals $$a_{ix}$$. It follows that $A\mathbf{e}_x = \begin{pmatrix} a_{1x} \\ \vdots \\ a_{mx}\end{pmatrix}$ which is the xth column of A.
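Lemma 2.1 in code: multiplying by $$\mathbf{e}_x$$ picks out a column. (A pure-Python sketch with names of our own choosing; note that Python indexes from 0 while the lemma counts positions from 1.)

```python
def standard_basis(x, n):
    # e_x as a list: 1 in position x (0-based here), zeroes elsewhere.
    return [1 if k == x else 0 for k in range(n)]

def mat_times_col(A, c):
    return [sum(a * c_i for a, c_i in zip(row, c)) for row in A]

A = [[1, 2, 3], [4, 5, 6]]
# A e_2 (index 1 in 0-based Python) is the second column of A.
print(mat_times_col(A, standard_basis(1, 3)))  # [2, 5]
```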

### 2.2.1 Invertibility

Definition 2.7 An n✕n matrix A is called invertible if there is an n✕n matrix B such that $$AB=I_n=BA$$.

The next lemma shows that if A is invertible then there is one and only one matrix B such that $$AB=I_n=BA$$.

Lemma 2.2 Suppose $$AB = BA = I_n$$ and $$AC = CA = I_n$$. Then B=C.
Proof. \begin{align*} B&= I_nB & \text{as } IX=X \text{ for any } X \\ &= (CA)B \\ &= C(AB) & \text{associativity} \\ &=CI_n \\ &= C. \end{align*}

When A is invertible we use the notation $$A^{-1}$$ for the one and only matrix such that $$AA^{-1}=A^{-1}A=I_n$$.

Not every non-zero square matrix is invertible: you can check directly that $$\begin{pmatrix} 0&1\\0&0 \end{pmatrix}$$ and $$\begin{pmatrix}1&1\\1&1\end{pmatrix}$$ aren’t invertible, for example. More generally, suppose that A and B are any non-zero matrices such that $$AB=\mathbf{0}_{n\times n}$$ (we have already seen examples of this). If A were invertible we could multiply this equation on the left by the inverse of A to get \begin{align*} A^{-1} AB &= A^{-1} \mathbf{0}_{n\times n} \\ B &= \mathbf{0}_{n\times n} \end{align*} which is not the case. It follows that if there is a non-zero matrix B such that $$AB=\mathbf{0}_{n\times n}$$ then A isn’t invertible (and a similar argument, multiplying on the right instead of the left, shows neither is B).

Here is another useful result that can tell you a matrix is not invertible:

Lemma 2.3 Let A be a matrix with a row of zeroes or a column of zeroes. Then A is not invertible.

Proof. If A has a column of zeroes then, because matrix multiplication works columnwise, so does BA for any $$B$$. So BA never equals the identity matrix, and thus A is not invertible.

The proof for the case when A has a row of zeroes is similar.

### 2.2.2 Why define matrix multiplication this way?

The last thing in this section on matrix multiplication is a quick comment on where it comes from.
Write $$\mathbb{R}^p$$ for the set of all p✕1 column vectors with real numbers as entries. If A is an m✕n matrix (with real number entries, say) then there is a function \begin{align*} T_A: & \mathbb{R}^n \to \mathbb{R}^m \\ & T_A(\mathbf{v}) = A\mathbf{v}. \end{align*}

Now suppose B is n✕p, so that there is a similar map $$T_B : \mathbb{R}^p \to \mathbb{R}^n$$. The composition $$T_A \circ T_B$$ makes sense as a map $$\mathbb{R}^p \to \mathbb{R}^m$$, and it turns out $T_A \circ T_B = T_{AB}.$ The connection with composition of maps is why matrix multiplication is defined the way it is.
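The claim $$T_A \circ T_B = T_{AB}$$ can be checked on an example: applying B and then A to a vector gives the same answer as applying AB once. (A pure-Python sketch with matrices as lists of rows; the particular A, B, and v are arbitrary choices of ours.)

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_times_col(A, v):
    return [sum(a * v_i for a, v_i in zip(row, v)) for row in A]

A = [[1, 2], [3, 4]]         # 2x2, so T_A maps R^2 to R^2
B = [[0, 1, 2], [1, 0, -1]]  # 2x3, so T_B maps R^3 to R^2

v = [1, 2, 3]
lhs = mat_times_col(A, mat_times_col(B, v))  # T_A(T_B(v))
rhs = mat_times_col(mat_mul(A, B), v)        # T_{AB}(v)
print(lhs == rhs)  # True
```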

We use a similar notation for complex column vectors: $$\mathbb{C}^p$$ denotes the set of all height p column vectors with complex number entries.