## 2.2 Matrix multiplication

We are going to build up the definition of matrix multiplication in several steps. The first is that if \(\mathbf{r}= (r_1,\ldots, r_n)\) is a 1✕*n* row vector and \(\mathbf{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}\) is an *n*✕1 column vector, we define
\[ \mathbf{r}\mathbf{c} = r_1c_1 + \cdots + r_n c_n. \]
This might remind you of the dot product, if you have seen that before.

For example, if \(\mathbf{r}=(1\; 2)\) and \(\mathbf{c} = \begin{pmatrix} 3 \\ 4\end{pmatrix}\) then \[\mathbf{rc} = 1\times 3 + 2 \times 4 = 11.\]
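If you like to experiment, this row-times-column product can be written as a short Python function. This is an illustrative sketch, not part of the notes; the function name is my own.

```python
def row_times_column(r, c):
    """Return r1*c1 + ... + rn*cn for a row vector r and a column vector c,
    both represented as plain Python lists of the same length."""
    assert len(r) == len(c), "row and column must have the same length"
    return sum(ri * ci for ri, ci in zip(r, c))

print(row_times_column([1, 2], [3, 4]))  # 1*3 + 2*4 = 11
```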

Next, suppose *A* is an *m*✕*n* matrix and \(\mathbf{c}\) is again a height *n* column vector. Let \(\mathbf{r}_1,\ldots, \mathbf{r}_m\) be the rows of *A*; these are 1✕*n* row vectors. We then define \(A\mathbf{c}\) to be the *m*✕1 column vector
\[ \begin{pmatrix} \mathbf{r}_1 \mathbf{c} \\
\vdots \\
\mathbf{r}_m \mathbf{c} \end{pmatrix}.\]
For example, if \(A = \begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}\) and \(\mathbf{c} = \begin{pmatrix}1\\0\\-1\end{pmatrix}\) then
\[ A\mathbf{c} =\begin{pmatrix} 1\times 1 + 2 \times 0 + 3 \times (-1) \\
4 \times 1 + 5 \times 0 + 6 \times (-1)\end{pmatrix} = \begin{pmatrix} -2 \\ -2 \end{pmatrix}. \]
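The matrix-times-column-vector definition is just the row-times-column product applied to each row in turn. Here is a minimal Python sketch of that (again illustrative only, with a matrix stored as a list of its rows):

```python
def mat_vec(A, c):
    """Multiply an m-by-n matrix A (a list of rows) into a height-n column
    vector c: the ith entry of the result is (row i of A) times c."""
    return [sum(a, 0) if False else sum(a * x for a, x in zip(row, c)) for row in A]

A = [[1, 2, 3], [4, 5, 6]]
c = [1, 0, -1]
print(mat_vec(A, c))  # [-2, -2], matching the worked example
```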

Finally, we come to the most general definition of matrix multiplication. Suppose *A* is *m*✕*n* and *B* is *n*✕*p* (so that the length of a row of *A* is the same as the height of a column of *B*). Write \(\mathbf{c}_j\) for the *j*th column of *B*. Then we define *AB* to be the *m*✕*p* matrix whose *j*th column is \(A\mathbf{c}_j\):
\[AB = (A\mathbf{c}_1 \; A\mathbf{c}_2 \; \cdots \; A \mathbf{c}_p).\]
This time if \(A = \begin{pmatrix}1&2\\3&4\end{pmatrix}\) and \(B = \begin{pmatrix} 5&6&7 \\ 1&0&-1 \end{pmatrix}\), the columns of *B* are \(\mathbf{c}_1 = \begin{pmatrix}5\\1\end{pmatrix}\), \(\mathbf{c}_2 = \begin{pmatrix} 6\\0\end{pmatrix}\), and \(\mathbf{c}_3 = \begin{pmatrix} 7\\-1\end{pmatrix}\), and
\[\begin{align*}AB &= (A\mathbf{c}_1\; A\mathbf{c}_2 \; A\mathbf{c}_3) \\
&= \begin{pmatrix}
1\times 5 + 2\times 1 & 1\times 6 + 2 \times 0 & 1\times 7 + 2 \times (-1) \\
3\times 5 + 4 \times 1 &3\times 6 + 4\times 0 & 3 \times 7 + 4 \times (-1)
\end{pmatrix} \\
&= \begin{pmatrix} 7 & 6 & 5 \\ 19 & 18 & 17 \end{pmatrix}.\end{align*}\]
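The columnwise description translates directly into code: compute \(A\mathbf{c}_j\) for each column of *B*, then assemble the results as the columns of the product. A hedged Python sketch (function names are mine, not from the notes):

```python
def mat_vec(A, c):
    """Multiply matrix A (list of rows) into column vector c."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]

def mat_mul_columnwise(A, B):
    """Form AB one column of B at a time, as in the definition above."""
    p = len(B[0])
    # cols[j] is A times the jth column of B, i.e. the jth column of AB
    cols = [mat_vec(A, [row[j] for row in B]) for j in range(p)]
    # reassemble the columns into a list of rows
    return [[cols[j][i] for j in range(p)] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [1, 0, -1]]
print(mat_mul_columnwise(A, B))  # [[7, 6, 5], [19, 18, 17]]
```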

This definition of matrix multiplication is good for hand calculations, and for emphasising that matrix multiplication happens *columnwise* (that is, *A* multiplies into *B* one column of *B* at a time). Sometimes, especially when proving properties of matrix multiplication, it is more convenient to have a definition with an explicit formula. The definition we have just seen is equivalent to the following:

**Definition 2.5** Let \(A=(a_{ij})\) be *m*✕*n* and \(B=(b_{ij})\) be *n*✕*p*. The
matrix product *AB* is the *m*✕*p* matrix whose *i*, *j* entry is
\[\sum_{k=1}^n a_{ik}b_{kj}.\]

Notice that we *only* define the product *AB* when the number of columns
of *A* is the same as the number of rows of *B*.
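The explicit formula of Definition 2.5 becomes a triple loop (or nested comprehension) in code. A minimal sketch, assuming matrices stored as lists of rows:

```python
def mat_mul(A, B):
    """The i, j entry of AB is the sum over k of A[i][k] * B[k][j]
    (the explicit formula of Definition 2.5)."""
    m, n, p = len(A), len(B), len(B[0])
    # the product is only defined when columns of A match rows of B
    assert all(len(row) == n for row in A), "size mismatch: AB is undefined"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

print(mat_mul([[1, 2], [3, 4]], [[5, 6, 7], [1, 0, -1]]))  # [[7, 6, 5], [19, 18, 17]]
```

The assertion mirrors the remark above: the product *AB* only makes sense when the number of columns of *A* equals the number of rows of *B*.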

Here are some important properties of matrix multiplication:

**Proposition 2.1** Let *A* and *A*’ be *m*✕*n*, *B* and *B*’ be *n*✕*p*, and *C* be *p*✕*q*. Let *l* be a number.

- \((AB)C = A(BC)\) (that is, matrix multiplication is **associative**).
- \((A+A')B = AB + A'B\) and \(A(B+B') = AB + AB'\) (matrix multiplication **distributes over addition**).
- \((lA) B = l (AB) = A(l B)\).
- \(A \mathbf{0}_{n\times p}= \mathbf{0}_{m\times p}\) and \(\mathbf{0}_{p\times m} A = \mathbf{0}_{p\times n}\).
- \((AB)^T = B^T A^T\).

*Proof.* Let \(A=(a_{ij}), A'=(a'_{ij}), B=(b_{ij}), B'=(b'_{ij}), C=(c_{ij})\). During this proof we also write \(X_{ij}\) to mean the *i*, *j* entry
of a matrix *X*.

- *AB* has *i*, *j* entry \(\sum_{k=1}^n a_{ik}b_{kj}\), so the *i*, *j* entry of (*AB*)*C* is \[ \sum_{l=1}^p (AB)_{il} c_{lj} = \sum_{l=1}^p \sum_{k=1}^n a_{ik}b_{kl}c_{lj}.\] On the other hand, the *i*, *j* entry of *BC* is \(\sum_{l=1}^p b_{il}c_{lj}\), so the *i*, *j* entry of *A*(*BC*) is \[\begin{align*}\sum_{k=1}^n a_{ik}(BC)_{kj} &= \sum_{k=1}^n a_{ik} \sum_{l=1}^p b_{kl}c_{lj}\\ &= \sum_{k=1}^n \sum_{l=1}^p a_{ik}b_{kl}c_{lj}. \end{align*}\] These are the same: it doesn't matter whether we do the *k* or the *l* summation first, since we just get the same terms in a different order.
- The *i*, *j* entry of (*A*+*A*’)*B* is \(\sum_{k=1}^n (a_{ik} + a'_{ik})b_{kj}\), which equals \(\sum_{k=1}^n a_{ik}b_{kj} + \sum_{k=1}^n a'_{ik}b_{kj}\). But this is the sum of the *i*, *j* entry of *AB* and the *i*, *j* entry of *A*’*B*, proving the first equality. The second is similar.
- The *i*, *j* entry of \(lA\) is \(l a_{ij}\), so the *i*, *j* entry of (*lA*)*B* is \[ \sum_{k=1}^n(l a_{ik})b_{kj} = l \sum_{k=1}^n a_{ik}b_{kj}= l (AB)_{ij},\] so (*lA*)*B* and *l*(*AB*) have the same *i*, *j* entry for every *i*, *j*, and are therefore equal. The second equality can be proved similarly.
- This is clear from the definition of matrix multiplication.
- The *i*, *j* entry of *AB* is \(\sum_{k=1}^n a_{ik}b_{kj}\), so the *i*, *j* entry of \((AB)^T\) is \(\sum_{k=1}^n a_{jk}b_{ki}\). If we write the *i*, *j* entry of \(B^T\) as \(\beta_{ij}\) and the *i*, *j* entry of \(A^T\) as \(\alpha_{ij}\), then the *i*, *j* entry of \(B^T A^T\) is \(\sum_{k=1}^n \beta_{ik}\alpha_{kj}\). But \(\beta_{ik}=b_{ki}\) and \(\alpha_{kj} = a_{jk}\), so this is \(\sum_{k=1}^n a_{jk}b_{ki}\), which is the same as the *i*, *j* entry of \((AB)^T\).

These properties show some of the ways in which matrix multiplication is like ordinary multiplication of numbers. There are two important ways in which it is different. First, in general \(AB \neq BA\): \[\begin{align*} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} & = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix} \\ \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} &= \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix} \end{align*}\] Second, unlike for multiplying numbers, we can have \(AB=\mathbf{0}\) even when \(A,B\neq \mathbf{0}\): \[ \begin{pmatrix} 0&1\\0&0 \end{pmatrix} \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0&0\\0&0 \end{pmatrix}.\]
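Both failures are easy to check by machine. A short Python sketch (illustrative, using a matrix-product helper written from the explicit formula):

```python
def mat_mul(A, B):
    """Product of matrices stored as lists of rows (explicit-formula definition)."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
print(mat_mul(B, A))  # [[23, 34], [31, 46]] -- not the same as AB

# two non-zero matrices whose product is the zero matrix
print(mat_mul([[0, 1], [0, 0]], [[0, 2], [0, 0]]))  # [[0, 0], [0, 0]]
```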

There is a matrix which behaves like 1 does under multiplication:

**Definition 2.6** The *n*✕*n* **identity matrix** \(I_n\) is the matrix whose *i*, *j* entry is 1 if *i* = *j* and 0 otherwise.

For example, \[ I_2= \begin{pmatrix} 1&0\\0&1\end{pmatrix},\;\; I_3 = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix}. \]

**Proposition 2.2** Let *A* be *m*✕*n*. Then \(AI_n = A = I_m A\).

*Proof.* Let \(A=(a_{ij})\), and write \(I_n = (\delta_{ij})\) so that
\(\delta_{ij}=0\) if \(i\neq j\) and \(\delta_{ii}=1\) for each *i*. Using
the definition of matrix multiplication, the *i*, *j* entry of \(AI_n\)
is
\[
\sum_{k=1}^n a_{ik}\delta_{kj}.
\]
Because \(\delta_{kj}\) is zero unless *k* = *j*, the only term in this sum which might not be zero is the one with *k* = *j*. This
term is \(a_{ij}\times 1 = a_{ij}\), so the *i*, *j* entry of \(AI_n\) is
the same as the *i*, *j* entry of *A*, and therefore \(AI_n=A\). The proof that \(I_m A = A\) is similar.
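The definition of \(I_n\) and the proposition above can be checked computationally. A hedged sketch in plain Python (helper names are mine):

```python
def identity(n):
    """I_n: the entry in position i, j is 1 if i == j and 0 otherwise."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6]]            # a 2-by-3 matrix
print(mat_mul(A, identity(3)) == A)   # True: A I_3 = A
print(mat_mul(identity(2), A) == A)   # True: I_2 A = A
```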

When we work in \(\mathbb{R}^n\) or \(\mathbb{C}^n\), we use the notation \(\mathbf{e}_i\) to refer to the height *n* column vector with a 1 in position *i* and zeroes elsewhere. For example, if *n* is 3 then

\[\mathbf{e}_1 = \begin{pmatrix} 1\\0\\0\end{pmatrix}, \mathbf{e}_2 = \begin{pmatrix} 0\\1\\0\end{pmatrix}, \mathbf{e}_3 = \begin{pmatrix} 0\\0\\1\end{pmatrix}.\]

The vector \(\mathbf{e}_i\) is called the \(i\)th **standard basis vector**; you will learn what a basis is in the next section. These vectors have an important property with respect to matrix multiplication:

**Lemma 2.1** Let *A* be an *m*✕*n* matrix. Then for any \(1\leq x \leq n\) the product \(A \mathbf{e}_x\) is equal to the *x*th column of *A*.

*Proof.* Let \(A = (a_{ij})\), so for any \(1\leq i \leq m\) the *i*th entry of the height *m* column vector \(A \mathbf{e}_x\) is \(\sum_{k=1}^n a_{ik}e_k\), where \(e_k\) is the *k*th entry of \(\mathbf{e}_x\). Since \(e_k\) is zero unless *k* = *x*, in which case it is 1, the sum equals \(a_{ix}\). It follows that \[A\mathbf{e}_x = \begin{pmatrix} a_{1x} \\ \vdots \\ a_{mx}\end{pmatrix},\] which is the *x*th column of *A*.
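Lemma 2.1 is easy to see in code: multiplying by \(\mathbf{e}_x\) picks out a column. A brief illustrative sketch:

```python
def mat_vec(A, c):
    """Multiply matrix A (list of rows) into column vector c."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]

A = [[1, 2, 3], [4, 5, 6]]
e2 = [0, 1, 0]          # the 2nd standard basis vector of height 3
print(mat_vec(A, e2))   # [2, 5], the 2nd column of A
```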

### 2.2.1 Invertibility

**Definition 2.7** An *n*✕*n* matrix *A* is called **invertible** if there is an *n*✕*n* matrix *B* such that \(AB=I_n=BA\).

The next lemma shows that if *A* is invertible then there is one and only one matrix *B* such
that \(AB=I_n=BA\).

**Lemma 2.2** Suppose \(AB = BA = I_n\) and \(AC = CA = I_n\). Then *B* = *C*.

*Proof.* \[\begin{align*} B&= I_nB & \text{as } I_nX=X \text{ for any } X \\ &= (CA)B & \text{as } CA = I_n \\ &= C(AB) & \text{associativity} \\ &=CI_n & \text{as } AB = I_n \\ &= C. \end{align*}\]

When *A* is invertible we use the notation \(A^{-1}\) for the one and only matrix such that \(AA^{-1}=A^{-1}A=I_n\).
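Checking that a given matrix is the inverse of another is just two matrix multiplications. A small Python sketch (the candidate inverse below was found by hand, just for illustration; methods for computing inverses come later):

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

A     = [[2, 1], [1, 1]]
A_inv = [[1, -1], [-1, 2]]   # candidate for A^{-1}, verified below
I2    = [[1, 0], [0, 1]]

# both products must equal the identity, as Definition 2.7 requires
print(mat_mul(A, A_inv) == I2 and mat_mul(A_inv, A) == I2)  # True
```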

Not every non-zero square matrix is invertible: you can check directly that \(\begin{pmatrix} 0&1\\0&0 \end{pmatrix}\) and \(\begin{pmatrix}1&1\\1&1\end{pmatrix}\) aren’t invertible, for example. More generally, suppose that
*A* and *B* are any non-zero matrices such that \(AB=\mathbf{0}_{n\times n}\) (we have already seen examples of this). If *A* were invertible
we could multiply this equation on the left by the inverse of *A* to get
\[\begin{align*}
A^{-1} AB &= A^{-1} \mathbf{0}_{n\times n} \\
B &= \mathbf{0}_{n\times n}
\end{align*}\]
which is not the case. It follows that if there is a non-zero matrix
*B* such that \(AB=\mathbf{0}_{n\times n}\) then *A* isn’t invertible
(and a similar argument, multiplying on the right instead of the left, shows neither is *B*).

Here is another useful result that can tell you a matrix is not invertible:

**Lemma 2.3** Let *A* be a matrix with a row of zeroes or a column of zeroes. Then *A* is not invertible.

*Proof.* If *A* has a column of zeroes then, because matrix multiplication works columnwise, so does *BA* for any *B*. So *BA* never equals the identity matrix, and thus *A* is not invertible. The argument when *A* has a row of zeroes is similar.

### 2.2.2 Why define matrix multiplication this way?

The last thing in this section on matrix multiplication is a quick
comment on where it comes from.

Write \(\mathbb{R}^p\) for the set of all *p*✕1 column vectors with real numbers as entries.
If *A* is an *m*✕*n* matrix (with
real number entries, say) then there is a function
\[ T_A : \mathbb{R}^n \to \mathbb{R}^m, \qquad T_A(\mathbf{v}) = A\mathbf{v}. \]

Now suppose *B* is *n*✕*p*, so that there is a similar map \(T_B : \mathbb{R}^p \to \mathbb{R}^n\). The composition \(T_A \circ T_B\) makes
sense as a map \(\mathbb{R}^p \to \mathbb{R}^m\), and it turns out
\[ T_A \circ T_B = T_{AB}.\]
The connection with composition of maps is why matrix multiplication
is defined the way it is.
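The identity \(T_A \circ T_B = T_{AB}\) can be spot-checked numerically: applying *B* and then *A* to a vector gives the same result as applying *AB* directly. An illustrative Python sketch:

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2], [3, 4]]          # T_A : R^2 -> R^2
B = [[5, 6, 7], [1, 0, -1]]   # T_B : R^3 -> R^2
v = [1, 1, 1]

# (T_A o T_B)(v) agrees with T_{AB}(v)
print(mat_vec(A, mat_vec(B, v)) == mat_vec(mat_mul(A, B), v))  # True
```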

We use a similar notation for complex column vectors: \(\mathbb{C}^p\) denotes the set of all height *p* column vectors with complex number entries.