
3.2 Matrix multiplication

We are going to define a way to multiply certain matrices together. After that we will see several different ways to understand this definition, and we will see how the definition arises as a kind of function composition.

Definition 3.2.1.

Let $A=(a_{ij})$ be an $m\times n$ matrix and $B=(b_{ij})$ be an $n\times p$ matrix. Then the matrix product $A B$ is defined to be the $m\times p$ matrix whose $i, j$ entry is

\sum_{k=1}^{n}a_{ik}b_{kj}.

(3.1)

Before we even start thinking about this definition we record one key point about it. There are two $n$s in the definition above: one is the number of columns of $A$ and the other is the number of rows of $B$. These really must be the same: we only define the matrix product $A B$ when the number of columns of $A$ equals the number of rows of $B$. The reason for this will become clear when we interpret matrix multiplication in terms of function composition later.

Example 3.2.1.

The $1,2$ entry of a matrix product $A B$ is obtained by putting $i=1$ and $j=2$ in the formula (3.1). If $A=(a_{ij})$ is $m\times n$ and $B=(b_{ij})$ is $n\times p$ then this is

a_{11}b_{12}+a_{12}b_{22}+a_{13}b_{32}+\cdots+a_{1n}b_{n2}

You can see that we are multiplying each entry in the first row of $A$ by the corresponding entry in the second column of $B$ and adding up the results. In general, the $i, j$ entry of $A B$ is obtained by multiplying the entries of row $i$ of $A$ with the entries of column $j$ of $B$ and adding them up.

Example 3.2.2.

Let’s look at an abstract example first. Let

A=\begin{pmatrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{pmatrix},B=\begin{pmatrix}b_{11}&b_{12}\\ b_{21}&b_{22}\end{pmatrix}.

The number of columns of $A$ equals the number of rows of $B$ , so the matrix product $A B$ is defined, and since (in the notation of the definition) $m=n=p=2$ , the size of $A B$ is $m\times p$ which is $2\times 2$ . From the formula, we get

AB=\begin{pmatrix}a_{11}b_{11}+a_{12}b_{21}&a_{11}b_{12}+a_{12}b_{22}\\ a_{21}b_{11}+a_{22}b_{21}&a_{21}b_{12}+a_{22}b_{22}\end{pmatrix}.

Example 3.2.3.

Making the previous example concrete, if

A=\begin{pmatrix}1&2\\ 3&4\end{pmatrix},B=\begin{pmatrix}5&6\\ 7&8\end{pmatrix}

then $A$ is $2\times 2$ , $B$ is $2\times 2$ , so the matrix product $A B$ is defined and will be another $2\times 2$ matrix:

	$\displaystyle AB$	$\displaystyle=\begin{pmatrix}1\times 5+2\times 7&1\times 6+2\times 8\\ 3\times 5+4\times 7&3\times 6+4\times 8\end{pmatrix}$
		$\displaystyle=\begin{pmatrix}19&22\\ 43&50\end{pmatrix}.$
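Formula (3.1) translates almost word for word into a short program. The following is a minimal Python sketch, assuming a matrix is stored as a list of its rows. The function name `matmul` and this representation are our own choices for illustration, not notation from these notes. Python counts from 0, so the entry written $a_{ik}$ in the text is `A[i-1][k-1]` in the code.

```python
def matmul(A, B):
    """Return the matrix product AB, with matrices stored as lists of rows."""
    m, n, p = len(A), len(B), len(B[0])
    # The product is only defined when every row of A has n entries,
    # that is, when the number of columns of A equals the number of rows of B.
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    # Entry (i, j) is the sum over k of a_ik * b_kj, as in formula (3.1).
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

The printed result agrees with the product computed by hand in Example 3.2.3.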

Matrix multiplication is so important that it is helpful to have several different ways of looking at it. The formula above is useful when we want to prove general properties of matrix multiplication, but we can get further insight when we examine the definition carefully from different points of view.

3.2.1 Matrix multiplication happens columnwise

A very important special case of matrix multiplication is when we multiply an $m\times n$ matrix by an $n\times 1$ column vector. Let

A=\begin{pmatrix}a&b&c\\ d&e&f\end{pmatrix},\mathbf{x}=\begin{pmatrix}x\\ y\\ z\end{pmatrix}.

Then we have

A\mathbf{x}=\begin{pmatrix}ax+by+cz\\ dx+ey+fz\end{pmatrix}

Another way to write the result of this matrix multiplication is

x\begin{pmatrix}a\\ d\end{pmatrix}+y\begin{pmatrix}b\\ e\end{pmatrix}+z\begin{pmatrix}c\\ f\end{pmatrix}

showing that the result is obtained by adding up scalar multiples of the columns of $A$ . If we write $\mathbf{c}_{j}$ for the $j$ th column of $A$ then the expression

x\mathbf{c}_{1}+y\mathbf{c}_{2}+z\mathbf{c}_{3},

where we add up scalar multiples of the $\mathbf{c}_{j}$ s, is called a linear combination of $\mathbf{c}_{1}$ , $\mathbf{c}_{2}$ , and $\mathbf{c}_{3}$ . Linear combinations are a fundamental idea and we will return to them again and again in the rest of MATH0005.

This result is true whenever we multiply an $m\times n$ matrix and an $n\times 1$ column vector, not just in the example above.

Proposition 3.2.1.

Let $A=(a_{ij})$ be an $m\times n$ matrix and $\mathbf{x}$ an $n\times 1$ column vector with entries $x_{1},\ldots,x_{n}$ . If $\mathbf{c}_{1},\ldots,\mathbf{c}_{n}$ are the columns of $A$ then

A\mathbf{x}=\sum_{k=1}^{n}x_{k}\mathbf{c}_{k}.

Proof.

From the matrix multiplication formula (3.1) we get

A\mathbf{x}=\begin{pmatrix}\sum_{k=1}^{n}a_{1k}x_{k}\\ \sum_{k=1}^{n}a_{2k}x_{k}\\ \vdots\\ \sum_{k=1}^{n}a_{mk}x_{k}\end{pmatrix}=\sum_{k=1}^{n}x_{k}\begin{pmatrix}a_{1k}\\ a_{2k}\\ \vdots\\ a_{mk}\end{pmatrix}.

The column vector whose entries are $a_{1k}$ , $a_{2k}$ , … $a_{mk}$ is exactly the $k$ th column of $A$ , so this completes the proof. ∎
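To see Proposition 3.2.1 in action, here is a small Python sketch that computes $A\mathbf{x}$ in two ways: entry by entry from formula (3.1), and as a linear combination of the columns of $A$. It assumes a matrix is stored as a list of rows and a column vector as a flat list of its entries. The function names and the particular numbers are hypothetical, chosen only for illustration.

```python
def mat_vec(A, x):
    """Compute Ax entry by entry using formula (3.1)."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


def column_combination(A, x):
    """Compute x_1 c_1 + ... + x_n c_n, where c_k is column k of A."""
    m, n = len(A), len(x)
    result = [0] * m
    for k in range(n):
        column_k = [A[i][k] for i in range(m)]
        # Add the scalar multiple x_k * c_k to the running total.
        result = [r + x[k] * c for r, c in zip(result, column_k)]
    return result


A = [[1, 2, 3], [4, 5, 6]]
x = [7, 8, 9]
print(mat_vec(A, x))             # [50, 122]
print(column_combination(A, x))  # [50, 122]
```

Agreement on one example is of course not a proof, but it is a useful check that we have read the statement correctly.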

Definition 3.2.2.

For a fixed $n$ , the standard basis vectors $\mathbf{e}_{1},\ldots,\mathbf{e}_{n}$ are the vectors

\begin{pmatrix}1\\ 0\\ 0\\ \vdots\\ 0\end{pmatrix},\begin{pmatrix}0\\ 1\\ 0\\ \vdots\\ 0\end{pmatrix},\ldots,\begin{pmatrix}0\\ 0\\ \vdots\\ 0\\ 1\end{pmatrix}.

The vector $\mathbf{e}_{i}$ with a 1 in position $i$ and zeroes elsewhere is called the $i$ th standard basis vector.

For example, if $n=3$ then there are three standard basis vectors

\mathbf{e}_{1}=\begin{pmatrix}1\\ 0\\ 0\end{pmatrix},\mathbf{e}_{2}=\begin{pmatrix}0\\ 1\\ 0\end{pmatrix},\mathbf{e}_{3}=\begin{pmatrix}0\\ 0\\ 1\end{pmatrix}.

The special case of the proposition above when we multiply a matrix by a standard basis vector is often useful, so we’ll record it here.

Corollary 3.2.2.

Let $A$ be an $m\times n$ matrix and $\mathbf{e}_{j}$ the $j$th standard basis vector of height $n$. Then $A\mathbf{e}_{j}$ is equal to the $j$th column of $A$.

Proof.

According to Proposition 3.2.1 we have $A\mathbf{e}_{j}=\sum_{k=1}^{n}x_{k}\mathbf{c}_{k}$ where $x_{k}$ is the $k$ th entry of $\mathbf{e}_{j}$ and $\mathbf{c}_{k}$ is the $k$ th column of $A$ . The entries of $\mathbf{e}_{j}$ are all zero except for the $j$ th which is 1, so

A\mathbf{e}_{j}=0\times\mathbf{c}_{1}+\cdots+1\times\mathbf{c}_{j}+\cdots+0\times\mathbf{c}_{n}=\mathbf{c}_{j}.\qed

Example 3.2.4.

Let $A=\begin{pmatrix}1&2\\ 3&4\end{pmatrix}$ . You should verify that $A\begin{pmatrix}1\\ 0\end{pmatrix}$ equals the first column of $A$ and $A\begin{pmatrix}0\\ 1\end{pmatrix}$ equals the second column of $A$ .
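If you would like to check your answer by machine, the following Python sketch multiplies $A$ by each standard basis vector. As an assumption for illustration, matrices are lists of rows and column vectors are flat lists. The helper names `mat_vec` and `standard_basis_vector` are hypothetical.

```python
def mat_vec(A, x):
    """Compute Ax, with A a list of rows and x a flat list of entries."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


def standard_basis_vector(j, n):
    """Return e_j of height n, using the 1-based index j as in the text."""
    return [1 if i == j else 0 for i in range(1, n + 1)]


A = [[1, 2], [3, 4]]
print(mat_vec(A, standard_basis_vector(1, 2)))  # [1, 3], the first column of A
print(mat_vec(A, standard_basis_vector(2, 2)))  # [2, 4], the second column of A
```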

Proposition 3.2.1 is important because it lets us show that when we do any matrix multiplication $A B$, we can do the multiplication column-by-column.

Theorem 3.2.3.

Let $A$ be an $m\times n$ matrix and $B$ an $n\times p$ matrix with columns $\mathbf{d}_{1},\ldots,\mathbf{d}_{p}$ . Then

AB=\begin{pmatrix}|&\cdots&|\\ A\mathbf{d}_{1}&\cdots&A\mathbf{d}_{p}\\ |&\cdots&|\end{pmatrix}.

The notation means that the first column of $A B$ is equal to what you get by multiplying $A$ into the first column of $B$ , the second column of $A B$ is what you get by multiplying $A$ into the second column of $B$ , and so on. That’s what it means to say that matrix multiplication works columnwise.

Proof.

From the matrix multiplication formula (3.1) the $j$ th column of $A B$ has entries

\begin{pmatrix}\sum_{k=1}^{n}a_{1k}b_{kj}\\ \sum_{k=1}^{n}a_{2k}b_{kj}\\ \vdots\\ \sum_{k=1}^{n}a_{mk}b_{kj}\end{pmatrix}

(3.2)

The entries $b_{kj}$ for $k=1,2,\ldots,n$ are exactly the entries in column $j$ of $B$ , so (3.2) is $A\mathbf{d}_{j}$ as claimed. ∎

Corollary 3.2.4.

Every column of $A B$ is a linear combination of the columns of $A$ .

Proof.

Theorem 3.2.3 tells us that each column of $A B$ equals $A\mathbf{d}$ for certain vectors $\mathbf{d}$ , and Proposition 3.2.1 tells us that any such vector $A\mathbf{d}$ is a linear combination of the columns of $A$ . ∎

Example 3.2.5.

Let’s look at how Proposition 3.2.1 and Theorem 3.2.3 apply to Example 3.2.3, where $A$ is $\begin{pmatrix}1&2\\ 3&4\end{pmatrix}$ and the columns of $B$ are $\mathbf{d}_{1}=\begin{pmatrix}5\\ 7\end{pmatrix}$ and $\mathbf{d}_{2}=\begin{pmatrix}6\\ 8\end{pmatrix}$.

You can check that

	$\displaystyle A\mathbf{d}_{1}$	$\displaystyle=\begin{pmatrix}19\\ 43\end{pmatrix}$
		$\displaystyle=5\begin{pmatrix}1\\ 3\end{pmatrix}+7\begin{pmatrix}2\\ 4\end{pmatrix}$
	$\displaystyle A\mathbf{d}_{2}$	$\displaystyle=\begin{pmatrix}22\\ 50\end{pmatrix}$
		$\displaystyle=6\begin{pmatrix}1\\ 3\end{pmatrix}+8\begin{pmatrix}2\\ 4\end{pmatrix}$

and that these are the columns of $A B$ we computed before.
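The columnwise description in Theorem 3.2.3 also gives a way to organise the computation. The sketch below, in Python and assuming matrices are stored as lists of rows, builds $A B$ one column at a time. The names `mat_vec` and `matmul_by_columns` are hypothetical, introduced only for this illustration.

```python
def mat_vec(A, x):
    """Compute Ax, with A a list of rows and x a flat list of entries."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


def matmul_by_columns(A, B):
    """Build AB one column at a time: column j of AB is A times column j of B."""
    n, p = len(B), len(B[0])
    columns_of_B = [[B[k][j] for k in range(n)] for j in range(p)]
    columns_of_AB = [mat_vec(A, d) for d in columns_of_B]
    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*columns_of_AB)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_by_columns(A, B))  # [[19, 22], [43, 50]]
```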

3.2.2 Matrix multiplication happens rowwise

There are analogous results when we multiply a $1\times n$ row vector and an $n\times p$ matrix.

Proposition 3.2.5.

Let $\mathbf{a}$ be a $1\times n$ row vector with entries $a_{1},\ldots,a_{n}$ and let $B$ be an $n\times p$ matrix with rows $\mathbf{s}_{1},\ldots,\mathbf{s}_{n}$ . Then $\mathbf{a}B=\sum_{k=1}^{n}a_{k}\mathbf{s}_{k}$ .

Proof.

From the matrix multiplication formula (3.1) we get

	$\displaystyle\mathbf{a}B$	$\displaystyle=\begin{pmatrix}\sum_{k=1}^{n}a_{k}b_{k1}&\cdots&\sum_{k=1}^{n}a_{k}b_{kp}\end{pmatrix}$
		$\displaystyle=\sum_{k=1}^{n}a_{k}\begin{pmatrix}b_{k1}&\cdots&b_{kp}\end{pmatrix}$
		$\displaystyle=\sum_{k=1}^{n}a_{k}\mathbf{s}_{k}.\qed$

In particular, $\mathbf{a}B$ is a linear combination of the rows of $B$ .

Theorem 3.2.6.

Let $A$ be an $m\times n$ matrix with rows $\mathbf{r}_{1},\ldots,\mathbf{r}_{m}$ and let $B$ be an $n\times p$ matrix. Then

AB=\begin{pmatrix}\mbox{---}&\mathbf{r}_{1}B&\mbox{---}\\ \cdots&\cdots&\cdots\\ \mbox{---}&\mathbf{r}_{m}B&\mbox{---}\end{pmatrix}

The notation is supposed to indicate that the first row of $A B$ is equal to $\mathbf{r}_{1}B$ , the second row is equal to $\mathbf{r}_{2}B$ , and so on.

Proof.

From the matrix multiplication formula (3.1), the $i$ th row of $A B$ has entries

	$\displaystyle\begin{pmatrix}\sum_{k=1}^{n}a_{ik}b_{k1}&\cdots&\sum_{k=1}^{n}a_{ik}b_{kp}\end{pmatrix}$		(3.4)
	$\displaystyle=\sum_{k=1}^{n}a_{ik}\begin{pmatrix}b_{k1}&\cdots&b_{kp}\end{pmatrix}.$		(3.6)

Row $i$ of $A$ is $\mathbf{r}_{i}=\begin{pmatrix}a_{i1}&a_{i2}&\cdots&a_{in}\end{pmatrix}$ , so $\mathbf{r}_{i}B$ agrees with (3.6) by Proposition 3.2.5. ∎

The theorem combined with the proposition before it shows that the rows of $A B$ are always linear combinations of the rows of $B$.

Example 3.2.6.

Returning to the example where

A=\begin{pmatrix}1&2\\ 3&4\end{pmatrix},B=\begin{pmatrix}5&6\\ 7&8\end{pmatrix}

the rows of $A$ are $\mathbf{r}_{1}=\begin{pmatrix}1&2\end{pmatrix}$ and $\mathbf{r}_{2}=\begin{pmatrix}3&4\end{pmatrix}$ and the rows of $B$ are $\mathbf{s}_{1}=\begin{pmatrix}5&6\end{pmatrix}$ and $\mathbf{s}_{2}=\begin{pmatrix}7&8\end{pmatrix}$ . We have

	$\displaystyle\mathbf{r}_{1}B$	$\displaystyle=\begin{pmatrix}1&2\end{pmatrix}\begin{pmatrix}5&6\\ 7&8\end{pmatrix}$
		$\displaystyle=\mathbf{s}_{1}+2\mathbf{s}_{2}$
		$\displaystyle=\begin{pmatrix}19&22\end{pmatrix}$
	$\displaystyle\mathbf{r}_{2}B$	$\displaystyle=\begin{pmatrix}3&4\end{pmatrix}\begin{pmatrix}5&6\\ 7&8\end{pmatrix}$
		$\displaystyle=3\mathbf{s}_{1}+4\mathbf{s}_{2}$
		$\displaystyle=\begin{pmatrix}43&50\end{pmatrix}$

and these are the rows of the matrix product $A B$ .
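The rowwise picture can be sketched in the same way. In the Python below, assuming matrices are lists of rows and a row vector is a flat list, `row_times_matrix` forms the linear combination from Proposition 3.2.5 and `matmul_by_rows` stacks the results as in Theorem 3.2.6. Both function names are hypothetical.

```python
def row_times_matrix(a, B):
    """Compute aB as the linear combination a_1 s_1 + ... + a_n s_n of the rows of B."""
    result = [0] * len(B[0])
    for a_k, s_k in zip(a, B):
        # Add the scalar multiple a_k * s_k to the running total.
        result = [r + a_k * b for r, b in zip(result, s_k)]
    return result


def matmul_by_rows(A, B):
    """Build AB one row at a time: row i of AB is (row i of A) times B."""
    return [row_times_matrix(r, B) for r in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(row_times_matrix(A[0], B))  # [19, 22]
print(matmul_by_rows(A, B))       # [[19, 22], [43, 50]]
```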

Example 3.2.7.

When the result of a matrix multiplication is a $1\times 1$ matrix we will usually just think of it as a number. This is like a dot product, if you’ve seen those before.

\begin{pmatrix}1&2&3\end{pmatrix}\begin{pmatrix}4\\ 5\\ 6\end{pmatrix}=1\times 4+2\times 5+3\times 6=32.

Example 3.2.8.

Let $A=\begin{pmatrix}1&2\\ 3&4\\ 5&6\end{pmatrix}$ , a $3\times 2$ matrix, and $\mathbf{c}=\begin{pmatrix}7\\ 8\end{pmatrix}$ , a $2\times 1$ column vector. The number of columns of $A$ and the number of rows of $\mathbf{c}$ are equal, so we can compute $A\mathbf{c}$ .

A\mathbf{c}=\begin{pmatrix}1\times 7+2\times 8\\ 3\times 7+4\times 8\\ 5\times 7+6\times 8\end{pmatrix}=\begin{pmatrix}23\\ 53\\ 83\end{pmatrix}.

Example 3.2.9.

Let

A=\begin{pmatrix}1&2\end{pmatrix},B=\begin{pmatrix}1&0&1\\ 0&1&0\end{pmatrix}.

$A$ is $1\times 2$ , $B$ is $2\times 3$ , so the matrix product $A B$ is defined, and is a $1\times 3$ matrix. The columns of $B$ are $\mathbf{c}_{1}=\begin{pmatrix}1\\ 0\end{pmatrix}$ , $\mathbf{c}_{2}=\begin{pmatrix}0\\ 1\end{pmatrix}$ , and $\mathbf{c}_{3}=\begin{pmatrix}1\\ 0\end{pmatrix}$ . The product $A B$ is therefore

	$\displaystyle\begin{pmatrix}A\mathbf{c}_{1}&A\mathbf{c}_{2}&A\mathbf{c}_{3}\end{pmatrix}$	$\displaystyle=\begin{pmatrix}1\times 1+2\times 0&1\times 0+2\times 1&1\times 1+2\times 0\end{pmatrix}$
		$\displaystyle=\begin{pmatrix}1&2&1\end{pmatrix}$

Example 3.2.10.

Let

A=\begin{pmatrix}1&2\\ 3&4\end{pmatrix},B=\begin{pmatrix}5&6\\ 7&8\end{pmatrix}.

Then $A$ is $2\times 2$ , $B$ is $2\times 2$ , so the matrix product $A B$ is defined and will be another $2\times 2$ matrix:

AB=\begin{pmatrix}1\times 5+2\times 7&1\times 6+2\times 8\\ 3\times 5+4\times 7&3\times 6+4\times 8\end{pmatrix}.

3.2.3 Matrix multiplication motivation

In this section we’ll try to answer two questions: where does this strange-looking notion of matrix multiplication come from? Why can we only multiply $A$ and $B$ if the number of columns of $A$ equals the number of rows of $B$ ?

Definition 3.2.3.

Let $A$ be an $m\times n$ matrix. Then $T_{A}:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is the function defined by

T_{A}(\mathbf{x})=A\mathbf{x}.

Notice that this definition really does make sense. If $\mathbf{x}\in\mathbb{R}^{n}$ then it is an $n\times 1$ column vector, so the matrix product $A\mathbf{x}$ exists and has size $m\times 1$ , so it is an element of $\mathbb{R}^{m}$ .

Now suppose we have an $m\times n$ matrix $A$ and a $q\times p$ matrix $B$, so that $T_{A}:\mathbb{R}^{n}\to\mathbb{R}^{m}$ and $T_{B}:\mathbb{R}^{p}\to\mathbb{R}^{q}$. Can we form the composition $T_{A}\circ T_{B}$? The answer is no, unless $q=n$, that is, unless the number of columns of $A$ equals the number of rows of $B$: the outputs of $T_{B}$ lie in $\mathbb{R}^{q}$, while $T_{A}$ only accepts inputs from $\mathbb{R}^{n}$. So let’s assume that $q=n$, so that $B$ is $n\times p$, $T_{B}:\mathbb{R}^{p}\to\mathbb{R}^{n}$, and the composition

T_{A}\circ T_{B}:\mathbb{R}^{p}\to\mathbb{R}^{m}

makes sense. What can we say about it?

Theorem 3.2.7.

If $A$ is $m\times n$ and $B$ is $n\times p$ then $T_{A}\circ T_{B}=T_{AB}$ .

You will prove this on a problem sheet.
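Before attempting the proof, it can help to test the statement on an example. The Python sketch below does this for one choice of $A$, $B$ and $\mathbf{x}$. It assumes matrices are lists of rows and vectors are flat lists; the helper `T`, which turns a matrix into the function $T_{A}$, and the numbers used are hypothetical choices for illustration.

```python
def matmul(A, B):
    """Return the matrix product AB using formula (3.1)."""
    n, p = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)]
            for row in A]


def T(A):
    """Return the function T_A sending a vector x (a flat list) to Ax."""
    return lambda x: [sum(a * xk for a, xk in zip(row, x)) for row in A]


A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2, so T_A maps R^2 to R^3
B = [[1, 0, 1], [0, 1, 0]]    # 2 x 3, so T_B maps R^3 to R^2
x = [7, 8, 9]                 # an element of R^3

print(T(A)(T(B)(x)))       # [32, 80, 128]
print(T(matmul(A, B))(x))  # [32, 80, 128]
```

Applying $T_{B}$ and then $T_{A}$ gives the same vector as applying $T_{AB}$ directly, as the theorem predicts. This is only a check on one input, not a proof.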

The theorem shows that matrix multiplication is related to composition of functions. That’s useful because it suggests something: we know that function composition is always associative, so can we use that to show matrix multiplication is associative too? That is, if the products $A B$ and $B C$ make sense, is $A(BC)$ equal to $(AB)C$ ? This is not exactly obvious if you just write down the horrible formulas for the $i$ , $j$ entries of both matrices. If we believe the theorem though it’s easy: we know

T_{A}\circ(T_{B}\circ T_{C})=(T_{A}\circ T_{B})\circ T_{C}

because function composition is associative, and so

	$\displaystyle T_{A}\circ T_{BC}$	$\displaystyle=T_{AB}\circ T_{C}$
	$\displaystyle T_{A(BC)}$	$\displaystyle=T_{(AB)C}.$

If $T_{X}=T_{Y}$ then $X=Y$ (for example, you could evaluate at the standard basis vector $\mathbf{e}_{j}$ to see that the $j$ th column of $X$ equals the $j$ th column of $Y$ for any $j$ ), so we get $A(BC)=(AB)C$ .

Since we didn’t prove the theorem here, we’ll prove the associativity result in a more pedestrian way in the next section.
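In the meantime, here is a quick numerical check of associativity on one example, written as a Python sketch. It assumes matrices are stored as lists of rows. The function `matmul` and the matrix $C$ are hypothetical choices for illustration, and a single example is evidence rather than proof.

```python
def matmul(A, B):
    """Return the matrix product AB using formula (3.1)."""
    n, p = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)]
            for row in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 0, 1], [0, 1, 0]]

left = matmul(A, matmul(B, C))   # A(BC)
right = matmul(matmul(A, B), C)  # (AB)C
print(left)           # [[19, 22, 19], [43, 50, 43]]
print(left == right)  # True
```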