3 Matrices

3.2 Matrix multiplication

We are going to define a way to multiply certain matrices together. After that we will see several different ways to understand this definition, and we will see how the definition arises as a kind of function composition.

Definition 3.2.1.

Let $A=(a_{ij})$ be an $m\times n$ matrix and $B=(b_{ij})$ be an $n\times p$ matrix. Then the matrix product $AB$ is defined to be the $m\times p$ matrix whose $i,j$ entry is

\[ \sum_{k=1}^{n} a_{ik}b_{kj}. \tag{3.1} \]

Before we even start thinking about this definition we record one key point about it. There are two ns in the definition above: one is the number of columns of A and the other is the number of rows of B. These really must be the same. We only define the matrix product AB when the number of columns of A equals the number of rows of B. The reason for this will become clear when we interpret matrix multiplication in terms of function composition later.

Example 3.2.1.

The $1,2$ entry of a matrix product $AB$ is obtained by putting $i=1$ and $j=2$ in the formula (3.1). If $A=(a_{ij})$ is $m\times n$ and $B=(b_{ij})$ is $n\times p$ then this is

\[ a_{11}b_{12}+a_{12}b_{22}+a_{13}b_{32}+\cdots+a_{1n}b_{n2}. \]

You can see that we are multiplying each entry in the first row of A by the corresponding entry in the second column of B and adding up the results. In general, the i,j entry of AB is obtained by multiplying the entries of row i of A with the entries of column j of B and adding them up.

Example 3.2.2.

Let’s look at an abstract example first. Let

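\[ A=\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B=\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}. \]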

The number of columns of A equals the number of rows of B, so the matrix product AB is defined, and since (in the notation of the definition) m=n=p=2, the size of AB is m×p which is 2×2. From the formula, we get

\[ AB=\begin{pmatrix} a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22} \\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22} \end{pmatrix}. \]

Example 3.2.3.

Making the previous example concrete, if

\[ A=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B=\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}, \]

then A is 2×2, B is 2×2, so the matrix product AB is defined and will be another 2×2 matrix:

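\begin{align*}
AB &= \begin{pmatrix} 1\times 5+2\times 7 & 1\times 6+2\times 8 \\ 3\times 5+4\times 7 & 3\times 6+4\times 8 \end{pmatrix} \\
&= \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}.
\end{align*}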
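As an illustration only, formula (3.1) can be turned into a few lines of Python. This is a minimal sketch that assumes a matrix is stored as a list of rows; the function name `mat_mul` is made up for this illustration, and the indices are 0-based whereas the text counts from 1.

```python
def mat_mul(A, B):
    """Multiply matrices stored as lists of rows, following formula (3.1)."""
    m, n = len(A), len(A[0])
    # The product is only defined when columns of A == rows of B.
    if len(B) != n:
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    # Entry (i, j) of AB is the sum over k of A[i][k] * B[k][j].
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]], as in Example 3.2.3
```

The size check mirrors the key point made after Definition 3.2.1: the code refuses to multiply when the two values of n disagree.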

Matrix multiplication is so important that it is helpful to have several different ways of looking at it. The formula above is useful when we want to prove general properties of matrix multiplication, but we can get further insight when we examine the definition carefully from different points of view.

3.2.1 Matrix multiplication happens columnwise

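A very important special case of matrix multiplication is when we multiply an $m\times n$ matrix by an $n\times 1$ column vector. For example, take $m=2$ and $n=3$ and let

\[ A=\begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}, \quad \mathbf{x}=\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]

Then we have

\[ A\mathbf{x}=\begin{pmatrix} ax+by+cz \\ dx+ey+fz \end{pmatrix}. \]

Another way to write the result of this matrix multiplication is

\[ x\begin{pmatrix} a \\ d \end{pmatrix}+y\begin{pmatrix} b \\ e \end{pmatrix}+z\begin{pmatrix} c \\ f \end{pmatrix}, \]

showing that the result is obtained by adding up scalar multiples of the columns of $A$. If we write $\mathbf{c}_j$ for the $j$th column of $A$ then the expression

\[ x\mathbf{c}_1+y\mathbf{c}_2+z\mathbf{c}_3, \]

where we add up scalar multiples of the $\mathbf{c}_j$, is called a linear combination of $\mathbf{c}_1$, $\mathbf{c}_2$, and $\mathbf{c}_3$. Linear combinations are a fundamental idea and we will return to them again and again in the rest of MATH0005.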

This result is true whenever we multiply an m×n matrix and an n×1 column vector, not just in the example above.

Proposition 3.2.1.

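Let $A=(a_{ij})$ be an $m\times n$ matrix and $\mathbf{x}$ an $n\times 1$ column vector with entries $x_1,\ldots,x_n$. If $\mathbf{c}_1,\ldots,\mathbf{c}_n$ are the columns of $A$ then

\[ A\mathbf{x}=\sum_{k=1}^{n} x_k\mathbf{c}_k. \]

Proof.

From the matrix multiplication formula (3.1) we get

\[ A\mathbf{x}=\begin{pmatrix} \sum_{k=1}^{n} a_{1k}x_k \\ \sum_{k=1}^{n} a_{2k}x_k \\ \vdots \\ \sum_{k=1}^{n} a_{mk}x_k \end{pmatrix}=\sum_{k=1}^{n} x_k\begin{pmatrix} a_{1k} \\ a_{2k} \\ \vdots \\ a_{mk} \end{pmatrix}. \]

The column vector whose entries are $a_{1k}, a_{2k}, \ldots, a_{mk}$ is exactly the $k$th column of $A$, so this completes the proof. ∎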
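The statement of Proposition 3.2.1 can be checked numerically on a small case. The sketch below is illustrative only: it assumes matrices are lists of rows and vectors are plain lists, and the name `mat_vec_by_columns` is hypothetical. It builds the product by accumulating scalar multiples of the columns of A rather than by computing each entry separately.

```python
def mat_vec_by_columns(A, x):
    """Compute A x as the linear combination x_1 c_1 + ... + x_n c_n."""
    m, n = len(A), len(A[0])
    if len(x) != n:
        raise ValueError("x must have one entry for each column of A")
    result = [0] * m
    for k in range(n):
        c_k = [row[k] for row in A]  # the k-th column of A (0-based)
        # Add x_k times column k to the running total.
        result = [r + x[k] * c for r, c in zip(result, c_k)]
    return result


A = [[1, 2, 3], [4, 5, 6]]
x = [7, 8, 9]
print(mat_vec_by_columns(A, x))  # [50, 122]
```

The same answer comes from the entry formula: 1×7+2×8+3×9=50 and 4×7+5×8+6×9=122.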

Definition 3.2.2.

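For a fixed $n$, the standard basis vectors $\mathbf{e}_1,\ldots,\mathbf{e}_n$ are the vectors

\[ \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \]

The vector $\mathbf{e}_i$ with a 1 in position $i$ and zeroes elsewhere is called the $i$th standard basis vector.

For example, if $n=3$ then there are three standard basis vectors

\[ \mathbf{e}_1=\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{e}_2=\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{e}_3=\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \]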

The special case of the proposition above when we multiply a matrix by a standard basis vector is often useful, so we’ll record it here.

Corollary 3.2.2.

Let $A$ be an $m\times n$ matrix and $\mathbf{e}_j$ the $j$th standard basis vector of height $n$. Then $A\mathbf{e}_j$ is equal to the $j$th column of $A$.

Proof.

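According to Proposition 3.2.1 we have $A\mathbf{e}_j=\sum_{k=1}^{n} x_k\mathbf{c}_k$ where $x_k$ is the $k$th entry of $\mathbf{e}_j$ and $\mathbf{c}_k$ is the $k$th column of $A$. The entries of $\mathbf{e}_j$ are all zero except for the $j$th which is 1, so

\[ A\mathbf{e}_j=0\times\mathbf{c}_1+\cdots+1\times\mathbf{c}_j+\cdots+0\times\mathbf{c}_n=\mathbf{c}_j. \]

∎

Example 3.2.4.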

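Let $A=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. You should verify that $A\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ equals the first column of $A$ and $A\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ equals the second column of $A$.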
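If you want to check a hand computation like the one in Example 3.2.4, a short Python sketch along the following lines would do. The helper names `standard_basis_vector` and `mat_vec` are invented for this illustration, and a matrix is assumed to be a list of rows. The index j is 1-based here to match the text.

```python
def standard_basis_vector(j, n):
    """Return e_j of height n as a list; j runs from 1 to n as in the text."""
    return [1 if i == j else 0 for i in range(1, n + 1)]


def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("x must have one entry for each column of A")
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


A = [[1, 2], [3, 4]]
print(mat_vec(A, standard_basis_vector(1, 2)))  # [1, 3], the first column of A
print(mat_vec(A, standard_basis_vector(2, 2)))  # [2, 4], the second column of A
```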

Proposition 3.2.1 is important because it lets us show that any matrix multiplication $AB$ can be done column-by-column.

Theorem 3.2.3.

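Let $A$ be an $m\times n$ matrix and $B$ an $n\times p$ matrix with columns $\mathbf{d}_1,\ldots,\mathbf{d}_p$. Then

\[ AB=\begin{pmatrix} | & & | \\ A\mathbf{d}_1 & \cdots & A\mathbf{d}_p \\ | & & | \end{pmatrix}. \]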

The notation means that the first column of AB is equal to what you get by multiplying A into the first column of B, the second column of AB is what you get by multiplying A into the second column of B, and so on. That’s what it means to say that matrix multiplication works columnwise.

Proof.

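From the matrix multiplication formula (3.1) the $j$th column of $AB$ has entries

\[ \begin{pmatrix} \sum_{k=1}^{n} a_{1k}b_{kj} \\ \sum_{k=1}^{n} a_{2k}b_{kj} \\ \vdots \\ \sum_{k=1}^{n} a_{mk}b_{kj} \end{pmatrix}. \tag{3.2} \]

The entries $b_{kj}$ for $k=1,2,\ldots,n$ are exactly the entries in column $j$ of $B$, so (3.2) is $A\mathbf{d}_j$ as claimed. ∎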

Corollary 3.2.4.

Every column of AB is a linear combination of the columns of A.

Proof.

Theorem 3.2.3 tells us that each column of AB equals A𝐝 for certain vectors 𝐝, and Proposition 3.2.1 tells us that any such vector A𝐝 is a linear combination of the columns of A. ∎

Example 3.2.5.

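Let’s look at how the Proposition and the Theorem in this section apply to Example 3.2.3, where $A$ was $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and the columns of $B$ are $\mathbf{d}_1=\begin{pmatrix} 5 \\ 7 \end{pmatrix}$ and $\mathbf{d}_2=\begin{pmatrix} 6 \\ 8 \end{pmatrix}$.

You can check that

\begin{align*}
A\mathbf{d}_1 &= \begin{pmatrix} 19 \\ 43 \end{pmatrix} = 5\begin{pmatrix} 1 \\ 3 \end{pmatrix}+7\begin{pmatrix} 2 \\ 4 \end{pmatrix} \\
A\mathbf{d}_2 &= \begin{pmatrix} 22 \\ 50 \end{pmatrix} = 6\begin{pmatrix} 1 \\ 3 \end{pmatrix}+8\begin{pmatrix} 2 \\ 4 \end{pmatrix}
\end{align*}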

and that these are the columns of AB we computed before.
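Theorem 3.2.3 also suggests a way to organise the computation: multiply A into each column of B in turn, then place the results side by side. The following Python sketch does this under the assumption that matrices are lists of rows; the names `mat_vec` and `mat_mul_by_columns` are hypothetical.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("x must have one entry for each column of A")
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


def mat_mul_by_columns(A, B):
    """Build AB one column at a time: column j of AB is A times column j of B."""
    n, p = len(B), len(B[0])
    # Extract each column d_j of B and multiply A into it.
    product_columns = [mat_vec(A, [B[k][j] for k in range(n)]) for j in range(p)]
    # product_columns is a list of columns; transpose it back to a list of rows.
    return [list(row) for row in zip(*product_columns)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul_by_columns(A, B))  # [[19, 22], [43, 50]]
```

The intermediate list `product_columns` holds exactly the vectors A𝐝1 and A𝐝2 from Example 3.2.5.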

3.2.2 Matrix multiplication happens rowwise

There are analogous results when we multiply a $1\times n$ row vector and an $n\times p$ matrix.

Proposition 3.2.5.

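Let $\mathbf{a}$ be a $1\times n$ row vector with entries $a_1,\ldots,a_n$ and let $B$ be an $n\times p$ matrix with rows $\mathbf{s}_1,\ldots,\mathbf{s}_n$. Then $\mathbf{a}B=\sum_{k=1}^{n} a_k\mathbf{s}_k$.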

Proof.

From the matrix multiplication formula (3.1) we get

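\begin{align*}
\mathbf{a}B &= \begin{pmatrix} \sum_{k=1}^{n} a_kb_{k1} & \cdots & \sum_{k=1}^{n} a_kb_{kp} \end{pmatrix} \\
&= \sum_{k=1}^{n} a_k\begin{pmatrix} b_{k1} & \cdots & b_{kp} \end{pmatrix} \\
&= \sum_{k=1}^{n} a_k\mathbf{s}_k.
\end{align*}

∎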

In particular, 𝐚B is a linear combination of the rows of B.

Theorem 3.2.6.

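Let $A$ be an $m\times n$ matrix with rows $\mathbf{r}_1,\ldots,\mathbf{r}_m$ and let $B$ be an $n\times p$ matrix. Then

\[ AB=\begin{pmatrix} \mathbf{r}_1B \\ \vdots \\ \mathbf{r}_mB \end{pmatrix}. \]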

The notation is supposed to indicate that the first row of AB is equal to 𝐫1B, the second row is equal to 𝐫2B, and so on.

Proof.

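From the matrix multiplication formula (3.1), the $i$th row of $AB$ has entries

\begin{align*}
& \begin{pmatrix} \sum_{k=1}^{n} a_{ik}b_{k1} & \cdots & \sum_{k=1}^{n} a_{ik}b_{kp} \end{pmatrix} \tag{3.4} \\
={} & \sum_{k=1}^{n} a_{ik}\begin{pmatrix} b_{k1} & \cdots & b_{kp} \end{pmatrix}. \tag{3.6}
\end{align*}

Row $i$ of $A$ is $\mathbf{r}_i=\begin{pmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{pmatrix}$, so $\mathbf{r}_iB$ agrees with (3.6) by Proposition 3.2.5. ∎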

The theorem, combined with the proposition before it, shows that the rows of AB are always linear combinations of the rows of B.

Example 3.2.6.

Returning to the example where

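\[ A=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B=\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}, \]

the rows of $A$ are $\mathbf{r}_1=\begin{pmatrix} 1 & 2 \end{pmatrix}$ and $\mathbf{r}_2=\begin{pmatrix} 3 & 4 \end{pmatrix}$ and the rows of $B$ are $\mathbf{s}_1=\begin{pmatrix} 5 & 6 \end{pmatrix}$ and $\mathbf{s}_2=\begin{pmatrix} 7 & 8 \end{pmatrix}$. We have

\begin{align*}
\mathbf{r}_1B &= \begin{pmatrix} 1 & 2 \end{pmatrix}\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \\
&= \mathbf{s}_1+2\mathbf{s}_2 \\
&= \begin{pmatrix} 19 & 22 \end{pmatrix} \\
\mathbf{r}_2B &= \begin{pmatrix} 3 & 4 \end{pmatrix}\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \\
&= 3\mathbf{s}_1+4\mathbf{s}_2 \\
&= \begin{pmatrix} 43 & 50 \end{pmatrix}
\end{align*}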

and these are the rows of the matrix product AB.
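The rowwise description can be sketched in code in the same spirit. In the illustrative Python below, matrices are assumed to be lists of rows and the function names `row_times_matrix` and `mat_mul_by_rows` are made up for this example. Each row of the product is built as a linear combination of the rows of B, as in Proposition 3.2.5.

```python
def row_times_matrix(a, B):
    """Compute the row vector a B as a_1 s_1 + ... + a_n s_n, s_k the rows of B."""
    n, p = len(B), len(B[0])
    if len(a) != n:
        raise ValueError("a must have one entry for each row of B")
    result = [0] * p
    for k in range(n):
        # Add a_k times row k of B to the running total.
        result = [r + a[k] * b for r, b in zip(result, B[k])]
    return result


def mat_mul_by_rows(A, B):
    """Build AB one row at a time: row i of AB is (row i of A) times B."""
    return [row_times_matrix(r, B) for r in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(row_times_matrix(A[0], B))  # [19, 22]
print(mat_mul_by_rows(A, B))      # [[19, 22], [43, 50]]
```

Because matrices are stored as lists of rows, the rowwise version needs no transposing step, unlike a columnwise version.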

Example 3.2.7.

When the result of a matrix multiplication is a 1×1 matrix we will usually just think of it as a number. This is like a dot product, if you’ve seen those before.

\[ \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}\begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}=1\times 4+2\times 5+3\times 6=32. \]

Example 3.2.8.

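Let $A=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$, a $3\times 2$ matrix, and $\mathbf{c}=\begin{pmatrix} 7 \\ 8 \end{pmatrix}$, a $2\times 1$ column vector. The number of columns of $A$ and the number of rows of $\mathbf{c}$ are equal, so we can compute $A\mathbf{c}$.

\[ A\mathbf{c}=\begin{pmatrix} 1\times 7+2\times 8 \\ 3\times 7+4\times 8 \\ 5\times 7+6\times 8 \end{pmatrix}. \]

Example 3.2.9.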

Let

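\[ A=\begin{pmatrix} 1 & 2 \end{pmatrix}, \quad B=\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}. \]

$A$ is $1\times 2$, $B$ is $2\times 3$, so the matrix product $AB$ is defined, and is a $1\times 3$ matrix. The columns of $B$ are $\mathbf{c}_1=\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $\mathbf{c}_2=\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, and $\mathbf{c}_3=\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. The product $AB$ is therefore

\begin{align*}
\begin{pmatrix} A\mathbf{c}_1 & A\mathbf{c}_2 & A\mathbf{c}_3 \end{pmatrix} &= \begin{pmatrix} 1\times 1+2\times 0 & 1\times 0+2\times 1 & 1\times 1+2\times 0 \end{pmatrix} \\
&= \begin{pmatrix} 1 & 2 & 1 \end{pmatrix}.
\end{align*}

Example 3.2.10.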

Let

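\[ A=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B=\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}. \]

Then $A$ is $2\times 2$, $B$ is $2\times 2$, so the matrix product $AB$ is defined and will be another $2\times 2$ matrix:

\[ AB=\begin{pmatrix} 1\times 5+2\times 7 & 1\times 6+2\times 8 \\ 3\times 5+4\times 7 & 3\times 6+4\times 8 \end{pmatrix}. \]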

3.2.3 Matrix multiplication motivation

In this section we’ll try to answer two questions: where does this strange-looking notion of matrix multiplication come from? Why can we only multiply A and B if the number of columns of A equals the number of rows of B?

Definition 3.2.3.

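Let $A$ be an $m\times n$ matrix. Then $T_A:\mathbb{R}^n\to\mathbb{R}^m$ is the function defined by

\[ T_A(\mathbf{x})=A\mathbf{x}. \]

Notice that this definition really does make sense. If $\mathbf{x}\in\mathbb{R}^n$ then it is an $n\times 1$ column vector, so the matrix product $A\mathbf{x}$ exists and has size $m\times 1$, so it is an element of $\mathbb{R}^m$.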

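Now suppose we have an $m\times n$ matrix $A$ and a $q\times p$ matrix $B$, so that $T_A:\mathbb{R}^n\to\mathbb{R}^m$ and $T_B:\mathbb{R}^p\to\mathbb{R}^q$. Can we form the composition $T_A\circ T_B$? The answer is no, unless $q=n$, that is, unless the number of columns of $A$ equals the number of rows of $B$: the output of $T_B$ must be a valid input to $T_A$. So let’s assume that $q=n$ so that $B$ is $n\times p$ and the composition

\[ T_A\circ T_B:\mathbb{R}^p\to\mathbb{R}^m \]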

makes sense. What can we say about it?

Theorem 3.2.7.

If $A$ is $m\times n$ and $B$ is $n\times p$ then $T_A\circ T_B=T_{AB}$.

You will prove this on a problem sheet.

The theorem shows that matrix multiplication is related to composition of functions. That’s useful because it suggests something: we know that function composition is always associative, so can we use that to show matrix multiplication is associative too? That is, if the products AB and BC make sense, is A(BC) equal to (AB)C? This is not exactly obvious if you just write down the horrible formulas for the i, j entries of both matrices. If we believe the theorem though it’s easy: we know

\[ T_A\circ(T_B\circ T_C)=(T_A\circ T_B)\circ T_C \]

because function composition is associative, and so

\begin{align*}
T_A\circ T_{BC} &= T_{AB}\circ T_C \\
T_{A(BC)} &= T_{(AB)C}.
\end{align*}

If $T_X=T_Y$ then $X=Y$ (for example, you could evaluate at the standard basis vector $\mathbf{e}_j$ to see that the $j$th column of $X$ equals the $j$th column of $Y$ for any $j$), so we get $A(BC)=(AB)C$.
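A numerical spot check can make the relationship between composition and multiplication more concrete, although checking one example is of course not a proof. The Python sketch below assumes matrices are lists of rows and vectors are lists; the names `mat_mul` and `T` are chosen for this illustration, with `T(A)` standing for the function that sends 𝐱 to A𝐱.

```python
def mat_mul(A, B):
    """Multiply matrices stored as lists of rows, following formula (3.1)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for row in A]


def T(A):
    """Return the function x -> A x, where x is a list of length n."""
    def apply(x):
        if any(len(row) != len(x) for row in A):
            raise ValueError("x must have one entry for each column of A")
        return [sum(a * xk for a, xk in zip(row, x)) for row in A]
    return apply


A = [[1, 2], [3, 4], [5, 6]]  # 3x2
B = [[1, 0, 1], [0, 1, 0]]    # 2x3
C = [[1], [2], [3]]           # 3x1
x = [1, 2, 3]

# Applying T_B and then T_A agrees with applying T_AB on this input.
print(T(A)(T(B)(x)))          # [8, 20, 32]
print(T(mat_mul(A, B))(x))    # [8, 20, 32]

# Both bracketings of the triple product agree on this example.
print(mat_mul(A, mat_mul(B, C)) == mat_mul(mat_mul(A, B), C))  # True
```

Note that `T(B)(x)` only makes sense because B has as many columns as x has entries, and `T(A)` can then be applied because the output has as many entries as A has columns, which is the size condition discussed above.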

Since we didn’t prove the theorem here, we’ll prove the associativity result in a more pedestrian way in the next section.