We are going to define a way to multiply certain matrices together. After that we will see several different ways to understand this definition, and we will see how the definition arises as a kind of function composition.
Let $A$ be an $m \times n$ matrix and $B$ be an $n \times p$ matrix. Then the matrix product $AB$ is defined to be the $m \times p$ matrix whose $(i,j)$ entry is

(3.1) $(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$
Before we even start thinking about this definition we record one key point about it. There are two $n$s in the definition above: one is the number of columns of $A$ and the other is the number of rows of $B$. These really must be the same. We only define the matrix product $AB$ when the number of columns of $A$ equals the number of rows of $B$. The reason for this will become clear when we interpret matrix multiplication in terms of function composition later.
The $(1,2)$ entry of a matrix product $AB$ is obtained by putting $i = 1$ and $j = 2$ in the formula (3.1). If $A$ is $2 \times 2$ and $B$ is $2 \times 2$ then this is

$$(AB)_{12} = a_{11} b_{12} + a_{12} b_{22}.$$

You can see that we are multiplying each entry in the first row of $A$ by the corresponding entry in the second column of $B$ and adding up the results. In general, the $(i,j)$ entry of $AB$ is obtained by multiplying the entries of row $i$ of $A$ with the entries of column $j$ of $B$ and adding them up.
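Formula (3.1) translates directly into code. Here is a short illustrative Python sketch (not part of the notes; it represents a matrix as a list of rows) computing a product entry by entry:

```python
def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    # the product is only defined when columns of A = rows of B
    assert all(len(row) == n for row in A)
    # (AB)_ij = sum over k of a_ik * b_kj, which is formula (3.1)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]
```

For example, `mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`.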
Let’s look at a generic example first. Let

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B = \begin{pmatrix} e & f \\ g & h \end{pmatrix}.$$

The number of columns of $A$ equals the number of rows of $B$, so the matrix product $AB$ is defined, and since (in the notation of the definition) $m = n = p = 2$, the size of $AB$ is $m \times p$ which is $2 \times 2$. From the formula, we get

$$AB = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}.$$
Making the previous example concrete, if

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}$$

then $A$ is $2 \times 2$, $B$ is $2 \times 2$, so the matrix product $AB$ is defined and will be another $2 \times 2$ matrix:

$$AB = \begin{pmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}.$$
Matrix multiplication is so important that it is helpful to have several different ways of looking at it. The formula above is useful when we want to prove general properties of matrix multiplication, but we can get further insight when we examine the definition carefully from different points of view.
A very important special case of matrix multiplication is when we multiply an $m \times n$ matrix by an $n \times 1$ column vector. Let

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}.$$

Then we have

$$A\mathbf{v} = \begin{pmatrix} a_{11} v_1 + a_{12} v_2 + a_{13} v_3 \\ a_{21} v_1 + a_{22} v_2 + a_{23} v_3 \end{pmatrix}.$$

Another way to write the result of this matrix multiplication is

$$A\mathbf{v} = v_1 \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix} + v_2 \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix} + v_3 \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix},$$

showing that the result is obtained by adding up scalar multiples of the columns of $A$. If we write $\mathbf{c}_i$ for the $i$th column of $A$ then the expression

$$v_1 \mathbf{c}_1 + v_2 \mathbf{c}_2 + v_3 \mathbf{c}_3,$$

where we add up scalar multiples of the $\mathbf{c}_i$s, is called a linear combination of $\mathbf{c}_1$, $\mathbf{c}_2$, and $\mathbf{c}_3$. Linear combinations are a fundamental idea and we will return to them again and again in the rest of MATH0005.
Let $X_1, \ldots, X_k$ be matrices all of the same shape. A linear combination of $X_1, \ldots, X_k$ is a matrix of the form

$$\lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_k X_k$$

where the $\lambda_i$ are numbers.
This result is true whenever we multiply an $m \times n$ matrix and an $n \times 1$ column vector, not just in the example above.

Let $A$ be an $m \times n$ matrix and $\mathbf{v}$ an $n \times 1$ column vector with entries $v_1, \ldots, v_n$. If $\mathbf{c}_1, \ldots, \mathbf{c}_n$ are the columns of $A$ then

$$A\mathbf{v} = v_1 \mathbf{c}_1 + v_2 \mathbf{c}_2 + \cdots + v_n \mathbf{c}_n.$$
From the matrix multiplication formula (3.1) we get

$$A\mathbf{v} = \begin{pmatrix} \sum_{k=1}^{n} a_{1k} v_k \\ \vdots \\ \sum_{k=1}^{n} a_{mk} v_k \end{pmatrix} = \sum_{k=1}^{n} v_k \begin{pmatrix} a_{1k} \\ \vdots \\ a_{mk} \end{pmatrix}.$$

The column vector whose entries are $a_{1k}$, $a_{2k}$, …, $a_{mk}$ is exactly the $k$th column of $A$, so this completes the proof. ∎
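Proposition 3.2.1 can also be checked numerically. The following Python sketch (illustrative only; the example matrix and vector are arbitrary) computes $A\mathbf{v}$ once via formula (3.1) and once as a linear combination of the columns of $A$:

```python
def mat_vec(A, v):
    # (Av)_i = sum over k of a_ik * v_k -- formula (3.1) with p = 1
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def columns_combination(A, v):
    # v_1 c_1 + ... + v_n c_n, where c_k is the k-th column of A
    m, n = len(A), len(A[0])
    result = [0] * m
    for k in range(n):
        for i in range(m):
            result[i] += v[k] * A[i][k]   # add v_k times column k, entry by entry
    return result

A = [[1, 2, 3], [4, 5, 6]]   # an arbitrary 2 x 3 example
v = [7, 8, 9]
assert mat_vec(A, v) == columns_combination(A, v) == [50, 122]
```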
For a fixed $n$, the standard basis vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ are the $n \times 1$ column vectors

$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$

The vector $\mathbf{e}_i$ with a 1 in position $i$ and zeroes elsewhere is called the $i$th standard basis vector.

For example, if $n = 3$ then there are three standard basis vectors:

$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{e}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
The special case of the proposition above when we multiply a matrix by a standard basis vector is often useful, so we’ll record it here.
Let $A$ be an $m \times n$ matrix and $\mathbf{e}_j$ the $j$th standard basis vector of height $n$. Then $A\mathbf{e}_j$ is equal to the $j$th column of $A$.

According to Proposition 3.2.1 we have $A\mathbf{e}_j = v_1 \mathbf{c}_1 + \cdots + v_n \mathbf{c}_n$ where $v_k$ is the $k$th entry of $\mathbf{e}_j$ and $\mathbf{c}_k$ is the $k$th column of $A$. The entries of $\mathbf{e}_j$ are all zero except for the $j$th which is 1, so

$$A\mathbf{e}_j = \mathbf{c}_j. \qquad ∎$$
Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. You should verify that $A\mathbf{e}_1$ equals the first column of $A$ and $A\mathbf{e}_2$ equals the second column of $A$.
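Here is a quick numerical check of the corollary, again an illustrative Python sketch (with a hypothetical 2 × 2 example) rather than part of the notes:

```python
def standard_basis_vector(n, j):
    # e_j: 1 in position j (1-based, matching the notes), zeroes elsewhere
    return [1 if i == j - 1 else 0 for i in range(n)]

def mat_vec(A, v):
    # (Av)_i = sum over k of a_ik * v_k
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

A = [[1, 2], [3, 4]]
# A e_j picks out the j-th column of A
assert mat_vec(A, standard_basis_vector(2, 1)) == [1, 3]  # first column
assert mat_vec(A, standard_basis_vector(2, 2)) == [2, 4]  # second column
```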
The next theorem tells us that we can do any matrix multiplication column-by-column, multiplying $A$ into each of the columns of $B$ in turn.
Let $A$ be an $m \times n$ matrix and $B$ an $n \times p$ matrix with columns $\mathbf{b}_1, \ldots, \mathbf{b}_p$. Then

$$AB = \begin{pmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_p \end{pmatrix}.$$

The notation $\begin{pmatrix} A\mathbf{b}_1 & \cdots & A\mathbf{b}_p \end{pmatrix}$ means that the first column of $AB$ is equal to what you get by multiplying $A$ into the first column of $B$, the second column of $AB$ is what you get by multiplying $A$ into the second column of $B$, and so on. That’s what it means to say that matrix multiplication works columnwise.
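The columnwise description is easy to test in code. This Python sketch (illustrative; the example matrices are arbitrary) checks that each column of $AB$ equals $A$ multiplied into the corresponding column of $B$:

```python
def mat_mul(A, B):
    # formula (3.1), matrices as lists of rows
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [8, 9, 10]]
AB = mat_mul(A, B)
for j in range(len(B[0])):
    b_j = [row[j] for row in B]                       # j-th column of B
    assert [row[j] for row in AB] == mat_vec(A, b_j)  # j-th column of AB is A b_j
```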
Every column of $AB$ is a linear combination of the columns of $A$.
There are analogous results when we multiply a $1 \times m$ row vector and an $m \times n$ matrix.

Let $\mathbf{u}$ be a $1 \times m$ row vector with entries $u_1, \ldots, u_m$ and let $A$ be an $m \times n$ matrix with rows $\mathbf{r}_1, \ldots, \mathbf{r}_m$. Then $\mathbf{u}A = u_1 \mathbf{r}_1 + u_2 \mathbf{r}_2 + \cdots + u_m \mathbf{r}_m$.

From the matrix multiplication formula (3.1) we get

$$(\mathbf{u}A)_{1j} = \sum_{k=1}^{m} u_k a_{kj},$$

which is exactly the $j$th entry of $u_1 \mathbf{r}_1 + \cdots + u_m \mathbf{r}_m$. ∎
In particular, $\mathbf{u}A$ is a linear combination of the rows of $A$.
Let $A$ be an $m \times n$ matrix with rows $\mathbf{r}_1, \ldots, \mathbf{r}_m$ and let $B$ be an $n \times p$ matrix. Then

$$AB = \begin{pmatrix} \mathbf{r}_1 B \\ \mathbf{r}_2 B \\ \vdots \\ \mathbf{r}_m B \end{pmatrix}.$$

The notation means that the first row of $AB$ is equal to $\mathbf{r}_1 B$, the second row is equal to $\mathbf{r}_2 B$, and so on.
The theorem combined with Proposition 3.2.5 shows that the rows of $AB$ are linear combinations of the rows of $B$.
Returning to the example where

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix},$$

the rows of $A$ are $\mathbf{r}_1 = \begin{pmatrix} 1 & 2 \end{pmatrix}$ and $\mathbf{r}_2 = \begin{pmatrix} 3 & 4 \end{pmatrix}$ and the rows of $B$ are $\begin{pmatrix} 5 & 6 \end{pmatrix}$ and $\begin{pmatrix} 7 & 8 \end{pmatrix}$. We have

$$\mathbf{r}_1 B = \begin{pmatrix} 19 & 22 \end{pmatrix}, \qquad \mathbf{r}_2 B = \begin{pmatrix} 43 & 50 \end{pmatrix},$$

and these are the rows of the matrix product $AB$.
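The rowwise description can be checked in the same style. Here is an illustrative Python sketch (arbitrary example matrices) confirming that each row of $AB$ is the corresponding row of $A$ multiplied into $B$:

```python
def mat_mul(A, B):
    # formula (3.1), matrices as lists of rows
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def row_times_matrix(r, B):
    # the 1 x p row vector r B
    return [sum(r[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
AB = mat_mul(A, B)
for i in range(len(A)):
    assert AB[i] == row_times_matrix(A[i], B)   # i-th row of AB is r_i B
```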
The matrix multiplication formula says that the $(i,j)$ entry of $AB$ is

$$\sum_{k=1}^{n} a_{ik} b_{kj} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}.$$

Let’s think about where the entries in this sum come from. The entries from $A$ involved are $a_{i1}, a_{i2}, \ldots, a_{in}$. These are exactly the entries in the $i$th row of $A$.

The entries from $B$ in this sum are $b_{1j}, b_{2j}, \ldots, b_{nj}$. These are the entries from the $j$th column of $B$. So to get the $(i,j)$ entry of $AB$, we matrix multiply the $i$th row of $A$ by the $j$th column of $B$. That is, if the $i$th row of $A$ is $\mathbf{r}_i$ and the $j$th column of $B$ is $\mathbf{c}_j$, then

(3.4) $(AB)_{ij} = \mathbf{r}_i \mathbf{c}_j.$
(If you’re thinking ‘wait, isn’t $\mathbf{r}_i \mathbf{c}_j$ a $1 \times 1$ matrix, not a number?’ then you are correct. We will identify the $1 \times 1$ matrix $\begin{pmatrix} x \end{pmatrix}$ with the number $x$.)
As an example, consider

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 5 & 6 & 7 \\ 8 & 9 & 10 \end{pmatrix}.$$

Multiplying the second row of $A$ into the third column of $B$ gives

$$\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 7 \\ 10 \end{pmatrix} = 3 \cdot 7 + 4 \cdot 10 = 61,$$

which is the $(2,3)$ entry of

$$AB = \begin{pmatrix} 21 & 24 & 27 \\ 47 & 54 & 61 \end{pmatrix}.$$
When the result of a matrix multiplication is a $1 \times 1$ matrix we will think of it as a number. This is like a dot product, if you’ve seen those before.

Let $\mathbf{r} = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$, a $1 \times 3$ matrix, and $\mathbf{c} = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}$, a $3 \times 1$ column vector. The number of columns of $\mathbf{r}$ and the number of rows of $\mathbf{c}$ are equal, so we can compute

$$\mathbf{r}\mathbf{c} = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32.$$
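The row-times-column rule (3.4) reduces every entry of a matrix product to this single operation. As an illustrative Python sketch (not part of the notes):

```python
def row_times_column(r, c):
    # the 1 x 1 product of a 1 x n row and an n x 1 column,
    # identified with the number it contains
    assert len(r) == len(c)   # columns of the row must equal rows of the column
    return sum(r[k] * c[k] for k in range(len(r)))

assert row_times_column([1, 2, 3], [4, 5, 6]) == 32   # 1*4 + 2*5 + 3*6
```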
Let

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 5 & 6 & 7 \\ 8 & 9 & 10 \end{pmatrix}.$$

$A$ is $2 \times 2$, $B$ is $2 \times 3$, so the matrix product $AB$ is defined, and is a $2 \times 3$ matrix. The columns of $B$ are $\mathbf{b}_1 = \begin{pmatrix} 5 \\ 8 \end{pmatrix}$, $\mathbf{b}_2 = \begin{pmatrix} 6 \\ 9 \end{pmatrix}$, and $\mathbf{b}_3 = \begin{pmatrix} 7 \\ 10 \end{pmatrix}$. The product is therefore

$$AB = \begin{pmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & A\mathbf{b}_3 \end{pmatrix} = \begin{pmatrix} 21 & 24 & 27 \\ 47 & 54 & 61 \end{pmatrix}.$$
Let

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix}.$$

Then $A$ is $2 \times 2$, $B$ is $2 \times 2$, so the matrix product $AB$ is defined and will be another $2 \times 2$ matrix:

$$AB = \begin{pmatrix} 4 & 5 \\ 2 & 3 \end{pmatrix}.$$
In this section we’ll try to answer two questions: where does this strange-looking notion of matrix multiplication come from? Why can we only multiply $A$ and $B$ if the number of columns of $A$ equals the number of rows of $B$?
Let $A$ be an $m \times n$ matrix. Then $T_A : \mathbb{R}^n \to \mathbb{R}^m$ is the function defined by

$$T_A(\mathbf{x}) = A\mathbf{x}.$$

Notice that this definition really does make sense. If $\mathbf{x} \in \mathbb{R}^n$ then it is an $n \times 1$ column vector, so the matrix product $A\mathbf{x}$ exists and has size $m \times 1$, so it is an element of $\mathbb{R}^m$.
Now suppose we have an $m \times n$ matrix $A$ and a $p \times q$ matrix $B$, so that $T_A : \mathbb{R}^n \to \mathbb{R}^m$ and $T_B : \mathbb{R}^q \to \mathbb{R}^p$. Can we form the composition $T_A \circ T_B$? The answer is no, unless $n = p$, that is, unless the number of columns of $A$ equals the number of rows of $B$. So let’s assume that $p = n$, so that $B$ is $n \times q$ and the composition

$$T_A \circ T_B : \mathbb{R}^q \to \mathbb{R}^m$$

makes sense. What can we say about it?
If $A$ is $m \times n$ and $B$ is $n \times q$ then $T_A \circ T_B = T_{AB}$.
You will prove this on a problem sheet.
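Even before proving it, the theorem can be sanity-checked numerically. The Python sketch below (illustrative only; the example matrices and vector are arbitrary) compares $(T_A \circ T_B)(\mathbf{x})$ with $T_{AB}(\mathbf{x})$:

```python
def mat_mul(A, B):
    # formula (3.1), matrices as lists of rows
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def T(A):
    # T_A : R^n -> R^m, the function x |-> Ax (vectors as plain lists)
    return lambda x: [sum(A[i][k] * x[k] for k in range(len(x)))
                      for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
x = [5, 6]
assert T(A)(T(B)(x)) == T(mat_mul(A, B))(x)   # (T_A o T_B)(x) = T_{AB}(x)
```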
The theorem shows that matrix multiplication is related to composition of functions. That’s useful because it suggests something: we know that function composition is always associative, so can we use that to show matrix multiplication is associative too? That is, if the products $A(BC)$ and $(AB)C$ make sense, is $A(BC)$ equal to $(AB)C$? This is not exactly obvious if you just write down the horrible formulas for the $(i,j)$ entries of both matrices. If we believe the theorem though it’s easy: we know

$$T_A \circ (T_B \circ T_C) = (T_A \circ T_B) \circ T_C$$

because function composition is associative, and so

$$T_{A(BC)} = T_A \circ T_{BC} = T_A \circ (T_B \circ T_C) = (T_A \circ T_B) \circ T_C = T_{AB} \circ T_C = T_{(AB)C}.$$
If $T_X = T_Y$ then $X = Y$ (for example, you could evaluate at the standard basis vector $\mathbf{e}_j$ to see that the $j$th column of $X$ equals the $j$th column of $Y$ for any $j$), so we get $A(BC) = (AB)C$.
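The associativity argument can be spot-checked too: for any three compatibly sized matrices, the two orders of multiplication should agree. An illustrative Python sketch with arbitrary example matrices:

```python
def mat_mul(A, B):
    # formula (3.1), matrices as lists of rows
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0, 1], [0, 2, 1]]
# A(BC) should equal (AB)C
assert mat_mul(A, mat_mul(B, C)) == mat_mul(mat_mul(A, B), C)
```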
Since we didn’t prove the theorem here, we’ll prove the associativity result in a more pedestrian way in the next section.