We begin with a lot of definitions.

- An \(m\times n\) matrix is a rectangular grid of numbers with \(m\) rows and \(n\) columns.
- A square matrix is one which is \(n\times n\) for some \(n\).
- A (height \(m\)) column vector is an \(m\times 1\) matrix.
- A (width \(n\)) row vector is a \(1\times n\) matrix.
- \(\mathbb{R}^n\) is the set of all column vectors of height \(n\) with real number entries, and \(\mathbb{C}^n\) is the set of all column vectors of height \(n\) with complex number entries.
- \(M_{m\times n}( \mathbb{R} )\) is the set of all \(m\times n\) matrices with real number entries.

**Examples.**

- \(\begin{pmatrix} 1\\0 \end{pmatrix}\), a \(2\times 1\) column vector, an element of \(\mathbb{R}^2\).
- \(\begin{pmatrix} 1&2&3\\4&5&6 \end{pmatrix}\), a \(2\times 3\) matrix
- \(\begin{pmatrix} -1&-2 \end{pmatrix}\), a \(1\times 2\) row vector
- \(\begin{pmatrix} 1&2\\2&1 \end{pmatrix}\), a \(2\times 2\) square matrix.

The \(i\), \(j\) entry of a matrix means the number in row \(i\), column \(j\). It is important to get these the correct way round. Usually when you give \((x, y)\) coordinates, \(x\) refers to the horizontal direction and \(y\) refers to the vertical direction. When we talk about the \(i,j\) entry of a matrix, however, the first number \(i\) refers to the row number (i.e. the vertical direction) and the second number \(j\) refers to the column number (i.e. the horizontal direction).

We want to refer to these entries often. We often write \(A = (a_{ij})\) to mean that \(A\) is the matrix whose \(i\),\(j\) entry is called \(a_{ij}\).

For example, in the \(2\times 2\) case we would have

\[A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.\]

If you’re using this notation you must also specify the size of the matrix, of course.

We can add matrices of the same size.

If \(A=(a_{ij})\) and \(B=(b_{ij})\) are the same size, then \(A+B\) is defined to be the matrix whose \(i,j\) entry is \(a_{ij} + b_{ij}\).

**Example.** \[\begin{pmatrix}
1&2 \\ 4&5
\end{pmatrix} + \begin{pmatrix}
0&1\\2&3
\end{pmatrix} = \begin{pmatrix}
1+0 & 2+1 \\
4+2 & 5 + 3
\end{pmatrix} = \begin{pmatrix}
1&3\\6&8
\end{pmatrix}.\]

In other words, we add matrices by adding corresponding entries. We never add matrices of different sizes.
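This entrywise rule is easy to express in code. Here is a minimal sketch in Python, representing a matrix as a list of rows (the helper name `mat_add` is our own, not standard notation):

```python
def mat_add(A, B):
    # Matrices are lists of rows; A and B must be exactly the same size.
    assert len(A) == len(B) and all(len(r) == len(s) for r, s in zip(A, B))
    # The i,j entry of the sum is the sum of the i,j entries.
    return [[x + y for x, y in zip(rowA, rowB)] for rowA, rowB in zip(A, B)]

# The example above:
print(mat_add([[1, 2], [4, 5]], [[0, 1], [2, 3]]))  # [[1, 3], [6, 8]]
```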

We also multiply matrices by numbers - this is called “scalar multiplication”.

If \(A = (a_{ij})\) is a matrix and \(\lambda\) a number then \(\lambda A\) means the matrix whose \(i,j\) entry is \(\lambda a_{ij}\).

**Example.** \[2 \begin{pmatrix}
1&0\\0&1
\end{pmatrix} = \begin{pmatrix}
2&0\\0&2
\end{pmatrix}.\]

These operations have some familiar properties:

**Proposition:** if \(a,b\) are scalars and \(A\), \(B\) are same-size matrices,

- \((a+b)A = aA + bA\) (distributivity)
- \(a(A+B) = aA + aB\) (distributivity)
- \(a(bA) = (ab)A\) (associativity)
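These identities can be spot-checked numerically. The sketch below verifies all three on a small example (the helpers `mat_add` and `scalar_mul` are our own names):

```python
def mat_add(A, B):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(A, B)]

def scalar_mul(lam, A):
    return [[lam * x for x in row] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
a, b = 2, 3

# (a+b)A = aA + bA
assert scalar_mul(a + b, A) == mat_add(scalar_mul(a, A), scalar_mul(b, A))
# a(A+B) = aA + aB
assert scalar_mul(a, mat_add(A, B)) == mat_add(scalar_mul(a, A), scalar_mul(a, B))
# a(bA) = (ab)A
assert scalar_mul(a, scalar_mul(b, A)) == scalar_mul(a * b, A)
print("all three identities hold on this example")
```

Of course, checking one example proves nothing; the point of the proposition is that these hold for all matrices and scalars.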

There’s some notation for the matrix all of whose entries are zero. \(0_{m,n}\) or \(0_{m\times n}\) is the \(m\times n\) matrix with all entries zero. This is called the \(m \times n\) **zero matrix**.

We’re going to define a way to multiply some - not all - matrices together, and justify the definition as coming from function composition. The definition will be built up in pieces: first row times column, then matrix times column, then matrix times matrix.

Let \(\mathbf{r} = (r_1,\ldots,r_n)\) be a row vector of width \(n\) and

\[\mathbf{c} = \begin{pmatrix}c_1 \\ \vdots \\c_n\end{pmatrix}\]

be a column vector of height \(n\). We define \(\mathbf{r}\mathbf{c}\) to be the number \(\sum_{i=1}^n r_i c_i\).

It is important to note that we are only defining a row vector times a column vector here. The order is important. \(\mathbf{c}\mathbf{r}\) is not defined.

**Example.** \[
\begin{pmatrix}
1&2&3
\end{pmatrix} \begin{pmatrix}
4\\5\\6
\end{pmatrix} = 1\times 4 + 2 \times 5 + 3 \times 6 = 32.
\]

This is like a dot product, if you’ve seen that before.
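In code, this is a single sum of pairwise products; a minimal sketch (the name `row_times_col` is ours):

```python
def row_times_col(r, c):
    # r is a width-n row vector, c a height-n column vector (both as flat lists).
    assert len(r) == len(c), "width of r must equal height of c"
    return sum(ri * ci for ri, ci in zip(r, c))

# The example above:
print(row_times_col([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```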

Let \(A=(a_{ij})\) be an \(m \times n\) matrix with rows \(\mathbf{r}_1,\ldots, \mathbf{r}_m\). This means \(\mathbf{r}_i\) is the row vector \((a_{i1},\ldots,a_{in})\).

Let \(\mathbf{c}\) be a column vector of height \(n\). Then \(A\mathbf{c}\) is defined to be the height \(m\) column vector \(\begin{pmatrix}\mathbf{r}_1\mathbf{c} \\ \vdots \\ \mathbf{r}_m \mathbf{c} \end{pmatrix}\).

The important thing about this definition is that the number of columns of \(A\), which is \(n\), must be the same as the number of rows of \(\mathbf{c}\) for \(A\mathbf{c}\) to be defined.

**Example.** Let \(A = \begin{pmatrix} 1&2\\3&4 \end{pmatrix}\). Then the rows of \(A\) are \(\mathbf{r}_1 = \begin{pmatrix} 1&2 \end{pmatrix}\) and \(\mathbf{r}_2 = \begin{pmatrix} 3 & 4 \end{pmatrix}\). If \(\mathbf{c} = \begin{pmatrix} 5\\6 \end{pmatrix}\) then we can compute \(A\mathbf{c}\) since the number of columns of \(A\) (which is 2) equals the number of rows of \(\mathbf{c}\):

\[A\mathbf{c} = \begin{pmatrix} \mathbf{r}_1 \mathbf{c} \\ \mathbf{r}_2 \mathbf{c} \end{pmatrix} = \begin{pmatrix} 1\times 5 + 2 \times 6 \\ 3 \times 5 + 4 \times 6 \end{pmatrix}.\]

**Example.** Let \(A = \begin{pmatrix} 1&2\\3&4\\5&6 \end{pmatrix}\), a \(3\times 2\) matrix, and \(\mathbf{c} = \begin{pmatrix} 7 \\ 8 \end{pmatrix}\), a \(2 \times 1\) column vector. The number of columns of \(A\) and the number of rows of \(\mathbf{c}\) are equal (to 2), so we can compute \(A\mathbf{c}\):

\[A\mathbf{c} = \begin{pmatrix} 1\times 7 + 2\times 8\\ 3 \times 7 + 4 \times 8\\ 5 \times 7 + 6 \times 8 \end{pmatrix}.\]
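Matrix-times-column is then one row-times-column product per row of \(A\); a sketch (helper name `mat_times_col` is ours):

```python
def mat_times_col(A, c):
    # cols(A) must equal the height of c; the result has one entry per row of A.
    assert all(len(row) == len(c) for row in A)
    return [sum(a * x for a, x in zip(row, c)) for row in A]

# The 3x2 example above:
print(mat_times_col([[1, 2], [3, 4], [5, 6]], [7, 8]))  # [23, 53, 83]
```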

Let \(A\) be \(m \times n\) and \(B\) be \(n\times p\). Let the columns of \(B\) be \(\mathbf{c}_1,\ldots,\mathbf{c}_p\). Then \(AB\) is the \(m\times p\) matrix with columns \(A\mathbf{c}_1,\ldots, A\mathbf{c}_p\).

Again, it is important to note that \(AB\) is defined if *and only if* the number of columns of \(A\) equals the number of rows of \(B\).

What this definition says is that

\[A(\mathbf{c}_1\; \ldots \; \mathbf{c}_p) = (A\mathbf{c}_1\; \ldots \;A\mathbf{c}_p)\]

which is what we mean when we say matrix multiplication works *columnwise* or *column-by-column*.

**Example.** Let

\[A = \begin{pmatrix} 1&2 \end{pmatrix}, B= \begin{pmatrix} 1&0&1\\ 0&1&0 \end{pmatrix}.\]

\(A\) is \(1\times 2\), \(B\) is \(2\times 3\), so the matrix product \(AB\) is defined, and is a \(1\times 3\) matrix.

The columns of \(B\) are \(\mathbf{c}_1 = \begin{pmatrix} 1\\0 \end{pmatrix}\), \(\mathbf{c}_2 = \begin{pmatrix} 0\\1 \end{pmatrix}\), and \(\mathbf{c}_3 = \begin{pmatrix} 1\\0 \end{pmatrix}\). The product \(AB\) is therefore

\[\begin{pmatrix} A\mathbf{c}_1 & A\mathbf{c}_2 & A\mathbf{c}_3 \end{pmatrix} = \begin{pmatrix} 1\times 1 + 2 \times 0 & 1 \times 0 + 2 \times 1 & 1 \times 1 + 2 \times 0 \end{pmatrix}.\]

**Example.** Let

\[A = \begin{pmatrix} 1&2\\ 3&4 \end{pmatrix}, B = \begin{pmatrix} 5&6\\ 7&8 \end{pmatrix}.\]

Then \(A\) is \(2\times 2\), \(B\) is \(2\times 2\), so the matrix product \(AB\) is defined and will be another \(2\times 2\) matrix:

\[AB = \begin{pmatrix} 1\times 5 + 2 \times 7 & 1 \times 6 + 2 \times 8\\ 3\times 5 + 4 \times 7 & 3 \times 6 + 4 \times 8 \end{pmatrix}.\]

There is an equivalent definition of matrix multiplication which is more useful when we are trying to prove facts about it. If \(A=(a_{ij})\) is \(m\times n\) and \(B =(b_{ij})\) is \(n\times p\) then \(AB\) is the \(m\times p\) matrix whose \(i,j\) entry is

\[\sum_{k=1}^n a_{ik}b_{kj}.\]
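This entry formula is also the easiest version to implement. A minimal sketch in Python, with matrices as lists of rows (the helper name `mat_mul` is ours):

```python
def mat_mul(A, B):
    # AB is defined only when cols(A) == rows(B).
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "cols(A) must equal rows(B)"
    # The i,j entry is the sum over k of a_ik * b_kj.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

# The 1x2 times 2x3 example from earlier gives a 1x3 matrix:
print(mat_mul([[1, 2]], [[1, 0, 1], [0, 1, 0]]))  # [[1, 2, 1]]
```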

Summary of the previous matrix multiplication video:

- we define the product \(AB\) of two matrices \(A=(a_{ij})\) and \(B=(b_{ij})\) if and only if the number of columns of \(A\) equals the number of rows of \(B\)
- in this case, if \(A\) is \(m \times n\) and \(B\) is \(n \times p\) then \(AB\) will be \(m \times p\)
- there are three equivalent ways to compute \(AB\):
  - The formula: \(AB = (c_{ij})\), where \(c_{ij} = \sum_{k=1}^n a_{ik}b_{kj}\)
  - Columnwise: \(AB = (A\mathbf{c}_1\; \cdots \; A \mathbf{c}_p)\) where \(\mathbf{c}_j\) is the \(j\)th column of \(B\), and \(A\mathbf{c}_j\) is the matrix-column multiplication we defined in the last video.
  - Rows dotted with columns: the \(i,j\) entry of \(AB\) is the \(i\)th row of \(A\) times the \(j\)th column of \(B\)

**Proposition.** Let \(A\), \(A'\) be \(m \times n\), let \(B, B'\) be \(n \times p\), let \(C\) be \(p \times q\), and let \(\lambda\) be a number. Then

- \(A(BC) = (AB)C\) (associativity)
- \((A+A')B = AB + A'B\), and \(A(B+B') = AB + AB'\) (distributivity)
- \((\lambda A)B = \lambda(AB) = A(\lambda B)\)

*Proof.* Let \(A=(a_{ij}), A'=(a'_{ij}), B=(b_{ij}), B'=(b'_{ij}), C=(c_{ij})\). During this proof we also write \(X_{ij}\) to mean the \(i, j\) entry of a matrix \(X\).

- \(AB\) has \(i,j\) entry \(\sum_{k=1}^n a_{ik}b_{kj}\), so the \(i, j\) entry of \((AB)C\) is \[ \sum_{l=1}^p (AB)_{il} c_{lj} = \sum_{l=1}^p \sum_{k=1}^n a_{ik}b_{kl}c_{lj}.\] On the other hand, the \(i, j\) entry of \(BC\) is \(\sum_{l=1}^pb_{il}c_{lj}\) so the \(i, j\) entry of \(A(BC)\) is \[\begin{align*}\sum_{k=1}^n a_{ik}(BC)_{kj} &= \sum_{k=1}^n a_{ik} \sum_{l=1}^p b_{kl}c_{lj}\\ &= \sum_{k=1}^n \sum_{l=1}^p a_{ik}b_{kl}c_{lj}. \end{align*}\] These are the same, because it doesn’t matter if we do the \(k\) or \(l\) summation first: we just get the same terms in a different order.
- The \(i, j\) entry of \((A+A')B\) is \(\sum_{k=1}^n (a_{ik} + a'_{ik})b_{kj}\), which equals \(\sum_{k=1}^n a_{ik}b_{kj} + \sum_{k=1}^n a'_{ik}b_{kj}\), but this is the sum of the \(i, j\) entry of \(AB\) and the \(i, j\) entry of \(A'B\), proving the first equality. The second is similar.
- The \(i,j\) entry of \(\lambda A\) is \(\lambda a_{ij}\), so the \(i, j\) entry of \((\lambda A)B\) is \[ \sum_{k=1}^n(\lambda a_{ik})b_{kj} = \lambda \sum_{k=1}^n a_{ik}b_{kj}= \lambda (AB)_{ij}\] so \((\lambda A)B\) and \(\lambda (AB)\) have the same \(i, j\) entry for any \(i, j\), and are therefore equal. The second equality can be proved similarly.

These results tell you that you can use some of the normal rules of algebra when you work with matrices - similarly to what happened for permutations. Again, like permutations, what you can’t do is use the commutative property.

Two matrices \(A\) and \(B\) are said to **commute** if \(AB\) and \(BA\) are both defined, and \(AB=BA\).

For some pairs of matrices, the product \(AB\) is defined but \(BA\) is not. For example, if \(A\) is \(2\times 3\) and \(B\) is \(3\times 4\) then \(AB\) is defined but \(BA\) isn’t.

Even when both \(AB\) and \(BA\) are defined and have the same size, they won’t in general be equal.

**Example:** let \(A=\begin{pmatrix} 1&2\\3&4 \end{pmatrix}\) and \(B = \begin{pmatrix} 5&6\\7&8 \end{pmatrix}\). Then

\[\begin{align*} AB &= \begin{pmatrix} 19&22\\43&50 \end{pmatrix}\\ BA &= \begin{pmatrix} 23&34\\31&46 \end{pmatrix}.\end{align*}\]
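You can check both products with the same kind of code as before (a small sketch; `mat_mul` is our own helper, not a library function):

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul(A, B))                    # [[19, 22], [43, 50]]
print(mat_mul(B, A))                    # [[23, 34], [31, 46]]
print(mat_mul(A, B) == mat_mul(B, A))   # False: A and B do not commute
```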

**Definition:** the \(n\) by \(n\) **identity matrix** \(I_n\) is the matrix with \(i,j\) entry 1 if \(i=j\) and 0 otherwise.

For example,

\[I_2 = \begin{pmatrix} 1&0\\0&1 \end{pmatrix}, I_3 = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix}.\]

The most important property of identity matrices is that they behave like the number 1 does when you multiply by them.

**Lemma:** if \(A\) is an \(m \times n\) matrix then \(I_m A = A = A I_n\).

*Proof:* Let \(A = (a_{ij})\) and \(I_m = (\delta_{ij})\), so \(\delta_{ij}\) is 1 if \(i=j\) and 0 otherwise. The formula for matrix multiplication tells us that for any \(i\) and \(j\), the \(i,j\) entry of \(I_mA\) is \(\sum_k \delta_{ik}a_{kj}\). The only term in this sum that can be nonzero is the one where \(k=i\), so the sum equals \(1\times a_{ij} = a_{ij}\). Thus the \(i,j\) entry of \(I_mA\) equals \(a_{ij}\), the \(i,j\) entry of \(A\).

The other equality can be proved similarly.

**Definition:** an \(n \times n\) matrix \(A\) is called **invertible** if and only if there exists an \(n \times n\) matrix \(B\) such that \(AB = BA = I_n\).

If there is such a matrix \(B\), there is only one such matrix \(B\):

**Proposition:** If \(AB=BA=I_n\) and \(AC=CA=I_n\) then \(B=C\).

*Proof:*

\[\begin{align*} B &= BI_n \\ &= B(AC) \\ &= (BA)C &\text{associativity} \\ &= I_nC \\ &= C \end{align*}\]

This means that when a matrix is invertible we can talk about *the* inverse of \(A\), and we call it \(A^{-1}\).

You have probably already seen that there is a simple criterion for when a \(2\times 2\) matrix \(A = \begin{pmatrix} a&b\\c&d \end{pmatrix}\) is invertible. \(A\) is invertible iff \(ad-bc\neq 0\), in which case \(A^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d&-b\\-c & a \end{pmatrix}\).
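As a sketch, using nothing beyond the formula just quoted (`inverse_2x2` is our own name):

```python
def inverse_2x2(A):
    # A = [[a, b], [c, d]]; invertible iff ad - bc != 0, per the criterion above.
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None  # not invertible
    return [[d / det, -b / det], [-c / det, a / det]]

print(inverse_2x2([[1, 2], [3, 4]]))  # [[-2.0, 1.0], [1.5, -0.5]]
print(inverse_2x2([[1, 2], [2, 4]]))  # None, since 1*4 - 2*2 = 0
```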

**Proposition:** if an \(n\times n\) matrix \(A\) has a row of zeroes, or a column of zeroes, then it isn’t invertible.

*Proof:* If \(A\) has a column of zeroes, for any matrix \(B\) the product \(BA\) will have a column which is \(B\) times \(A\)’s zero column, which is a column of zeroes. So \(BA\) is not equal to \(I_n\).

Similarly, if \(A\) has a row of zeroes then for any matrix \(B\), the product \(AB\) has a row of zeroes too, so it is not equal to \(I_n\).

If you multiply invertible matrices together - any number of them - the result is invertible. Recall the shoes-and-socks result about permutations: exactly the same thing is true.

**Proposition:** if \(A_1, \ldots, A_k\) are invertible \(n\times n\) matrices then \(A_1\cdots A_k\) is invertible with inverse \(A_k^{-1}\cdots A_1^{-1}\).

The proof is the same as for permutations.

**Definition:** A system of \(m\) linear equations in \(n\) unknowns \(x_1, \ldots, x_n\) with coefficients \(a_{ij}\), \(1 \leq i \leq m\), \(1 \leq j \leq n\) and \(b_1, \ldots, b_m\) is a list of simultaneous equations

\[\begin{align*} a_{11} x_1 + a_{12} x_2 +\cdots + a_{1n}x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots +a_{2n}x_n &= b_2 \\ \vdots & \vdots \\ a_{m1} x_1 + a_{m2} x_2 +\cdots + a_{mn}x_n &= b_m \end{align*}\]

Every system of linear equations can be written in **matrix form**: the above system is equivalent to

\[A\mathbf{x} = \mathbf{b}\]

where \(A = (a_{ij})\), \(\mathbf{x} = \begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}\), and \(\mathbf{b} = \begin{pmatrix}b_1\\ \vdots \\b_m \end{pmatrix}\).

For example, the system of linear equations

\[\begin{align*} 2x + 3y + 4z &= 5\\ x + 5z &= 0 \end{align*}\]

has matrix form

\[\begin{pmatrix} 2&3&4 \\ 1 & 0 & 5 \end{pmatrix} \begin{pmatrix} x\\y\\z \end{pmatrix} = \begin{pmatrix} 5\\0 \end{pmatrix}.\]

This connection means that we can use systems of linear equations to learn about matrices, and use matrices to learn about systems of linear equations. For example, if \(A\) is invertible and we want to solve the matrix equation

\[A\mathbf{x} = \mathbf{b}\]

we could multiply both sides by \(A^{-1}\) to see that there is a *unique* solution \(\mathbf{x} = A^{-1}\mathbf{b}\).
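For a \(2\times 2\) system we can carry this out with the inverse formula quoted earlier (the helper names here are our own):

```python
def inverse_2x2(A):
    # Standard 2x2 inverse: (1/(ad-bc)) [[d, -b], [-c, a]].
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "A is not invertible"
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Solve 3x + 4y = 6, x + 2y = 5 as x = A^{-1} b:
A = [[3, 4], [1, 2]]
b = [6, 5]
print(mat_vec(inverse_2x2(A), b))  # [-4.0, 4.5], i.e. x = -4, y = 9/2
```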

The **augmented matrix** of a system of linear equations whose matrix form is \(A\mathbf{x}=\mathbf{b}\) is the matrix which you get by adding \(\mathbf{b}\) as an extra column on the right of \(A\). We write this as \((A | \mathbf{b})\).

For example, the augmented matrix for the system of linear equations above would be

\[\begin{pmatrix} 2&3&4&5\\ 1&0&5&0 \end{pmatrix}.\]

A **solution** to a system of linear equations with matrix form \(A\mathbf{x}=\mathbf{b}\) is a vector \(\mathbf{y}\) (of numbers this time, not “unknowns”) such that \(A\mathbf{y}=\mathbf{b}\).

A system of linear equations may have a unique solution, many different solutions, or no solutions at all. In future lectures we will see how to find out how many solutions, if any, a system has.

The matrix equation \(A\mathbf{x}=\mathbf{b}\), or the corresponding system of linear equations, is called **consistent** if it has a solution, otherwise it is called **inconsistent**.

If you are given a system of linear equations in variables \(x, y, z\) and asked to solve them, what you probably do is to manipulate the equations by adding multiples of one equation to another until you have “eliminated” some of the variables and you can read off the solutions.

We are going to try to formalise this method of solving linear equations.

Because we want to use matrix methods, let’s solve an example system and keep track of what our equation manipulation does to the corresponding augmented matrix.

Consider the linear system

\[\begin{align*} 3x+4y&=6\\ x+2y&=5. \end{align*}\]

The corresponding matrix equation is \(A\mathbf{x}=\mathbf{b}\) where

\[A=\begin{pmatrix}3&4\\1&2\end{pmatrix}, \mathbf{b} = \begin{pmatrix}6\\ 5 \end{pmatrix}, \mathbf{x} = \begin{pmatrix}x \\y \end{pmatrix}.\]

The augmented matrix is \((A | \mathbf{b}) = \begin{pmatrix} 3&4&6\\1&2&5 \end{pmatrix}\).

To solve the system, we first eliminate \(x\) from the first equation by subtracting 3 times the second equation from the first. The equations become

\[\begin{align*} -2y &= -9 \\ x+2y &= 5 \end{align*}\]

The augmented matrix \(\begin{pmatrix} 0&-2&-9\\1&2&5 \end{pmatrix}\) of this new system is obtained by adding \(-3\) times the second row of the old augmented matrix to the first row.

Next we make the coefficient of \(y\) in the first equation equal to 1 by multiplying the first equation by \(-1/2\). The equations become

\[\begin{align*} y &= 9/2 \\ x+2y &= 5 \end{align*}\]

The augmented matrix \(\begin{pmatrix} 0&1&9/2\\1&2&5 \end{pmatrix}\) of this new system is obtained by multiplying every entry in the first row of the old augmented matrix by \(-1/2\).

Next we eliminate \(y\) from the second equation by subtracting 2 times the first equation from the second. The equations become

\[\begin{align*} y &= 9/2 \\ x &= -4 \end{align*}\]

The augmented matrix \(\begin{pmatrix} 0&1&9/2\\1&0&-4 \end{pmatrix}\) of this new system is obtained by adding \(-2\) times the first row to the second row.

Lastly, if we wanted the first equation to tell us the value of the first variable and the second equation to tell us about the second variable, we could swap the order of the two equations, corresponding to swapping the rows of the augmented matrix so that it becomes \(\begin{pmatrix} 1&0&-4\\0&1&9/2 \end{pmatrix}\).
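The whole computation can be replayed on the augmented matrix with exact arithmetic (the standard library `fractions` module avoids rounding); each step below is one of the manipulations used above:

```python
from fractions import Fraction as F

M = [[F(3), F(4), F(6)],   # the augmented matrix (A | b)
     [F(1), F(2), F(5)]]

M[0] = [x - 3 * y for x, y in zip(M[0], M[1])]   # r1 -> r1 - 3 r2
M[0] = [F(-1, 2) * x for x in M[0]]              # r1 -> (-1/2) r1
M[1] = [y - 2 * x for x, y in zip(M[0], M[1])]   # r2 -> r2 - 2 r1
M[0], M[1] = M[1], M[0]                          # r1 <-> r2

assert M == [[1, 0, -4], [0, 1, F(9, 2)]]  # reads off x = -4, y = 9/2
print("final augmented matrix:", M)
```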

The manipulations we do to systems of linear equations correspond to doing “row operations” to the augmented matrices.

**Definition:** a **row operation** is one of the following things we can do to a matrix.

- add \(\lambda\) times row \(i\) to row \(j\) (for \(j \neq i\), \(\lambda\) any number), written \(r_j \mapsto r_j + \lambda r_i\).
- multiply row \(i\) by \(\lambda\), where \(\lambda\neq 0\), written \(r_i \mapsto \lambda r_i\).
- swap rows \(i\) and \(j\), written \(r_i \leftrightarrow r_j\).

We say “row op” as a shortened form of row operation.

It’s important that these row operations are *invertible*. That is, for each row operation \(r\), there’s another row operation \(s\) such that doing \(r\) then \(s\), or doing \(s\) then \(r\), is the same as doing nothing - it gets you back to the matrix you started with.

Here’s a table of row operations and their inverses.

row op | inverse
---|---
add \(\lambda\) times \(r_i\) to \(r_j\) | add \(-\lambda\) times \(r_i\) to \(r_j\)
multiply \(r_i\) by \(\lambda \neq 0\) | multiply \(r_i\) by \(\lambda^{-1}\)
swap \(r_i\) and \(r_j\) | swap \(r_i\) and \(r_j\)

An *elementary matrix* is one you can get by doing a single row operation to an identity matrix. These are going to be crucial to our understanding of invertibility and of solving systems of linear equations.

**Examples:**

- The elementary matrix \(\begin{pmatrix} 0&1\\1&0\end{pmatrix}\) results from doing the row operation \(r_1 \leftrightarrow r_2\) to \(I_2\).
- The elementary matrix \(\begin{pmatrix}1&2&0\\0&1&0\\0&0&1\end{pmatrix}\) results from doing the row operation \(r_1 \mapsto r_1 + 2r_2\) to \(I_3\).
- The elementary matrix \(\begin{pmatrix}-1&0\\0&1\end{pmatrix}\) results from doing the row operation \(r_1 \mapsto (-1)r_1\) to \(I_2\).

Doing a row operation \(r\) to a matrix has the same effect as multiplying that matrix on the left by the elementary matrix corresponding to \(r\):

**Theorem.** Let \(r\) be a row operation and \(A\) an \(m \times n\) matrix. Then \(r(A) = r(I_m)A\).

*Proof.* It’s enough to check the case when \(A=\begin{pmatrix} a_1\\ \vdots \\ a_m \end{pmatrix}\) is a column vector, because matrix multiplication works column-by-column.

First of all, note that no matter what \(r\) is, for any \(l\), the \(l\)th entry of \(r(I_m)A\) is

\[\sum_{k=1}^m s_{lk}a_k \;\;\;\; (\dagger)\]

where \(s_{lk}\) is the \(l, k\) entry of \(r(I_m)\).

There are three cases according to the three types of row operation. In each case we will work out \(r(I_m)A\) and show that it is the same as \(r(A)\).

- \(r\) is \(r_i\mapsto \lambda r_i\). For \(l\neq i\), the \(l\)th row of \(r(I_m)\) is the same as the \(l\)th row of \(I_m\), so \(s_{lk}=0\) unless \(k=l\) in which case it is 1. So from (\(\dagger\)), the \(l\)th entry of \(r(I_m)A\) is \(a_l\). If \(l=i\) then \(s_{lk}=\lambda\) if \(k=l\) and 0 otherwise, so again by (\(\dagger\)) the \(i\)th entry of \(r(I_m)A\) is \(\lambda a_i\). Therefore \(r(I_m)A\) is the same as \(A\), except that the \(i\)th entry is multiplied by \(\lambda\). This is \(r(A)\).
- \(r\) is \(r_i \leftrightarrow r_j\). There are three cases for \(l\).
- \(l\neq i,j\). In this case \(s_{lk}=0\) unless \(k=l\) in which case it is 1, so from \((\dagger)\) the sum equals \(a_l\).
- \(l=i\). Since \(r(I_m)\) is \(I_m\) with rows \(i\) and \(j\) swapped, \(s_{ik}=1\) if \(k=j\) and 0 otherwise. So the sum is \(a_j\).
- \(l=j\). Similar to the last case, the sum equals \(a_i\). Therefore \(r(I_m)A\) is the same as \(A\), except in row \(i\) where \(a_j\) appears and row \(j\) where \(a_i\) appears. This is \(r(A)\).

- \(r\) is \(r_i \mapsto r_i + \lambda r_j\). This time the rows of \(r(I_m)\) are the same as the rows of \(I_m\), except the \(i\)th which has \(i, i\) entry 1 and \(i, j\) entry \(\lambda\), so \(s_{ii}=1, s_{ij}=\lambda\), and all other \(s_{ik}=0\). From (\(\dagger\)), every entry of \(r(I_m)A\) is the same as the corresponding entry of \(A\) except the \(i\)th, where we get \(a_i+\lambda a_j\), and again, \(r(I_m)A\) is the same as \(r(A)\).

**Corollary:** elementary matrices are invertible.

*Proof:* Let \(r\) be a row op and \(I\) an identity matrix. Let \(s\) be the inverse row op to \(r\).

\(r(I)s(I) = r(s(I))\) by the theorem we just proved, and \(r(s(I))=I\) as \(r\) and \(s\) are inverses.

For the same reasons, \(s(I)r(I) = s(r(I)) = I\).

It follows that \(r(I)\) is invertible with inverse \(s(I)\).
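The theorem and corollary can be illustrated concretely: build an elementary matrix by applying a row op to the identity, and check that left-multiplying by it performs the same row op (all helper names below are ours):

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def row_op_add(A, j, lam, i):
    # The row op r_j -> r_j + lam * r_i, applied to a copy of A.
    A = [row[:] for row in A]
    A[j] = [x + lam * y for x, y in zip(A[j], A[i])]
    return A

A = [[1, 2], [3, 4]]
E = row_op_add(identity(2), 1, -3, 0)   # elementary matrix for r2 -> r2 - 3 r1
print(E)                                # [[1, 0], [-3, 1]]
assert mat_mul(E, A) == row_op_add(A, 1, -3, 0)  # r(A) == r(I) A
```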

Our informal method of solving linear systems is “do certain manipulations to the equations until they are in a form where the solutions are easy to read off.” This method only works if the manipulations we do don’t change the set of solutions.

When we introduced row ops, it was because their effect on the augmented matrix of a linear system corresponded to the kind of manipulations we perform when solving such a linear system. We’re now going to prove that these row ops don’t change the set of solutions.

**Theorem.** Suppose that \((A' | \mathbf{b}')\) results from \((A | \mathbf{b})\) by doing a sequence of row ops. Then the set of solutions of the matrix equation \(A\mathbf{x}=\mathbf{b}\) equals the set of solutions of \(A'\mathbf{x} = \mathbf{b}'\).

Equivalently, if \(\mathbf{v}\) is a vector then \(A\mathbf{v}=\mathbf{b}\) if and only if \(A'\mathbf{v} = \mathbf{b}'\).

*Proof:* it’s enough to check this when \((A' | \mathbf{b}')\) results from doing a single row op \(r\) to \((A | \mathbf{b} )\). Then we can use this result repeatedly to prove the general case where \((A' | \mathbf{b}')\) results by doing a sequence of row ops.

In the case of a single row op \(r\),

\[(A' | \mathbf{b}') = r(I) (A | \mathbf{b})\]

so (because matrix multiplication works column-by-column) \(A' = r(I)A\) and \(\mathbf{b}' = r(I)\mathbf{b}\).

Now for any vector \(\mathbf{v}\),

\[A\mathbf{v} = \mathbf{b} \iff r(I)A\mathbf{v} = r(I) \mathbf{b}.\]

since if \(A \mathbf{v} = \mathbf{b}\) then we can multiply by \(r(I)\) on the left to get \(r(I)A \mathbf{v} = r(I) \mathbf{b}\), and conversely if \(r(I)A \mathbf{v} = r(I) \mathbf{b}\) then we can multiply by the inverse of \(r(I)\), which we know to be invertible because of our proof in the previous lecture, to get \(A \mathbf{v} = \mathbf{b}\).

Since \(r(I)A = A'\) and \(r(I) \mathbf{b} = \mathbf{b}'\) we get that \(\mathbf{v}\) is a solution of \(A \mathbf{x} = \mathbf{b}\) if and only if it is a solution of \(A' \mathbf{x} = \mathbf{b}'\).

We talked about manipulating equations into a simple form where the solutions could be easily read off. One possible “simple form” is called row reduced echelon form.

**Definition:** a **leading entry** in a matrix is the first non-zero entry in a row, starting from the left.

For example, in the matrix \(\begin{pmatrix} 0&2\\0&0 \end{pmatrix}\), the leading entry in the first row is the 2 in position \(1,2\), while the second row has no leading entry.

**Definition.** a matrix is in **row reduced echelon form** (RREF) if

- all leading entries are 1
- all zero-rows are at the bottom (a zero-row is one whose entries are all zero)
- all entries in the same column as a leading entry are zero
- for every \(i\), if row \(i\) and \(i+1\) have a leading entry then the leading entry in row \(i+1\) is to the right of that in row \(i\)

**Examples.**

- \(\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\) isn’t in RREF: the zero row is at the top.
- \(\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}\) isn’t in RREF: there is a row in which the left-most non-zero entry is not 1.
- \(\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\) isn’t in RREF: the left-most 1 in row 2 is not to the right of the left-most 1 in the row above it.
- \(\begin{pmatrix} 1 &\alpha &\beta & 3 \\ 0 &0 & 1 & -2 \end{pmatrix}\) is in RREF if and only if \(\beta = 0\): the left-most 1 in row 2 is in column 3, but it is not the only non-zero entry in column 3 unless \(\beta = 0\).
- \(\begin{pmatrix} 1 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\) is in RREF.

Here are two facts about row reduced echelon form.

- every matrix can be put into RREF by doing row ops.
- If \(B\) and \(C\) are matrices in RREF which result from doing row ops to a matrix \(A\), then \(B=C\).

The first is relatively easy to prove by induction on the number of columns. The second is a bit harder. If you believe it, it means we could talk about “the” row reduced echelon form of a matrix \(A\), meaning the unique matrix in RREF that can be obtained by doing row ops to \(A\). We don’t need this result, so we won’t prove it here. If you want to see proofs of both 1 and 2 you can read them at this link.

Fact 1 above says that given any matrix \(A\), we can do a sequence of row operations to \(A\) and end up with a matrix in RREF. Here is an example.

Let \(A = \begin{pmatrix} 1&2&3\\4&5&6\\7&8&9 \end{pmatrix}\). We want to do a sequence of row ops to \(A\) which ends up with a matrix in RREF. Row 1 has a leading entry at position \(1,1\), but the other entries in column 1 aren’t 0. A good start is to use row ops like \(r_j \mapsto r_j + lr_1\) to eliminate the other entries in column 1:

\[A \stackrel{r_2 \mapsto r_2-4r_1}{\mapsto} \begin{pmatrix} 1&2&3\\ 0&-3&-6\\ 7&8&9 \end{pmatrix} \stackrel{r_3 \mapsto r_3-7r_1}{\mapsto} \begin{pmatrix} 1&2&3\\ 0&-3&-6\\ 0&-6&-12 \end{pmatrix}\]

This matrix isn’t in RREF. One reason is that the leading entry in row 2, in position \(2,2\), isn’t equal to 1. To make that leading entry 1 we can use the row op that multiplies row 2 by \(-1/3\):

\[\begin{pmatrix} 1&2&3\\ 0&-3&-6\\ 0&-6&-12 \end{pmatrix} \stackrel{r_2 \mapsto (-1/3)r_2}{\mapsto} \begin{pmatrix} 1&2&3\\ 0&1&2\\ 0&-6&-12 \end{pmatrix}\]

Now we have a leading entry in row 2 column 2 which is equal to 1, but there are other nonzero entries in that column. We must eliminate these by adding multiples of row 2 to rows 1 and 3:

\[\begin{gather*}\begin{pmatrix} 1&2&3\\ 0&1&2\\ 0&-6&-12 \end{pmatrix} \stackrel{r_1 \mapsto r_1-2r_2}{\mapsto} \begin{pmatrix} 1&0&-1\\ 0&1&2 \\ 0&-6&-12 \end{pmatrix} \\ \stackrel{r_3 \mapsto r_3+6r_2}{\mapsto} \begin{pmatrix} 1&0&-1\\ 0&1&2\\ 0&0&0 \end{pmatrix}\end{gather*}\]

This matrix is in RREF, so we are done.
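Fact 1's row-reduction procedure can be written as a short algorithm. This is one possible implementation (exact arithmetic via the standard library `fractions` module), not the only way to organise the steps:

```python
from fractions import Fraction as F

def rref(M):
    M = [[F(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                                # next row in which to place a leading 1
    for c in range(cols):
        # find a nonzero entry in column c, at row r or below
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                     # no leading entry in this column
        M[r], M[pivot] = M[pivot], M[r]          # swap it up
        M[r] = [x / M[r][c] for x in M[r]]       # scale the leading entry to 1
        for i in range(rows):
            if i != r and M[i][c] != 0:          # clear the rest of column c
                M[i] = [x - M[i][c] * y for x, y in zip(M[i], M[r])]
        r += 1
    return M

# The worked example above:
assert rref([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) == [[1, 0, -1], [0, 1, 2], [0, 0, 0]]
print("RREF matches the hand computation")
```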

Suppose we start with a linear system with matrix form \(A\mathbf{x}=\mathbf{b}\) then put the augmented matrix \((A | \mathbf{b})\) into RREF. Suppose the resulting matrix in RREF is \((A' | \mathbf{b}')\). The whole point of this was that the solutions of \(A \mathbf{x} = \mathbf{b}\) are the same as those of \(A' \mathbf{x} = \mathbf{b}'\), but it should be “easy” to find the solutions of \(A' \mathbf{x} = \mathbf{b}'\). How do we actually find those solutions?

**Example 1.** Here is an augmented matrix in RREF

\[\begin{pmatrix} 1&0&2&0&0\\ 0&1&4&0&0\\ 0&0&0&1&0\\ 0&0&0&0&1 \end{pmatrix}\]

If the variables are called \(x, y, z, w\) then the corresponding equations are

\[\begin{align*} x + 2z &= 0 \\ y+4z &= 0\\ w &= 0 \\ 0 &= 1 \end{align*}\]

The last equation is impossible, so there are no solutions to this linear system.

**Example 2.** Here is the same augmented matrix with a different final column.

\[\begin{pmatrix} 1&0&2&0&2\\ 0&1&4&0&3\\ 0&0&0&1&4\\ 0&0&0&0&0 \end{pmatrix}\]

In this case, if the variables are \(x, y, z, w\), the equations are

\[\begin{align*} x + 2z &= 2 \\ y+4z &= 3\\ w &= 4\\ 0 &= 0\end{align*}\]

The solutions are \(x=2-2z, y=3-4z, w=4\). The last \(0=0\) equation doesn’t tell us anything so it can be ignored.

We can write the solutions in **vector form** as

\[\begin{align*} \begin{pmatrix}x\\ y\\ z\\ w\end{pmatrix} &= \begin{pmatrix}2-2z\\ 3-4z\\ z\\ 4\end{pmatrix} \\ &= \begin{pmatrix}2\\3\\0\\4\end{pmatrix} + z\begin{pmatrix}-2\\ -4\\ 1\\ 0\end{pmatrix} \end{align*}\]

To summarise:

- If the last column of the augmented matrix has a leading entry (like in example 1), there are no solutions. Otherwise,
- Variables corresponding to a column with no leading entry (like \(z\) in example 2) can be chosen freely (“free parameters”).
- Other variables can be uniquely determined in terms of these free parameters.
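We can sanity-check the description of the solutions in example 2: for any choice of the free parameter \(z\), the vector-form solution really does satisfy the system (`mat_vec` is our own helper):

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 0, 2, 0],
     [0, 1, 4, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
b = [2, 3, 4, 0]

for z in [0, 1, -5, 100]:
    v = [2 - 2 * z, 3 - 4 * z, z, 4]   # the vector-form solution
    assert mat_vec(A, v) == b
print("solution formula checked for several values of z")
```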

We are going to prove the following theorem:

**Theorem:** a square matrix \(A\) is invertible if and only if there is a sequence of row ops taking \(A\) to the identity matrix.

We need a couple of lemmas to make the proof work.

**Lemma:** let \(X\) be an \(n \times n\) matrix in RREF. Either it is the identity, or it has a column with no leading entry.

*Proof:* suppose every column has a leading entry, so there are \(n\) leading entries. There’s at most one leading entry per row and there are \(n\) rows, so every row must have a leading entry.

Let \(c_i\) be the number of the column in which the row \(i\) leading entry occurs. Then \(c_1 < c_2 < \ldots < c_n\) by the RREF condition on leading entries. Since the numbers \(c_i\) are \(1,\ldots, n\) in some order, we must have \(c_i=i\) for all \(i\). That means that the row 1 leading entry is in column 1, the row 2 leading entry is in column 2, and so on.

Because \(X\) is in RREF, columns with leading entries have zeroes in all other positions. So the first column is \(\begin{pmatrix} 1\\0\\ \vdots \\ 0 \end{pmatrix}\), the second column is \(\begin{pmatrix} 0\\1\\0\\ \vdots \\ 0 \end{pmatrix}\), and so on. These are the columns of the identity matrix, so \(X=I\).

**Lemma:** let \(E\) be invertible and \(A\) a square matrix. Then \(EA\) is invertible if and only if \(A\) is invertible.

*Proof.* If \(A\) is invertible, then \(EA\) is a product of invertible matrices and hence invertible. Conversely, if \(EA\) is invertible, call its inverse \(B\). I claim \(A\) is invertible with inverse \(BE\). For \((BE)A = B(EA) = I\); and since \(EAB = I\), multiplying on the left by \(E^{-1}\) gives \(AB = E^{-1}\), so \(A(BE) = ABE = E^{-1}E = I\).

*Proof of the Theorem.* Suppose there is a sequence of row ops taking \(A\) to \(I\), say \(r_1, \ldots, r_k\). Let \(E_i = r_i(I)\), the elementary matrix associated to \(r_i\). Then

\[E_k E_{k-1} \cdots E_1 A = I\]

since we know that doing \(r_i\) is the same as left-multiplication by \(E_i\) (this is the Theorem in video 7). Every elementary matrix is invertible (corollary in video 7). The matrix \(E = E_k \cdots E_1\) is invertible as it is a product of invertible matrices (video 4). \(EA = I\), so it is certainly invertible, so by the second lemma above \(A\) is invertible.

Conversely, suppose there is no sequence of row ops taking \(A\) to \(I\). We can do a sequence of row ops to any matrix and end up with a RREF matrix, so when we do this to \(A\), the RREF matrix we get (call it \(X\)) cannot be \(I\).

Our first lemma above says \(X\) has a column with no leading entry, so there are \(n-1\) or fewer leading entries, and hence a row with no leading entry, that is, a zero row. So \(X\) isn't invertible (first proposition, video 4).

As before, there’s an invertible matrix \(E\) such that \(EA = X\). Since \(X\) isn’t invertible, by the second lemma above, \(A\) isn’t invertible.

**Theorem:** a square matrix \(A\) is invertible if and only if the only solution to \(A\mathbf{x}=\mathbf{0}\) is \(\mathbf{x}=\mathbf{0}\).

*Proof:* If \(A\) is invertible and \(A\mathbf{x} = \mathbf{0}\) then \(\mathbf{x}=A^{-1}\mathbf{0} = \mathbf{0}\).

If \(A\) is not invertible, we can do a sequence of row ops to \(A\) ending with a RREF matrix \(X\) which cannot be the identity because of the previous theorem.

By the first lemma above, \(X\) has a column with no leading entry. That means there is a free parameter in the solutions to \(X \mathbf{x} = \mathbf{0}\), and in particular, there are infinitely many different solutions to \(X \mathbf{x} = \mathbf{0}\) (so there is a non-zero solution).

The solutions to \(X \mathbf{x} = \mathbf{0}\) are the same as the solutions to \(A \mathbf{x} = \mathbf{0}\) (theorem from video 8), so \(A \mathbf{x} = \mathbf{0}\) also has a non-zero solution.
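For a concrete illustration of the non-invertible case, here is a quick Python check; the matrix is chosen for illustration.

```python
# A = [[1, 2], [2, 4]] is not invertible (its RREF is [[1, 2], [0, 0]], not I),
# and indeed Ax = 0 has a non-zero solution, for example x = (2, -1):

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

print(matvec([[1, 2], [2, 4]], [2, -1]))   # [0, 0]
```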

Let \(A\) be a square matrix. We now have a method of determining whether or not \(A\) is invertible: do row operations to \(A\) until you reach a matrix in RREF. Then \(A\) is invertible if and only if the RREF matrix is invertible.

What if we actually want to know what the inverse matrix is? You probably already know that a \(2\times 2\) matrix \(A=\begin{pmatrix} a&b\\c&d \end{pmatrix}\) is invertible if and only if \(ad-bc \neq 0\), and in this case

\[A^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c &a \end{pmatrix}\]
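As a quick sanity check of this formula, here is a Python sketch that multiplies a sample matrix by the claimed inverse; the entries are arbitrary illustrative values.

```python
from fractions import Fraction

def inv2(a, b, c, d):
    """The 2x2 inverse formula, assuming ad - bc is non-zero."""
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

a, b, c, d = 2, 1, 5, 3          # ad - bc = 1, so this matrix is invertible
A = [[a, b], [c, d]]
Ainv = inv2(a, b, c, d)

# Multiply A by the claimed inverse and check we get I.
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(prod == [[1, 0], [0, 1]])  # True
```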

This formula does generalise to larger matrices, but not in a way which is easy to use: for example, the general formula for the inverse of a \(3\times 3\) invertible matrix \(A = (a_{ij})\) is

\[A^{-1} = \frac{1}{\Delta} \begin{pmatrix} \begin{vmatrix} a_{22} & a_{23} \\ a_{32}& a_{33} \end{vmatrix} & -\begin{vmatrix} a_{12} & a_{13} \\ a_{32}& a_{33} \end{vmatrix} & \begin{vmatrix} a_{12} & a_{13} \\ a_{22}& a_{23} \end{vmatrix} \\ -\begin{vmatrix} a_{21} & a_{23} \\ a_{31}& a_{33} \end{vmatrix} & \begin{vmatrix} a_{11} & a_{13} \\ a_{31}& a_{33} \end{vmatrix} & -\begin{vmatrix} a_{11} & a_{13} \\ a_{21}& a_{23} \end{vmatrix} \\ \begin{vmatrix} a_{21} & a_{22} \\ a_{31}& a_{32} \end{vmatrix} & -\begin{vmatrix} a_{11} & a_{12} \\ a_{31}& a_{32} \end{vmatrix} & \begin{vmatrix} a_{11} & a_{12} \\ a_{21}& a_{22} \end{vmatrix} \end{pmatrix}\]

where \(\begin{vmatrix} a&b\\c&d\end{vmatrix}\) means \(ad-bc\) and \(\Delta = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}-a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}\). This isn’t a good formula for practical use. Luckily we can use RREF techniques to determine invertibility and find inverses.
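For the curious, here is the \(3\times 3\) formula transcribed directly into Python with exact rational arithmetic, checked on one sample matrix. As the text says, this is not the practical method; the transcription is purely illustrative.

```python
from fractions import Fraction

def det2(a, b, c, d):
    """The 2x2 determinant |a b; c d| = ad - bc."""
    return a * d - b * c

def inv3(a):
    """The cofactor formula for the inverse of an invertible 3x3 matrix."""
    delta = (a[0][0]*a[1][1]*a[2][2] + a[0][1]*a[1][2]*a[2][0]
             + a[0][2]*a[1][0]*a[2][1] - a[0][0]*a[1][2]*a[2][1]
             - a[0][1]*a[1][0]*a[2][2] - a[0][2]*a[1][1]*a[2][0])
    d = Fraction(delta)
    return [
        [ det2(a[1][1], a[1][2], a[2][1], a[2][2]) / d,
         -det2(a[0][1], a[0][2], a[2][1], a[2][2]) / d,
          det2(a[0][1], a[0][2], a[1][1], a[1][2]) / d],
        [-det2(a[1][0], a[1][2], a[2][0], a[2][2]) / d,
          det2(a[0][0], a[0][2], a[2][0], a[2][2]) / d,
         -det2(a[0][0], a[0][2], a[1][0], a[1][2]) / d],
        [ det2(a[1][0], a[1][1], a[2][0], a[2][1]) / d,
         -det2(a[0][0], a[0][1], a[2][0], a[2][1]) / d,
          det2(a[0][0], a[0][1], a[1][0], a[1][1]) / d],
    ]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]   # an arbitrary invertible example
print(matmul(A, inv3(A)) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True
```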

Let \(A\) be an \(n\times n\) matrix, and suppose we want to find out whether \(A\) is invertible and if so what its inverse is. Let \(I\) be the \(n\times n\) identity matrix. Here is a method:

- Form the super-augmented matrix \((A | I)\).
- Do row ops to put this into RREF.
- If you get \((I | B)\) then \(A\) is invertible with inverse \(B\).
- If the first part of the matrix isn’t \(I\) then \(A\) isn’t invertible.
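The method above can be sketched as a Python function, using exact rational arithmetic so the row reduction is not affected by rounding. This is an illustrative implementation, not an optimised one, and the function name is invented.

```python
from fractions import Fraction

def inverse_via_rref(A):
    """Row-reduce the super-augmented matrix (A | I); return the inverse,
    or None if the left-hand block does not reduce to I."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    row = 0
    for col in range(n):
        # find a pivot in this column, at or below the current row
        pivot = next((r for r in range(row, n) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[row], M[pivot] = M[pivot], M[row]
        p = M[row][col]
        M[row] = [x / p for x in M[row]]          # scale the leading entry to 1
        for r in range(n):                        # clear the rest of the column
            if r != row and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[row])]
        row += 1
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    if [r[:n] for r in M] != identity:
        return None
    return [r[n:] for r in M]

print(inverse_via_rref([[1, 2], [3, 4]]) ==
      [[-2, 1], [Fraction(3, 2), Fraction(-1, 2)]])   # True
print(inverse_via_rref([[1, 2], [2, 4]]))             # None: not invertible
```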

Why it works: the first part of the matrix is a RREF matrix resulting from doing row ops to \(A\), so if it is \(I\) then \(A\) is invertible and if it is not \(I\) then \(A\) is not invertible. It just remains to explain why, in the case \(A\) is invertible, you end up with \((I | A^{-1})\).

Think about the columns \(\mathbf{c}_1,\ldots, \mathbf{c}_n\) of the inverse of \(A\). We have \(A (\mathbf{c}_1 \; \cdots \; \mathbf{c}_n) = I\), so \(A\mathbf{c}_1 = \mathbf{e}_1\), \(A \mathbf{c}_2 = \mathbf{e}_2\), and so on, where \(\mathbf{e}_i\) is the \(i\)th column of \(I\). So \(\mathbf{c}_1\) is the unique solution of the matrix equation \(A\mathbf{x} = \mathbf{e}_1\). You find it by putting \((A | \mathbf{e}_1)\) into RREF, and you must get \((I | \mathbf{c}_1)\) since \(\mathbf{c}_1\) is the only solution.

Repeating that argument for every column, when we put \((A | \mathbf{e}_1 \; \cdots \; \mathbf{e}_n)\) into RREF we get \((I | \mathbf{c}_1\; \cdots \; \mathbf{c}_n)\), that is, \((I | A^{-1})\).

Let \(A=\begin{pmatrix} 1&2\\3&4\end{pmatrix}\). To find whether \(A\) is invertible, and if so what its inverse is, we put \((A | I_2)\) into RREF:

\[\begin{align*} \begin{pmatrix} 1&2 & 1 & 0 \\ 3&4 & 0 & 1 \end{pmatrix} & \stackrel{r_2 \mapsto r_2-3r_1}{\mapsto} \begin{pmatrix} 1&2&1&0\\ 0&-2&-3&1 \end{pmatrix} \\ & \stackrel{r_2 \mapsto (-1/2)r_2}{\mapsto} \begin{pmatrix} 1&2&1&0\\ 0&1&3/2 & -1/2 \end{pmatrix} \\ & \stackrel{r_1 \mapsto r_1-2r_2}{\mapsto} \begin{pmatrix} 1&0&-2 & 1 \\ 0&1&3/2&-1/2 \end{pmatrix} \end{align*}\]

This is in RREF, so the inverse of \(A\) is

\[\begin{pmatrix} -2&1 \\ 3/2 & -1/2 \end{pmatrix}\]

as you can check by multiplying them together.
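That check ("multiply them together") takes a couple of lines of Python, using exact fractions for the half-integer entries:

```python
from fractions import Fraction

A = [[1, 2], [3, 4]]
B = [[-2, 1], [Fraction(3, 2), Fraction(-1, 2)]]   # the claimed inverse

prod = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(prod == [[1, 0], [0, 1]])   # True: AB = I
```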