This part of the module is about generalizing what we know about matrices and vectors.

When we talk about vectors, or matrices, there’s an important thing we have to decide: where do the entries come from? For example, we might work with matrices with real numbers as entries, or with complex numbers as entries. We never really discussed this in the first part of the module, because it doesn’t make any difference to the theory we developed. So it’s natural to ask: what other kinds of numbers could we use as entries in our vectors and matrices, and still have everything work OK?

The answer is that the entries must come from what is called a **field**. Roughly speaking, a field is a set with multiplication and addition operations that obey the usual rules of algebra, and where you can divide by any non-zero element. Examples include \(\mathbb{R}\), the set of all real numbers; \(\mathbb{C}\), the set of all complex numbers; and \(\mathbb{Q}\), the set of all rational numbers. Non-examples include \(\mathbb{Z}\), the set of all integers, since you can’t divide by 2 in \(\mathbb{Z}\); and \(M_{2\times 2}(\mathbb{R})\), the set of all \(2\times 2\) real matrices, since there are non-zero \(2\times 2\) matrices which aren’t invertible.

The usual way to define a field is to write down a list of axioms: everything that satisfies the axioms is a field. We won’t need the full list of field axioms in this part of the module, but you can look them up if you want to see them.

What we’re going to do in this video is study a family of examples of fields which are very different to the real, complex, and rational numbers: **finite fields of prime order**.

Let \(p\) be a prime number. Then \(\mathbb{F}_p\) is the set \(\{0, 1, 2, \ldots, p-1\}\) with addition and multiplication exactly the same as for ordinary whole numbers, except that we regard all multiples of \(p\) as equal to 0.

This is easier to understand in an example. Let’s work in \(\mathbb{F}_5\). Then

- \(3 + 4 = 7 = 2\)
- \(3 + 2 = 0\)
- \(4 \times 3= 12 = 2\),

and so on. We can deal with subtractions and negative numbers too:

- \(3 - 4 = -1 = -1 + 5 = 4\)
- \(4 \times (-3) = -12 = -12 + 15 = 3\).
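
These calculations are easy to check mechanically. In Python, for instance, the `%` operator already reduces any integer (including negative ones) to a representative in \(\{0, 1, \ldots, p-1\}\), which matches the convention above. A quick sketch (the helper name is mine):

```python
p = 5  # we work in F_5

def reduce_mod_p(n):
    """Replace an ordinary integer by its representative in {0, ..., p-1}."""
    return n % p  # Python's % is always non-negative when p > 0

print(reduce_mod_p(3 + 4))   # 2, since 7 = 5 + 2
print(reduce_mod_p(3 + 2))   # 0
print(reduce_mod_p(4 * 3))   # 2, since 12 = 10 + 2
print(reduce_mod_p(3 - 4))   # 4, since -1 + 5 = 4
print(reduce_mod_p(4 * -3))  # 3, since -12 + 15 = 3
```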

There’s an easy way to summarize *all* the possible additions and multiplications in \(\mathbb{F}_5\): draw tables with rows and columns labelled by \(0, 1, 2, 3, 4\) and in row \(i\) and column \(j\) put the result of \(i+j\) or \(i \times j\).

| \(+\) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 |
| 1 | 1 | 2 | 3 | 4 | 0 |
| 2 | 2 | 3 | 4 | 0 | 1 |
| 3 | 3 | 4 | 0 | 1 | 2 |
| 4 | 4 | 0 | 1 | 2 | 3 |

| \(\times\) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | 3 | 4 |
| 2 | 0 | 2 | 4 | 1 | 3 |
| 3 | 0 | 3 | 1 | 4 | 2 |
| 4 | 0 | 4 | 3 | 2 | 1 |

Let’s do the same for another example, \(\mathbb{F}_3\):

| \(+\) | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0 | 1 | 2 |
| 1 | 1 | 2 | 0 |
| 2 | 2 | 0 | 1 |

| \(\times\) | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 |
| 2 | 0 | 2 | 1 |
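
Tables like these can be generated for any \(p\). Here is a short sketch (the helper name `tables` is mine) that builds the addition and multiplication tables of \(\mathbb{F}_p\) as lists of rows:

```python
def tables(p):
    """Addition and multiplication tables of F_p, as lists of rows."""
    add = [[(i + j) % p for j in range(p)] for i in range(p)]
    mul = [[(i * j) % p for j in range(p)] for i in range(p)]
    return add, mul

add5, mul5 = tables(5)
for row in mul5:  # reproduces the multiplication table of F_5 above
    print(row)
```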

I said that in a field, you have to be able to divide by non-zero elements. Let’s check this is true in these two examples. Being able to divide by \(x\) is equivalent to the existence of an element \(1/x\) or \(x^{-1}\), called a **multiplicative inverse** to \(x\) (because \(y/x\) is just \(y \times 1/x\)). We need to check that for each nonzero \(x\) there is an element which you can multiply \(x\) by to get 1.

In \(\mathbb{F}_3\), you can see from the table that

- \(1\times1 = 1\), so \(1^{-1}=1\)
- \(2\times 2 = 1\), so \(2^{-1}=2\)

…and so every non-zero element of \(\mathbb{F}_3\) has a multiplicative inverse.

In \(\mathbb{F}_5\), you can see from the table that

- \(1\times 1 = 1\), so \(1^{-1}=1\)
- \(2\times 3 = 1\), so \(2^{-1}=3\) and \(3^{-1}=2\)
- \(4\times 4 = 1\), so \(4^{-1} = 4\).

…and so every non-zero element of \(\mathbb{F}_5\) has a multiplicative inverse.

The finite fields we work with will be small enough that you can find these “multiplicative inverses” by trial and error. For example, in \(\mathbb{F}_7\), the multiplicative inverse of 2 is 4 as \(2\times 4 = 8 = 1\).
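
Trial and error is easy to automate for small \(p\): just try every candidate. A sketch (the function name is mine):

```python
def inverse_mod(a, p):
    """Find the multiplicative inverse of a in F_p by trial and error."""
    for x in range(1, p):
        if (a * x) % p == 1:
            return x
    raise ValueError(f"{a} has no inverse modulo {p}")

print(inverse_mod(2, 7))   # 4, since 2 * 4 = 8 = 1 in F_7
```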

We’re going to give a proof that every non-zero element of \(\mathbb{F}_p\) has a multiplicative inverse, and give a method for finding these multiplicative inverses. In fact we’re going to do both at once.

First, here is a reminder about division with remainder. There is a fundamental fact about the natural numbers: if you have any two positive whole numbers \(x\) and \(y\), you can try to divide \(x\) by \(y\). When you do that you get a quotient \(q\) and a remainder \(r\), which is a nonnegative integer less than \(y\):

\[\begin{align*} x &= q y + r & 0 \leq r < y \end{align*}\]

It’s easy to convince yourself of the truth of this by imagining starting at zero on the number line and taking steps of size \(y\) to the right, stopping when the next step would take you past \(x\). Call the number of steps taken at the point you stop \(q\). You’ve stopped at the number \(qy\), and the distance from this point to \(x\), which is the remainder \(r = x -qy\), must be less than \(y\) since otherwise you could take another full step without passing \(x\).

Now we’re ready to prove:

**Theorem**. Let \(p\) be prime and \(0 < a <p\) be a whole number. Then there are whole numbers \(s\) and \(t\) such that \(as + pt=1\).

*Proof.* First we’re going to divide \(p\) by \(a\), getting a quotient \(q_2\) and a remainder \(r_2\)

\[p = q_2 a + r_2 \; \; 0 \leq r_2 < a\]

(I’ve chosen to start with a 2 here because I want to think of \(p\) as \(r_0\) and \(a\) as \(r_1\). You’ll see why later).

If \(r_2=0\) we stop. Otherwise we divide \(a\) by \(r_2\) getting a quotient \(q_3\) and remainder \(r_3\)

\[a = q_3 r_2 + r_3 \;\; 0 \leq r_3 < r_2\]

If \(r_3=0\) we stop. Otherwise we divide \(r_2\) by \(r_3\) getting a quotient \(q_4\) and remainder \(r_4\):

\[r_2 = q_4 r_3 + r_4 \;\; 0 \leq r_4 < r_3\]

If \(r_4=0\) we stop. Otherwise we divide \(r_3\) by \(r_4\) getting quotient \(q_5\) and remainder \(r_5\):

\[r_3 = q_5 r_4 + r_5 \;\; 0 \leq r_5 < r_4\]

and so on.

Look at the \(r\)-values here: they are all \(\geq 0\), but

\[a > r_2 > r_3 > r_4 > r_5 > \cdots \geq 0\]

You can’t have a decreasing sequence of positive integers that goes on forever, so eventually - after at most \(a\) steps - we must end up with one of the \(r\)s being zero. Let’s say \(r_m = 0\), so the last few divisions were

\[\begin{align} r_{m-5} &= q_{m-3} r_{m-4} + r_{m-3} \\ r_{m-4} &= q_{m-2} r_{m-3} + r_{m-2} \\ r_{m-3} &= q_{m-1} r_{m-2} + r_{m-1} \\ r_{m-2} &= q_{m} r_{m-1} + 0 \end{align}\]

I claim that \(r_{m-1} = 1\). For notice that \(r_{m-1}\) divides \(r_{m-2}\), because of the last equation. Then look at the equation above it: both terms on the right hand side are multiples of \(r_{m-1}\), so the left hand side \(r_{m-3}\) is a multiple of \(r_{m-1}\) as well. Repeating the same argument we end up with \(r_{m-1}\) dividing all the left-hand-sides; in particular, \(r_{m-1}\) divides the prime number \(p\). The only divisors of \(p\) are 1 and \(p\), and \(r_{m-1} \leq r_2 < a < p\), so \(r_{m-1}=1\).

So that second-to-last equation is really

\[r_{m-3} = q_{m-1} r_{m-2} + 1\]

or equivalently

\[r_{m-3} - q_{m-1}r_{m-2} = 1. \;\;\; \dagger\]

Now we can express \(r_{m-3}\) in terms of \(r\)s with a smaller subscript with equation (1), and we can express \(r_{m-2}\) in terms of \(r\)s with a smaller subscript using (2). When we do this in (\(\dagger\)), we get \(1 =\) some multiple of \(r_{m-3}\) plus some multiple of \(r_{m-4}\). And we can keep doing that over and over again until we eventually get \(1 =\) a multiple of \(r_1\) plus a multiple of \(r_0\), that is, \(1 = as + pt\) for some whole numbers \(s\) and \(t\). We’re done.

It helps to see an example. Take \(p = 13\) and \(a = 5\). We have

\[\begin{align} 13 &= 2 \times 5 + 3 \\ 5 &= 1 \times 3 + 2 \\ 3 &= 1 \times 2 + 1\\ 2 &= 2 \times 1 + 0 \end{align}\]

and so

\[\begin{align} 1 &= 3 - 1 \times 2 \\ &= (13 - 2 \times 5) - 1 \times (5 - 1 \times 3) \\ &= (13 - 2 \times 5) - 1 \times (5 - 1 \times (13 - 2 \times 5)) \\ &= 2 \times 13 - 5 \times 5 \end{align}\]

Why does the theorem help us find multiplicative inverses? The reason is that if \(as + pt = 1\) then in \(\mathbb{F}_p\) we have, since multiples of \(p\) are equal to zero, \(as = 1\). So \(s\) is the multiplicative inverse of \(a\). In our example, \(p=13,a = 5, s=-5, t=2\), so the multiplicative inverse of \(5\) in \(\mathbb{F}_{13}\) is \(-5\) which is the same as \(8\). So the theorem both guarantees a multiplicative inverse exists, and gives us a way to compute it.
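
The division steps together with the back-substitution are known as the extended Euclidean algorithm, and the whole calculation can be carried out mechanically. Here is a recursive sketch (the function name is mine):

```python
def extended_gcd(a, b):
    """Return (g, s, t) with a*s + b*t == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    # we have b*s + (a % b)*t == g, and a % b == a - (a // b)*b,
    # so rearranging gives a*t + b*(s - (a // b)*t) == g
    return g, t, s - (a // b) * t

g, s, t = extended_gcd(5, 13)
print(g, s, t)   # 1 -5 2, matching 2 * 13 - 5 * 5 = 1
print(s % 13)    # 8: the multiplicative inverse of 5 in F_13
```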

We are now ready to define vector spaces. The idea is to observe that sets of column vectors, or row vectors, or more generally matrices of a given size, all come equipped with a notion of addition and scalar multiplication and all obey the same collection of simple algebraic rules, like addition is commutative, scalar multiplication distributes over vector addition, and so on. We’re going to define a vector space as any set with an operation of addition and scalar multiplication obeying similar rules to those satisfied by column vectors. The power of doing this is that it lets us apply our theory in seemingly entirely different contexts.

Here’s the definition.

**Definition:** let \(\mathbb{F}\) be a field. An \(\mathbb{F}\)-vector space is a set \(V\) with

- a special element 0 or \(0_V\) called the **zero vector**
- an operation \(+\) called **addition**
- a way to multiply elements of \(V\) by elements of \(\mathbb{F}\), called **scalar multiplication**

…such that for all \(u,v,w\) in \(V\) and all \(l, m\) in \(\mathbb{F}\),

- \(0_V+v=v\)
- there exists \(x \in V\) such that \(x + v = 0_V\)
- \(u+(v+w) = (u+v)+w\)
- \(v+w=w+v\)
- \(1v = v\)
- \(l(mv) = (lm)v\)
- \((l+m)v = lv + mv\)
- \(l(v+w)= lv + lw\)

The elements of vector spaces can be anything at all. They don’t have to look like column or row vectors. Here are some examples of vector spaces.

- \(\mathbb{R}^n\) is a real vector space, \(\mathbb{C}^n\) is a complex vector space, and if \(\mathbb{F}\) is any field then \(\mathbb{F}^n\), the set of all height n column vectors with entries from \(\mathbb{F}\) is an \(\mathbb{F}\)-vector space.
- \(M_{m\times n}(\mathbb{R})\), the set of all \(m \times n\) matrices with real entries, is a real vector space with the zero vector being the all-zeroes matrix. Similarly for any other field.
- \(\{0\}\) with the only possible operations is an \(\mathbb{F}\)-vector space, for any field \(\mathbb{F}\), “the zero vector space.”
- Let \(\mathcal{F}\) be the set of all functions \(\mathbb{R} \to \mathbb{R}\). Define \(f+g\) to be the function \(\mathbb{R} \to \mathbb{R}\) given by \((f+g)(x) = f(x) + g(x)\) and, for a real number \(l\) and a function \(f\), define \(lf\) by \((lf)(x) = l f(x)\). Then \(\mathcal{F}\) is a real vector space with the “zero vector” being the constant function taking the value 0.
- If \(A\) is a \(m \times n\) matrix, the set of all solutions of \(Ax = 0\) is a vector space.
- The set of all real solutions to the differential equation \(y'' + a y' + by = 0\) is a vector space, with the definitions of addition and scalar multiplication as in \(\mathcal{F}\) above.
- The set \(\mathbb{F}[x]\) of all polynomials in one variable \(x\) is an \(\mathbb{F}\)-vector space, as is the set \(\mathbb{F}_{\leq n}[x]\) of all polynomials in \(x\) of degree at most \(n\).
- the set of magic matrices, those whose row sums and column sums are all equal, is a vector space with the usual matrix addition and scalar multiplication.
- the set of configurations of the game *lights out* is an \(\mathbb{F}_2\)-vector space, and the moves are vector addition.

You can read more about how *lights out* moves work [on the Wikipedia page](https://en.wikipedia.org/wiki/Lights_Out_(game)).

There are some familiar properties of vector addition and scalar multiplication - like the fact that if you multiply a vector by the scalar zero, you get the zero vector - which aren’t listed in the axioms. Are they special to column vectors, or do they hold in every vector space?

To answer questions like this we can give a proof that uses only the vector space axioms, not the specific form of a particular vector space’s elements.

**Lemma:** let \(V\) be a vector space and \(\mathbf{v} \in V\). Then \(0\mathbf{v} = 0_V\).

Be careful that you understand the notation here. \(0_V\) means “the special zero vector given in the definition of the vector space \(V\),” and \(0\mathbf{v}\) means “the vector \(v\) scalar multiplied by the scalar 0.” They’re not “obviously” the same thing.

*Proof:* \(0\mathbf{v} = (0+0)\mathbf{v} = 0\mathbf{v} + 0\mathbf{v}\) (axiom 7). Axiom 2 says there’s an element \(\mathbf{u}\) of \(V\) such that \(\mathbf{u} + 0\mathbf{v} = 0_V\), so add it to both sides:

\[\begin{align*} \mathbf{u} + 0\mathbf{v} & = \mathbf{u}+ (0\mathbf{v} + 0\mathbf{v}) \\ 0_V &= (\mathbf{u} + 0\mathbf{v}) + 0 \mathbf{v} & \text{axiom 3, definition of u} \\ 0_V &= 0_V + 0\mathbf{v} & \text{definition of u} \\ 0_V &= 0\mathbf{v} & \text{axiom 1}. \end{align*}\]

Let’s try another familiar property:

**Lemma:** Let \(V\) be an \(\mathbb{F}\)-vector space and let \(\mathbf{x}\in V\). Then \(\mathbf{x} + (-1)\mathbf{x}= \mathbf{0}_V\).

*Proof*:

\[\begin{align*} \mathbf{x} + (-1) \mathbf{x} &= 1 \mathbf{x} + (-1) \mathbf{x} & \text{axiom 5} \\ &= (1 + -1) \mathbf{x} & \text{axiom 7} \\ &= 0 \mathbf{x} \\ &= \mathbf{0}_V & \text{previous lemma}. \end{align*}\]

We write \(-\mathbf{x}\) for the additive inverse of \(\mathbf{x}\) which axiom 2 provides, and \(\mathbf{y}-\mathbf{x}\) as shorthand for \(\mathbf{y}+ -\mathbf{x}\).

Here are two more.

**Lemma:**

- Let \(\lambda\) be a scalar. Then \(\lambda \mathbf{0} _V = \mathbf{0} _V\).
- Suppose \(\lambda \neq 0\) is a scalar and \(\lambda \mathbf{x} = \mathbf{0} _V\). Then \(\mathbf{x} = \mathbf{0}_V\).

*Proof.* First part:

\[\begin{align*}\lambda \mathbf{0} _V & = \lambda( \mathbf{0} _V + \mathbf{0}_V) & \text{axiom 1} \\ &= \lambda \mathbf{0} _V + \lambda \mathbf{0}_V & \text{axiom 8.} \end{align*}\]

Axiom 2 tells us there’s an additive inverse to \(\lambda \mathbf{0}_V\). Adding it to both sides and using axiom 3, we get \(\mathbf{0} _V=\lambda \mathbf{0} _V\).

Second part:

\[\begin{align*} \lambda \mathbf{x} &= \mathbf{0} _V \\ \lambda^{-1} (\lambda \mathbf{x} ) &= \mathbf{0} _V &\text{by the above} \\ (\lambda ^{-1}\lambda) \mathbf{x} &= \mathbf{0} _V & \text{axiom 6} \\ 1 \mathbf{x} &= \mathbf{0} _V \\ \mathbf{x} &= \mathbf{0} _V &\text{axiom 5}. \end{align*}\]

**Definition:** a **subspace** of a vector space \(V\) is a subset \(U\) of \(V\) which

- contains the zero vector \(0_V\),
- is “closed under addition”: for all \(\mathbf{v},\mathbf{w} \in U\) we have \(\mathbf{v}+\mathbf{w} \in U\), and
- is “closed under scalar multiplication”: for all scalars \(l\) and all \(\mathbf{u} \in U\) we have \(l\mathbf{u} \in U\).

We write \(U \leq V\) to mean “\(U\) is a subspace of \(V\).”

The idea this definition captures is that a subspace of \(V\) is a nonempty subset which is also a vector space under “the same” addition and scalar multiplication as \(V\).

Let’s notice something else straight away. If \(U \leq V\) and \(u_1,\ldots, u_n \in U\) and \(\lambda_1, \ldots, \lambda_n\) are scalars then \(\sum_{i=1}^n \lambda_i u_i \in U\) - this follows from using closure under addition lots of times.

- If \(V\) is any vector space, \(V \leq V\) and \(\{0_V\} \leq V\).
- \(U\) = all vectors in \(\mathbb{R}^n\) with first coordinate 0 is a subspace of \(\mathbb{R}^n\).
- \(U\) = all vectors with first coordinate 1 is *not* a subspace.
- \(\mathbb{Z}\nleq \mathbb{R}\).
- \(U\) = all functions with \(f(0)=0\) is a subspace of the vector space \(\mathcal{F}\) of all functions \(\mathbb{R}\to \mathbb{R}\).
- \(\{A\in M_{n\times n}(\mathbb{R}) : A^T=A\} \leq M_{n\times n}(\mathbb{R})\).
- \(\{A \in M_{n\times n}(\mathbb{R}) : A^2 = 0\}\nleq M_{n\times n}(\mathbb{R})\).

Let’s verify the second example in the case \(n=2\). To prove that \(U \leq \mathbb{R}^2\) we must check the three conditions in the definition of subspace.

- The zero vector in \(\mathbb{R}^2\) is \(\begin{pmatrix} 0\\0 \end{pmatrix}\). This has first coordinate 0, so it is an element of \(U\).
- Let \(v, w \in U\), so that \(v = \begin{pmatrix} 0\\ x \end{pmatrix}\) and \(w = \begin{pmatrix} 0\\y \end{pmatrix}\) for some real numbers \(x\) and \(y\). Then \(v+w = \begin{pmatrix} 0\\x+y \end{pmatrix}\) has first coordinate 0, so it is an element of \(U\).
- Let \(v\) be as above and \(l \in \mathbb{R}\). Then \(lv = \begin{pmatrix} 0\\ lx \end{pmatrix}\) which has first coordinate 0, so \(lv \in U\).

All three conditions hold, so \(U \leq \mathbb{R}^2\).

**Proposition:** let \(U\) and \(W\) be subspaces of a vector space \(V\). Then

- \(U \cap W\), and
- \(U+W = \{\mathbf{u}+\mathbf{w}: \mathbf{u} \in U, \mathbf{w} \in W\}\)

are subspaces of \(V\).

*Proof:* To show something is a subspace we have to check the three properties: containing the zero vector, closure under addition, and closure under scalar multiplication.

**1.**

- \(\mathbf{0}_V \in U \cap W\) as \(U\) and \(W\) are subspaces so contain \(\mathbf{0}_V\).
- Let \(\mathbf{x},\mathbf{y}\in U \cap W\). \(U\) is a subspace, so closed under addition, so \(\mathbf{x}+\mathbf{y} \in U\). For the same reason \(\mathbf{x}+\mathbf{y} \in W\). Therefore \(\mathbf{x}+\mathbf{y} \in U\cap W\).
- Let \(\lambda\) be a scalar and \(\mathbf{x} \in U \cap W\). \(U\) is a subspace, so closed under scalar multiplication, so \(\lambda \mathbf{x} \in U\). For the same reason \(\lambda \mathbf{x} \in W\). Therefore \(\lambda \mathbf{x} \in U \cap W\).

**2.**

- \(\mathbf{0}_V\) is in \(U\) and \(W\) as they are subspaces, so \(\mathbf{0}_V + \mathbf{0}_V = \mathbf{0}_V\) is in \(U+W\).
- Any two elements of \(U+W\) have the form \(\mathbf{u}_1+\mathbf{w}_1\) and \(\mathbf{u}_2+\mathbf{w}_2\), where \(\mathbf{u}_i \in U\) and \(\mathbf{w}_i \in W\). \[ (\mathbf{u}_1+\mathbf{w}_1) + (\mathbf{u}_2+\mathbf{w}_2) = (\mathbf{u}_1+\mathbf{u}_2) + (\mathbf{w}_1+\mathbf{w}_2) \] by the vector space axioms. But \(\mathbf{u}_1+\mathbf{u}_2 \in U\) as \(U\) is a subspace and \(\mathbf{w}_1+\mathbf{w}_2\in W\) as \(W\) is a subspace, so this is in \(U+W\) which is therefore closed under addition.
- Let \(\lambda\) be a scalar. \[ \lambda (\mathbf{u}_1+\mathbf{w}_1) = \lambda \mathbf{u}_1+\lambda \mathbf{w}_1 \] \(\lambda \mathbf{u}_1 \in U\) as \(U\) is a subspace so closed under scalar multiplication, \(\lambda \mathbf{w}_1 \in W\) for the same reason, so their sum is in \(U+W\) which is therefore closed under scalar multiplication.

**Definition:** Let \(V\) be a vector space and \(v_1,\ldots, v_n \in V\). A **linear combination** of \(v_1,\ldots,v_n\) is an element of \(V\) of the form \[\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n\] where the \(\lambda_i\) are scalars.

So you can think of a subspace as a nonempty subset which is “closed under taking linear combinations.”

**Definition:** let \(V\) be a vector space.

- A sequence \(v_1,\ldots, v_n\) of elements of \(V\) is **linearly independent** if and only if the only scalars \(\lambda_1, \ldots, \lambda_n\) such that \(\sum_{i=1}^n \lambda_i v_i = 0_V\) are \(\lambda_1=\cdots = \lambda_n = 0\).
- A sequence which is not linearly independent is called **linearly dependent**.

It is important that linear independence is a property of *sequences* (not sets) of vectors. Sequences have a particular order, and they can contain the same element multiple times.

Checking whether elements of a vector space are linearly independent is straightforward: try to find a linear combination that gives the zero vector in which not all the scalars are zero. If you can, the sequence is linearly dependent; if you can’t, it is linearly independent. When we’re talking about vectors in \(\mathbb{F}^n\), or matrices, this is just solving linear equations.

- \(u = \begin{pmatrix}1\\ 0\end{pmatrix}, v= \begin{pmatrix}0\\ 1\end{pmatrix}, w = \begin{pmatrix}1\\ 1\end{pmatrix}\) are not linearly independent in \(\mathbb{R}^2\), because \(1 \times u + 1 \times v + (-1) \times w = \mathbf{0}\).
- \(u = \begin{pmatrix}1\\1\end{pmatrix}, v = \begin{pmatrix}1\\ -1\end{pmatrix}\) are linearly independent in \(\mathbb{R}^2\). For if \(\alpha u + \beta v = \begin{pmatrix} 0\\0 \end{pmatrix}\) then \(\begin{pmatrix} \alpha + \beta \\ \alpha - \beta \end{pmatrix} = \begin{pmatrix} 0\\0 \end{pmatrix}\). This is a system of linear equations:

\[\begin{align*} \alpha+\beta &=0 \\ \alpha-\beta &= 0\end{align*}\]

For such a simple system it’s easy to see that the only solution is \(\alpha = \beta = 0\). This tells you that the only solution to \(\alpha u + \beta v = \mathbf{0}\) is \(\alpha = \beta = 0\), which is the definition of linear independence for \(u, v\).

- \(\begin{pmatrix}1 \\0\end{pmatrix}\) and \(\begin{pmatrix}0\\1\end{pmatrix}\) are linearly independent in \(\mathbb{R}^2\). You can prove this in a similar (but easier) way to the previous example.
- more generally if \(e_i\) is the height \(n\) column vector with 0 everywhere except 1 at position \(i\), then the sequence \(e_1,\ldots, e_n\) is linearly independent in \(\mathbb{R}^n\).
- In \(\mathcal{F}\), the vector space of all functions \(\mathbb{R} \to \mathbb{R}\), the functions \(f(x) = \cos(x)\) and \(g(x) = \sin(x)\) are linearly independent. If \(\alpha f + \beta g = 0_{ \mathcal{F}}\), that is, if \(\alpha \cos(x) + \beta \sin(x)\) is zero for all \(x\), then \(\alpha\) and \(\beta\) must both be 0.

What the examples tell you is that most of the time, deciding whether a sequence of vectors is linearly independent is equivalent to seeing whether a system of linear equations has only the solution where every variable is zero. You’ve already learned how to solve these systems of linear equations!
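
As a computational sketch of this idea: a sequence of vectors in \(\mathbb{Q}^n\) is linearly independent exactly when the matrix with those vectors as rows has rank equal to the number of vectors. The helper names below are mine, and the row reduction uses exact rational arithmetic so there are no rounding issues:

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, given as a list of rows, by Gaussian elimination
    over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        # find a row at or below position r with a non-zero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def independent(vectors):
    """Linearly independent iff the rank equals the number of vectors."""
    return rank(list(vectors)) == len(vectors)

print(independent([(1, 0), (0, 1), (1, 1)]))  # False: u + v - w = 0
print(independent([(1, 1), (1, -1)]))         # True
```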

**Definition:** let \(V\) be an \(\mathbb{F}\)-vector space and \(\mathbf{v}_1,\ldots, \mathbf{v}_n \in V\). The **span** of \(\mathbf{v}_1,\ldots, \mathbf{v}_n\), written \(\operatorname{span}(\mathbf{v}_1,\ldots, \mathbf{v}_n)\) is the set of all linear combinations of \(\mathbf{v}_1,\ldots, \mathbf{v}_n\).

In symbols,

\[\operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_n) = \{ \lambda_1 \mathbf{v}_1 + \cdots + \lambda_n \mathbf{v}_n : \lambda_1, \ldots, \lambda_n \in \mathbb{F} \}.\]

For technical reasons we define the span of the empty sequence of vectors to be \(\{0\}\).

To understand the definition a bit better, let’s look at two simple special cases. The span of a single element \(\mathbf{s}\) of an \(\mathbb{F}\)-vector space \(V\) is \(\{ \lambda \mathbf{s} : \lambda \in \mathbb{F}\}\), since any linear combination of \(\mathbf{s}\) is just a scalar multiple of \(\mathbf{s}\). The span of two elements \(\mathbf{u}, \mathbf{v}\) of \(V\) is \(\{ a \mathbf{u} + b \mathbf{v} : a, b, \in \mathbb{F}\}\).

**Proposition:** if \(\mathbf{s}_1,\ldots,\mathbf{s}_n\) are elements of a vector space \(V\) then \(\operatorname{span}(\mathbf{s}_1,\ldots,\mathbf{s}_n)\) is a subspace of \(V\).

*Proof:* Write \(S\) for \(\operatorname{span}( \mathbf{s} _1,\ldots, \mathbf{s} _n)\). Recall that \(S\) consists of every linear combination \(\sum_{i=1}^n \lambda_i\mathbf{s}_i\), where the \(\lambda_i\) are scalars.

- \(S\) contains the zero vector because it contains \(\sum_{i=1}^n 0\mathbf{s}_i\), and each \(0\mathbf{s}_i\) is the zero vector.
- \(S\) is closed under addition because if \(\sum_{i=1}^n \lambda _i \mathbf{s}_i\) and \(\sum _{i=1}^n \mu_i \mathbf{s}_i\) are any two elements of \(S\) then \[\sum_{i=1}^n \lambda_i \mathbf{s}_i + \sum_{i=1}^n \mu_i \mathbf{s}_i = \sum_{i=1}^n (\lambda_i + \mu_i) \mathbf{s}_i\] is in \(S\).
- \(S\) is closed under scalar multiplication because if \(\sum_{i=1}^n \lambda_i \mathbf{s}_i\) is in \(S\) and \(\lambda\) is a scalar then \[ \lambda \sum_{i=1}^n \lambda_i \mathbf{s}_i = \sum_{i=1}^n (\lambda \lambda_i) \mathbf{s}_i\] is also in \(S\).

\(S\) fulfils all three conditions in the definition of subspace, so \(S \leq V\).

**Definition:** elements \(\mathbf{v}_1,\ldots,\mathbf{v}_n\) of a vector space \(V\) are a **spanning sequence** for \(V\) if and only if \(\operatorname{span}(\mathbf{v}_1,\ldots, \mathbf{v}_n) = V\).

The term *spanning set* is also used.

We also say “\(\mathbf{v}_1,\ldots,\mathbf{v}_n\) spans \(V\)” instead of “…is a spanning sequence for \(V\).”

Often deciding whether or not a sequence of vectors is a spanning sequence is equivalent to solving some linear equations. If you want to check whether \(\begin{pmatrix} 1\\1 \end{pmatrix}\) and \(\begin{pmatrix} -1\\1 \end{pmatrix}\) are a spanning sequence for \(\mathbb{R}^2\), what you need to do is to verify that for every \(\begin{pmatrix} x\\y \end{pmatrix}\in \mathbb{R}^2\) there are real numbers \(\alpha\) and \(\beta\) such that

\[\alpha \begin{pmatrix} 1\\1 \end{pmatrix} + \beta \begin{pmatrix} -1 \\1 \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}.\]

In other words, you have to prove that for **every** \(x, y \in \mathbb{R}\) the system of linear equations

\[\begin{align*} \alpha - \beta &= x\\ \alpha + \beta &= y \end{align*}\]

has a solution. That’s easy in this case, because you can just notice that \(\alpha = (x+y)/2, \beta = (y-x)/2\) is a solution, but for bigger and more complicated systems you can use the method of RREF.
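
We can spot-check this numerically: with \(\mathbf{u}=(1,1)^T\) and \(\mathbf{v}=(-1,1)^T\), the combination \(\alpha\mathbf{u}+\beta\mathbf{v}\) has coordinates \((\alpha-\beta, \alpha+\beta)\), and the sketch below (the helper name is mine) solves for \(\alpha, \beta\) given any target \((x,y)\):

```python
from fractions import Fraction

def coefficients(x, y):
    """Solve alpha*(1,1) + beta*(-1,1) == (x, y), i.e. the system
    alpha - beta = x, alpha + beta = y."""
    return Fraction(x + y, 2), Fraction(y - x, 2)

for x, y in [(3, 7), (-2, 5), (0, 0)]:
    alpha, beta = coefficients(x, y)
    assert (alpha - beta, alpha + beta) == (x, y)  # hits the target exactly
print("every target (x, y) is reachable")
```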

**Examples.**

- \(\begin{pmatrix}1\\ 1\end{pmatrix}\) and \(\begin{pmatrix}-1\\1\end{pmatrix}\) are a spanning sequence for \(\mathbb{R}^2\), as we have just seen.
- Let’s try to determine whether \(\begin{pmatrix}1\\-1\\0\end{pmatrix}, \begin{pmatrix}0\\1\\-1\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix}\) are a spanning sequence for \(\mathbb{R}^3\). We need to find out whether it’s true that for any \(\begin{pmatrix} x\\y\\z \end{pmatrix} \in \mathbb{R}^3\) there exist \(\alpha, \beta, \gamma \in \mathbb{R}\) such that

\[\alpha \begin{pmatrix} 1\\-1\\0 \end{pmatrix} + \beta \begin{pmatrix} 0\\1\\-1 \end{pmatrix} + \gamma \begin{pmatrix} 1\\0\\-1 \end{pmatrix} = \begin{pmatrix} x\\y\\z \end{pmatrix}.\]

This is equivalent to asking whether for every \(x, y, z\) the simultaneous equations

\[\begin{align*} \alpha + \gamma &= x \\ -\alpha + \beta &= y\\ -\beta -\gamma &= z \end{align*}\]

have a solution. Again, in this special case you might just notice that (adding the three equations) there is no solution unless \(x+y+z=0\), so this collection of vectors is **not** a spanning set. In general, to find out if a system of linear equations has a solution you can put the augmented matrix into row reduced echelon form. In this case the augmented matrix is

\[\begin{pmatrix} 1 & 0 & 1 & x \\ -1 & 1 & 0 & y \\ 0 & -1 & -1 & z \end{pmatrix} \]

Doing the row operations \(r_2 \mapsto r_2 + r_1\) followed by \(r_3 \mapsto r_3 + r_2\) leads to

\[\begin{pmatrix} 1 & 0 & 1 & x \\ 0 & 1 & 1 & y+x \\ 0 & 0 & 0 & z+y+x \end{pmatrix}\]

These equations have no solutions if \(x+y+z\neq 0\). We can’t solve the equations for every value of \(x, y, z\), so the vectors are not a spanning set.
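
The observation that every linear combination of these three vectors satisfies \(x+y+z=0\) can be checked directly: the coordinate sum of a linear combination is the corresponding combination of the coordinate sums. A quick sketch (the helper name is mine):

```python
vectors = [(1, -1, 0), (0, 1, -1), (1, 0, -1)]

# each spanning vector has coordinate sum zero...
assert all(sum(v) == 0 for v in vectors)

def combo(a, b, c):
    """The linear combination a*v1 + b*v2 + c*v3, coordinate by coordinate."""
    return tuple(a*u + b*v + c*w for u, v, w in zip(*vectors))

# ...so every linear combination is stuck on the plane x + y + z = 0,
# and a vector like (1, 0, 0), whose coordinate sum is 1, is out of reach
print(sum(combo(2, -5, 7)))   # 0
print(combo(1, 1, 1))
```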

**Definition:** a sequence \(\mathbf{v}_1,\ldots, \mathbf{v}_n\) of elements of a vector space \(V\) is a **basis** for \(V\) if and only if

- it is linearly independent, and
- it is a spanning sequence for \(V\).

Importantly, bases are sequences not sets. This is because the order of a basis matters to some of the definitions we will make later - like the matrix of a linear map.

The most important example is the **standard basis** of \(\mathbb{F}^n\) (no matter which field \(\mathbb{F}\) is). Let \(\mathbf{e}_i\) be the column vector in \(\mathbb{F}^n\) with a 1 in position \(i\) and 0s elsewhere. When \(n=3\), for example, we have

\[\mathbf{e}_1 = \begin{pmatrix} 1\\0\\0 \end{pmatrix}, \mathbf{e}_2 = \begin{pmatrix} 0\\1\\0 \end{pmatrix}, \mathbf{e}_3 = \begin{pmatrix} 0\\0\\1 \end{pmatrix}.\]

Then \(\mathbf{e}_1,\ldots, \mathbf{e}_n\) is a basis of \(\mathbb{F}^n\), called the standard basis. To check this, we must verify the two parts of the definition of basis.

- (Linear independence). Suppose \(\sum_{i=1}^n \lambda_i \mathbf{e}_i = \mathbf{0}\). To verify linear independence we have to prove all the \(\lambda_i\) are zero. Using the definition of the \(\mathbf{e}_i\) we get \(\begin{pmatrix}\lambda_1\\\vdots\\ \lambda_n\end{pmatrix} = \begin{pmatrix} 0\\ \vdots \\ 0 \end{pmatrix}\). So \(\lambda_i =0\) for all \(i\) as required.
- (Spanning) We have to show that any element of \(\mathbb{F}^n\) is a linear combination of \(\mathbf{e}_1,\ldots, \mathbf{e}_n\). Let \(\mathbf{v}=\begin{pmatrix}v_1\\ \vdots \\ v_n\end{pmatrix} \in \mathbb{F}^n\). Then \(\mathbf{v} = \sum_{i=1}^n v_i \mathbf{e}_i\), so \(\mathbf{v}\) is a linear combination of the \(\mathbf{e}_i\) as required.

\(\mathbb{R}_{\leq 3}[x]\) consists of all polynomials of degree at most 3 in the variable x. It has a basis \(1, x, x^2, x^3\), because

- (linear independence) if \(a + bx + cx^2 + dx^3\) is the zero polynomial, that is, if it is zero for every value of \(x\), then \(a=b=c=d=0\). This is because a non-zero polynomial of degree at most 3 has at most 3 roots, so it cannot vanish at every real number.
- (spanning) every polynomial of degree at most 3 has the form \(a +bx +cx^2 + dx^3\) for some \(a,b,c,d\), and so is visibly a linear combination of \(1, x, x^2, x^3\).

Let \(V = M_{m\times n}( \mathbb{F} )\) be the \(\mathbb{F}\)-vector space of all \(m\times n\) matrices. Let \(E_{ij}\) be the matrix which has a 1 in position \(i, j\) and zeroes elsewhere. Then

\[E_{11}, E_{21}, \ldots, E_{m1}, E_{12}, E_{22}, \ldots, E_{mn}\]

is a basis for \(V\). This can be proved in exactly the same way as we proved that the standard basis of \(\mathbb{F}^n\) really was a basis.
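
As a sketch of why this works: the coefficient of \(E_{ij}\) in the expansion of a matrix \(A\) is simply the entry of \(A\) in position \(i, j\). The helper names below are mine:

```python
def E(i, j, m, n):
    """The m x n matrix with a 1 in position (i, j) and zeroes elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(m)]

def recombine(A):
    """Form the linear combination sum of A[i][j] * E(i, j): it should
    reassemble A exactly, showing the E_ij span M_{m x n}."""
    m, n = len(A), len(A[0])
    total = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            basis_matrix = E(i, j, m, n)
            for r in range(m):
                for c in range(n):
                    total[r][c] += A[i][j] * basis_matrix[r][c]
    return total

A = [[1, 2, 3], [4, 5, 6]]
assert recombine(A) == A  # the combination reassembles A exactly
```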

**Lemma:** if \(\mathbf{v}_1,\ldots, \mathbf{v}_n\) is a basis of \(V\), every \(\mathbf{v} \in V\) can be written *uniquely* as \(\sum_{i=1}^n \lambda_i \mathbf{v}_i\) for some scalars \(\lambda_i\).

*Proof:* Every \(\mathbf{v} \in V\) can be written this way because the \(\mathbf{v}_i\) are a basis and hence a spanning sequence for \(V\). The problem is to prove that every \(\mathbf{v} \in V\) can be written like this in *only one way*.

Suppose

\[\sum_{i=1}^n \lambda_i \mathbf{v}_i = \sum_{i=1}^n \mu_i \mathbf{v}_i.\]

Then subtracting one side from the other,

\[\sum_{i=1}^n (\lambda_i-\mu_i) \mathbf{v}_i = 0_V.\]

Linear independence of the \(\mathbf{v}_i\) tells us \(\lambda_i - \mu_i = 0\) for all \(i\), so \(\lambda_i=\mu_i\) for all \(i\). We have proved that there is only one expression for \(\mathbf{v}\) as a linear combination of the elements of the basis \(\mathbf{v}_1, \ldots, \mathbf{v}_n\).

This means that a basis gives a way of “coordinatizing” an arbitrary vector space - no matter what the elements look like. Once we fix a basis of \(V\), there is a one-one correspondence between the elements of \(V\) and the coefficients needed to express them in terms of that basis - you could call these the “coordinates” of the vector in terms of this basis.

It also allows us to “compare coefficients”. Suppose \(\mathbf{v}_1, \ldots, \mathbf{v}_n\) is a basis of a vector space \(V\) (actually we only need linear independence) and that

\[\lambda_1 v_1 + \cdots + \lambda_n v_n = \mu_1 v_1 + \cdots + \mu_n v_n.\]

Then the uniqueness result we just saw tells us we can “compare coefficients” to get that \(\lambda_1 = \mu_1\), \(\lambda_2 = \mu_2\), and so on.

A given vector space can have many different bases. This is true in a trivial sense: as I said before, bases are sequences, the order matters, so \(\mathbf{e}_n, \mathbf{e}_{n-1}, \ldots, \mathbf{e}_1\) is different to \(\mathbf{e}_1,\ldots, \mathbf{e}_n\) but clearly still a basis - all the arguments above work. But it is also true in a more interesting way. Take \(\mathbb{R}^2\), for example: we know \(\mathbf{e}_1, \mathbf{e}_2\) is a basis, but so also is

\[\mathbf{u}= \begin{pmatrix}1\\1\end{pmatrix}, \mathbf{v}=\begin{pmatrix}1\\-1\end{pmatrix}\]

Let’s check this. Suppose \(a\mathbf{u}+b\mathbf{v}=0\). Then \(\begin{pmatrix} a+b\\a-b \end{pmatrix}= \begin{pmatrix} 0\\0 \end{pmatrix}\), so \(a+b=0=a-b\) from which it follows \(a=b=0\) and \(\mathbf{u},\mathbf{v}\) is linearly independent. To show \(\mathbf{u},\mathbf{v}\) spans \(\mathbb{R}^2\), let \(\begin{pmatrix} x\\y \end{pmatrix} \in \mathbb{R}^2\). We must show there exist \(a,b \in \mathbb{R}\) such that \(a\mathbf{u}+b\mathbf{v}= \begin{pmatrix} x\\y \end{pmatrix}\). The condition \(a\) and \(b\) must satisfy is \(a+b=x, a-b=y\). It is always possible to find such \(a\) and \(b\): solving the equations you get \(a=(x+y)/2, b=(x-y)/2\), so \(\mathbf{u},\mathbf{v}\) spans \(\mathbb{R}^2\).
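The formulas \(a=(x+y)/2\), \(b=(x-y)/2\) we just solved for can be checked mechanically. A short sketch (my own illustration, not part of the text):

```python
import numpy as np

# The basis of R^2 from the text.
u = np.array([1.0, 1.0])
v = np.array([1.0, -1.0])

def coefficients(x, y):
    """Solve a+b = x, a-b = y for the coefficients with a*u + b*v = (x, y)."""
    return (x + y) / 2, (x - y) / 2

# Spot-check: (3, 1) has coefficients a=2, b=1, and they reconstruct it.
a, b = coefficients(3.0, 1.0)
assert (a, b) == (2.0, 1.0)
assert np.allclose(a * u + b * v, [3.0, 1.0])
```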

Here’s why a vector space having several different bases is useful. The expression of an element \(\mathbf{v}\) in terms of different bases can tell us different things about \(\mathbf{v}\). In other words different bases give different ways of looking at the elements of the vector space.

Say for example you are representing an image as an element of \(\mathbb{R}^n\). The smallest possible example is a 2-pixel image which we could represent as an element \(\begin{pmatrix}a \\b\end{pmatrix}=a\mathbf{e}_1+b\mathbf{e}_2\) in \(\mathbb{R}^2\), where the first coordinate tells me how bright the first pixel is and the second tells me how bright the second is.

Now consider the alternative basis \(\mathbf{e}_1+\mathbf{e}_2, \mathbf{e}_1-\mathbf{e}_2\). Any image \(a\mathbf{e}_1+b\mathbf{e}_2\) can be re-written in terms of the new basis:

\[a\mathbf{e}_1 + b\mathbf{e}_2 = \frac{a+b}{2} (\mathbf{e}_1+\mathbf{e}_2) + \frac{a-b}{2} (\mathbf{e}_1-\mathbf{e}_2).\]

So the new basis is giving us a different description of the image. It tells us how bright the image is overall (the coefficient \((a+b)/2\) of \(\mathbf{e}_1+\mathbf{e}_2\) is the average brightness of the two pixels, so it measures the overall image brightness) and how different in brightness the two pixels are (the coefficient \((a-b)/2\) of \(\mathbf{e}_1-\mathbf{e}_2\) is a measure of how different the brightnesses \(a\) and \(b\) of the two pixels are).

We are going to define the **dimension** of a finite-dimensional vector space \(V\) as the size of a basis of \(V\). But as we’ve seen, a vector space can have many different bases. So we have some proving to do before this definition makes sense - we need to know that any two bases have the same size.

**Lemma**: if \(\mathbf{s}_1, \ldots, \mathbf{s}_m\) is a spanning sequence for a vector space \(V\) and \(\mathbf{v} = \sum_{i=1}^m \lambda_i \mathbf{s}_i\) where \(\lambda_j\neq 0\), then \[\mathbf{s}_1, \ldots, \mathbf{s}_{j-1}, \mathbf{v}, \mathbf{s}_{j+1},
\ldots, \mathbf{s}_m\] is also a spanning sequence for \(V\).

*Proof:* Certainly \(S = \operatorname{span}(\mathbf{s}_1, \ldots, \mathbf{s}_{j-1}, \mathbf{v}, \mathbf{s}_{j+1}, \ldots, \mathbf{s}_m)\) contains all the \(\mathbf{s}_i\) except possibly \(\mathbf{s}_j\). In fact it contains \(\mathbf{s}_j\) too, as

\[ \mathbf{s}_j = \lambda_j^{-1} \mathbf{v} - \lambda_j^{-1} \sum _{i\neq j}\lambda_i \mathbf{s}_i\]

Since \(S\) contains every \(\mathbf{s}_i\) and is a subspace, it contains every linear combination of the \(\mathbf{s}_i\). Since the \(\mathbf{s}_i\) spanned \(V\), we get \(V \leq S\) so \(S=V\).

**Theorem.** Let \(V\) be a vector space and suppose \(\mathbf{s}_1,\ldots, \mathbf{s}_m\) spans \(V\) and \(\mathbf{l}_1,\ldots, \mathbf{l}_n\) are linearly independent elements of \(V\). Then \(m \geq n\).

*Proof.* Since the \(\mathbf{s}_i\) span, we can write \(\mathbf{l}_1 = \sum_{i=1}^m \lambda_i \mathbf{s}_i\). Not all the \(\lambda_i\) can be zero as a linearly independent set can’t contain the zero vector, so without loss of generality - by renumbering if necessary - \(\lambda_1 \neq 0\). So \(\mathbf{l}_1, \mathbf{s}_2, \ldots, \mathbf{s}_m\) is a spanning set for \(V\) by the spanning lemma.

Now \(\mathbf{l}_2 = \mu_1 \mathbf{l}_1 + \sum_{i\geq 2} \lambda_i \mathbf{s}_i\) for some scalars \(\mu_1, \lambda_i\). We can't have \(\lambda_i=0\) for all \(i \geq 2\), because then \(\mathbf{l}_2 = \mu_1 \mathbf{l}_1\), contradicting the \(\mathbf{l}_i\) being linearly independent. So again without loss of generality \(\lambda_2\neq 0\), and \(\mathbf{l}_1, \mathbf{l}_2, \mathbf{s}_3, \ldots, \mathbf{s}_m\) is a spanning sequence.

If \(m < n\) then we eventually end up with \(\mathbf{l}_1, \ldots, \mathbf{l}_m\) being a spanning sequence, so \(\mathbf{l}_{m+1} \in \operatorname{span}(\mathbf{l}_1,\ldots, \mathbf{l}_m)\). This contradicts the \(\mathbf{l}_i\) being linearly independent, so we’re done.

To make life slightly easier, we are going to work only with “finite-dimensional” vector spaces. A vector space is called **finite-dimensional** if it contains a finite spanning sequence.

**Theorem.** All bases of a finite-dimensional vector space \(V\) have the same size.

*Proof.* \(V\) has a finite spanning sequence \(\mathbf{s}_1, \ldots, \mathbf{s}_m\) because it is finite-dimensional. Therefore every linearly independent set has size at most \(m\), so is finite, so every basis is finite. (We haven’t actually shown that a basis exists, but this will follow from something we prove later).

Now if \(\mathbf{b}_1,\ldots, \mathbf{b}_k\) and \(\mathbf{c}_1, \ldots, \mathbf{c}_l\) are bases of \(V\) then \(k\leq l\) (as the \(\mathbf{b}_i\) are linearly independent and the \(\mathbf{c}_i\) span) and \(l \leq k\) (as the \(\mathbf{c}_i\) are linearly independent and the \(\mathbf{b}_i\) span). Therefore \(k=l\).

Now that we know any two bases have the same size, we can make our definition of dimension:

**Definition**: the **dimension** \(\dim V\) of a vector space \(V\) is the size of any basis of \(V\).

There’s a special case: the dimension of the zero vector space \(\{0\}\) is defined to be 0. If you want you can talk yourself into believing that the empty set is a basis of the zero vector space, so that this is covered by the definition above, but it’s easier just to think of this as a special case.

Roughly speaking the dimension of an \(\mathbb{F}\)-vector space \(V\) tells you how many independent parameters (from \(\mathbb{F}\)!) you need to specify an element of \(V\).

We’ve already seen a couple of examples. Let’s recap the most important one: the **standard basis** of \(\mathbb{F}^n\), the space of height \(n\) column vectors with entries in \(\mathbb{F}\). This standard basis was \(\mathbf{e}_1,\ldots, \mathbf{e}_n\), where \(\mathbf{e}_i\) is the height \(n\) column vector with a 1 in position \(i\) and 0s elsewhere. The basis has size \(n\), so \(\dim \mathbb{F}^n = n\).

We can do a similar thing for the vector space of all \(m \times n\) matrices over a field \(\mathbb{F}\). Let \(E_{ij}\) be the \(m \times n\) matrix with a 1 in position \(i,j\) and 0s elsewhere. Then the \(E_{ij}\), for \(1\leq i \leq m\), \(1 \leq j \leq n\), are a basis of \(M_{m\times n}( \mathbb{F} )\), which therefore has dimension \(mn\).

The trace of a matrix is the sum of the elements of its leading diagonal. Find a basis of the set \(S\) of \(2\times 2\) matrices with trace zero.

First note that this really is a vector space (a subspace of \(M_{2\times 2}( \mathbb{F} )\)), so its dimension is at most 4. In fact it must be less than 4, since the subspace is proper.

A good start is to write down an expression for a general matrix with trace zero. It must have the form \(\begin{pmatrix} a & b \\ c& -a \end{pmatrix}\)

Then notice this matrix can be written \[a \begin{pmatrix} 1&0\\0&-1\end{pmatrix} + b \begin{pmatrix} 0&1\\0&0 \end{pmatrix} + c \begin{pmatrix} 0&0 \\ 1&0\end{pmatrix}\]

Call the three matrices above \(H, E, F\) so that our expression was \(aH +bE + cF\). Since \(H, E, F\) are in \(S\), they are a spanning sequence for \(S\). You can check that they’re linearly independent, so they are a basis and \(\dim S = 3\).
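The independence of \(H, E, F\) can be checked concretely by flattening the matrices to vectors and computing a rank. A sketch with NumPy (my own illustration, not part of the text):

```python
import numpy as np

# The three trace-zero matrices H, E, F from the text.
H = np.array([[1, 0], [0, -1]])
E = np.array([[0, 1], [0, 0]])
F = np.array([[0, 0], [1, 0]])

# Flatten each to a vector of length 4 and stack them as columns.
M = np.column_stack([H.ravel(), E.ravel(), F.ravel()])

# Rank 3 means the only solution of aH + bE + cF = 0 is a = b = c = 0.
assert np.linalg.matrix_rank(M) == 3

# Spanning: every combination aH + bE + cF has trace zero.
X = 2 * H + 5 * E - 3 * F
assert np.trace(X) == 0
```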

\(\dim \mathbb{R}_{\leq n}[x] = n+1\), because \(1, x, \ldots, x^{n}\) is a basis.

Let \(S = \operatorname{span}(\sin, \cos)\), a subspace of the \(\mathbb{R}\)-vector space of all functions \(\mathbb{R} \to \mathbb{R}\). What is \(\dim S\)?

We know \(S\) has a spanning sequence of size 2. The size of a basis is at most the size of a spanning sequence, because a basis is linearly independent, so \(\dim S \leq 2\). \(S\) is manifestly not the zero vector space, so its dimension isn’t zero, so \(\dim S\) is 1 or 2.

Could it be 1? Then \(S\) has a basis of size 1. That basis is just a single element of \(S\), call it \(f\). Both \(\sin\) and \(\cos\) are elements of \(S\), so they are linear combinations of this basis, in other words, multiples of \(f\). So \(\cos\) is a multiple of \(\sin\) say \(\cos = \alpha \sin\) for some \(\alpha \in \mathbb{R}\).

This is clearly false: if two functions are equal, they produce the same output for every input. Putting \(x=0\) gives \(1=0\), a contradiction. Therefore \(\dim S\) is 2.
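A quick numerical illustration of the same point: if \(\cos\) were \(\alpha \sin\), the ratio \(\cos x/\sin x\) would equal the same constant \(\alpha\) at every \(x\) where \(\sin x \neq 0\). The sample points below are arbitrary choices for illustration; of course this is evidence, not a proof.

```python
import numpy as np

# If cos = alpha * sin, then cos(x)/sin(x) = alpha wherever sin(x) != 0.
xs = np.array([0.5, 1.0, 1.5])
ratios = np.cos(xs) / np.sin(xs)

# The ratio is not constant, so no such alpha exists.
assert not np.allclose(ratios, ratios[0])

# And x = 0 gives the contradiction from the text: cos 0 = 1 but alpha * sin 0 = 0.
assert np.cos(0.0) == 1.0 and np.sin(0.0) == 0.0
```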

Our goal in this video is to show that every linearly independent sequence in a finite-dimensional vector space can be extended, by adding some more vectors to the sequence, to a basis.

The following lemma is going to be useful.

**Lemma:** (the *Extension Lemma*) if \(\mathbf{v}_1,\ldots,\mathbf{v}_n\) are linearly independent and \(\mathbf{u} \notin \operatorname{span}(\mathbf{v}_1,\ldots, \mathbf{v}_n)\) then \(\mathbf{v}_1,\ldots, \mathbf{v}_n, \mathbf{u}\) is linearly independent.

*Proof:* We’ll prove the contrapositive: if \(\mathbf{v} _1,\ldots, \mathbf{v}_n, \mathbf{u}\) is linearly dependent then \(\mathbf{u} \in \operatorname{span} \{ \mathbf{v} _1,\ldots, \mathbf{v}_n \}\).

Suppose \(\mathbf{v} _1,\ldots, \mathbf{v} _n,\mathbf{u}\) is linearly dependent. There are scalars \(\lambda,\lambda_1,\ldots,\lambda_n\) not all zero such that

\[\lambda \mathbf{u} + \sum_{i=1}^n \lambda_i \mathbf{v}_i = \mathbf{0}_V.\]

\(\lambda\) can’t be zero, for then this equation would say that \(\mathbf{v} _1,\ldots, \mathbf{v} _n\) was linearly dependent. Therefore we can rearrange to get

\[\mathbf{u} = -\lambda^{-1} \sum_{i=1}^n \lambda_i \mathbf{v}_i=\sum_{i=1}^n (-\lambda^{-1} \lambda_i)\mathbf{v}_i \in \operatorname{span} \{ \mathbf{v} _1,\ldots, \mathbf{v} _n\}\]

as required.

**Proposition:** Let \(V\) be finite-dimensional and let \(\mathbf{l} _1,\ldots, \mathbf{l}_n\) be linearly independent. Then there is a basis of \(V\) containing \(\mathbf{l} _1,\ldots, \mathbf{l} _n\).

*Proof*: Let \(\mathcal{L}= \mathbf{l} _1,\ldots, \mathbf{l} _n\). Since \(V\) is finite-dimensional there are elements \(\mathbf{v}_1,\ldots,\mathbf{v}_m\) of \(V\) that span \(V\).

Define a sequence of sequences of elements of \(V\) as follows: \(\mathcal{S}_0 = \mathcal{L}\), and for \(i\geq 0\),

\[\mathcal{S}_{i+1}= \begin{cases} \mathcal{S}_i & \text{if }\mathbf{v}_{i+1} \in \operatorname{span} \mathcal{S}_i \\ \mathcal{S}_i , \mathbf{v}_{i+1} & \text{otherwise.} \end{cases}\]

Here “\(\mathcal{S}_i, \mathbf{v}_{i+1}\)” just means take the sequence \(\mathcal{S}_i\) and add \(\mathbf{v}_{i+1}\) on to the end.

Note that in either case \(\mathbf{v}_{i+1} \in \operatorname{span} \mathcal{S}_{i+1}\), and also that \(\mathcal{S}_0 \subseteq \mathcal{S}_1 \subseteq \cdots \subseteq \mathcal{S}_m\).

Each sequence \(\mathcal{S}_i\) is linearly independent by the extension lemma, and in particular \(\mathcal{S}_m\) is linearly independent. Furthermore \(\operatorname{span} \mathcal{S}_m\) contains the spanning sequence \(\{\mathbf{v}_1,\ldots, \mathbf{v}_m\}\) because for each \(i\) we have \(\mathbf{v}_i \in \operatorname{span} \mathcal{S}_i \subseteq \operatorname{span} \mathcal{S}_m\), so since subspaces are closed under taking linear combinations, \(\operatorname{span} \mathcal{S}_m = V\). Therefore \(\mathcal{S}_m\) is a basis containing \(\mathcal{L}\). This completes the proof.

As a corollary, we can prove that every finite-dimensional vector space has a basis. Start with any nonzero vector you like - this forms a linearly independent sequence of length 1. The above result lets us extend that to a basis, and in particular, a basis exists. (The zero vector space has no nonzero vectors, but we have already agreed to treat it as a special case of dimension 0.)

Consider the sequence of elements \(\mathcal{L} = \mathbf{l}_1, \mathbf{l}_2\) where \(\mathbf{l}_1 = (0, 1, 1, 0)\), \(\mathbf{l}_2 = (1, 0, 1, 0)\) of the vector space \(V\) of all width 4 row vectors with real number entries. It’s easy to check that they are linearly independent. We are going to use the procedure above, together with the spanning set

\[\mathbf{v}_1 = (1, 0, 0, 0), \mathbf{v}_2 = (0, 1, 0, 0)\]

\[\mathbf{v}_3 = (0, 0, 1, 0), \mathbf{v}_4 = (0, 0, 0, 1)\]

of \(V\) to produce a basis of \(V\) containing \(\mathcal{L}\).

We begin with the sequence \(\mathcal{S}_0 = \mathcal{L}\). To find \(\mathcal{S}_1\) we have to determine if \(\mathbf{v}_1 \in \operatorname{span} \mathcal{S}_0\). It isn’t (to see this, show that the system of linear equations

\[\mathbf{v}_1 = a \mathbf{l}_1 + b \mathbf{l}_2\]

has no solutions), so \(\mathcal{S}_1 = \mathcal{S}_0, \mathbf{v}_1 = \mathbf{l}_1, \mathbf{l}_2, \mathbf{v}_1\).

To find \(\mathcal{S}_2\) we have to determine if \(\mathbf{v}_2 \in \operatorname{span} \mathcal{S}_1\). It is, because

\[\mathbf{v}_2=(0, 1, 0, 0) = \mathbf{l}_1-\mathbf{l}_2+ \mathbf{v}_1\]

so \(\mathcal{S}_2 = \mathcal{S}_1\).

To find \(\mathcal{S}_3\) we have to determine if \(\mathbf{v}_3 \in \operatorname{span} \mathcal{S}_2\). It is, because

\[\mathbf{v}_3 = \mathbf{l}_2 - \mathbf{v}_1\]

so \(\mathcal{S}_3 = \mathcal{S}_2\).

Finally to find \(\mathcal{S}_4\) we have to determine if \(\mathbf{v}_4 \in \operatorname{span} \mathcal{S}_3\). It is not (this is clear because no linear combination of \(\mathcal{S}_3\) can have a nonzero entry in the last position), so \(\mathcal{S}_4 = \mathcal{S}_3, \mathbf{v}_4\). We have run out of \(\mathbf{v}_i\)s, so \(\mathcal{S}_4\) is the required basis containing \(\mathcal{L}\).
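The procedure we just ran through can be sketched as a short program. This is my own illustrative implementation (not part of the course text), testing membership in a span via a matrix rank computation:

```python
import numpy as np

def in_span(vectors, u):
    """True iff u is a linear combination of `vectors` (checked via rank)."""
    A = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(np.vstack([A, u])) == np.linalg.matrix_rank(A)

def extend_to_basis(independent, spanning):
    """The procedure from the proof: scan the spanning vectors in order,
    appending each one that is not already in the span of what we have."""
    basis = [np.asarray(v, dtype=float) for v in independent]
    for v in spanning:
        if not in_span(basis, v):
            basis.append(np.asarray(v, dtype=float))
    return basis

# The worked example: extend l1, l2 using the standard spanning vectors of R^4.
l1, l2 = [0, 1, 1, 0], [1, 0, 1, 0]
std = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
basis = extend_to_basis([l1, l2], std)

# As in the text, v1 and v4 get added and v2, v3 get skipped.
assert len(basis) == 4
```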

In this video we’re going to prove some helpful results on dimensions.

**Theorem:** any sequence of \(n+1\) elements in a vector space of dimension \(n\) is linearly **de**pendent.

*Proof:* if they were linearly independent, we could extend them to a basis of size larger than \(n\) using the proposition from the last lecture, contradicting that every basis has size \(n\).

So, for example, if I give you 4 vectors in \(\mathbb{R}^3\) you know they *must* be linearly dependent.

**Proposition:** if \(U \leq V\) then

- \(\dim U \leq \dim V\), and
- if \(\dim U = \dim V\) then \(U=V\).

*Proof:*

- A basis of \(U\) is a linearly independent set in \(V\), so we can extend it to a basis of \(V\). So its size is less than or equal to the size of a basis of \(V\).
- Let \(\dim V = n\) and let \(\mathbf{u}_1,\ldots, \mathbf{u}_n\) be a basis of \(U\).
- The span \(S = \operatorname{span}(\mathbf{u}_1,\ldots,\mathbf{u}_n)\) (which is \(U\), of course) is a subspace of \(V\).
- Suppose for a contradiction that \(S \neq V\), and let \(\mathbf{v}\) be an element of \(V\) not in \(S\).
- Then \(\mathbf{u}_1,\ldots, \mathbf{u}_n, \mathbf{v}\) is linearly independent (by the extension lemma, video 54), contradicting the Theorem at the start of this lecture.

Consider two sets \(X\) and \(Y\). What’s the size of \(X\cup Y\) in terms of the size of \(X\) and the size of \(Y\)? It isn’t \(|X|+|Y|\), in general, because elements belonging to \(X\) *and* \(Y\) get counted twice when you add the sizes like this. The correct answer is \(|X| + |Y| - |X \cap Y|\). We would like a similar result for sums of subspaces.

**Theorem:** Let \(V\) be a vector space and \(X, Y \leq V\). Then \[\dim (X+Y) = \dim X + \dim Y - \dim X \cap Y.\]

*Proof:* take a basis \(\mathcal{I} = \mathbf{i}_1,\ldots, \mathbf{i}_k\) of \(X \cap Y\). Extend \(\mathcal{I}\) to a basis \(\mathcal{X} = \mathbf{i}_1, \ldots, \mathbf{i}_k, \mathbf{x}_1, \ldots, \mathbf{x}_n\) of \(X\). Extend \(\mathcal{I}\) to a basis \(\mathcal{Y} = \mathbf{i}_1, \ldots, \mathbf{i}_k, \mathbf{y}_1, \ldots,\mathbf{y}_m\) of \(Y\). It’s now enough to prove that \(\mathcal{J} = \mathbf{i}_1,\ldots, \mathbf{i}_k, \mathbf{x}_1, \ldots, \mathbf{x}_n, \mathbf{y}_1, \ldots, \mathbf{y}_m\) is a basis of \(X+Y\), because the size of \(\mathcal{J}\), which is \(k+n+m\), equals the size of a basis of \(X\) (which is \(k+n\)) plus the size of a basis of \(Y\) (which is \(k + m\)) minus the size of a basis of \(X \cap Y\) (which is \(k\)).

To check something is a basis for \(X+Y\), as always, we must check that it is a spanning set for \(X+Y\) and that it is linearly independent.

*Spanning:* let \(\mathbf{x} +\mathbf{y} \in X+Y\), where \(\mathbf{x} \in X, \mathbf{y} \in Y\). Then there are scalars such that

\[\begin{align*} \mathbf{x} &= \sum_{j=1}^k \lambda_j \mathbf{i}_j + \sum_{j=1}^n \xi_j \mathbf{x}_j \\ \mathbf{y} &= \sum_{j=1}^k \mu_j \mathbf{i}_j + \sum_{j=1}^m \upsilon _j \mathbf{y}_j \end{align*}\]

and so

\[\mathbf{x} + \mathbf{y} = \sum_{j=1}^k (\lambda_j + \mu_j) \mathbf{i}_j + \sum_{j=1}^n \xi_j \mathbf{x}_j + \sum_{j=1}^m \upsilon_j \mathbf{y}_j\]

which lies in \(\operatorname{span} \mathcal{J}\), as required.

*Linear independence:* suppose

\[\sum_{j=1}^k \lambda_j \mathbf{i}_j + \sum_{j=1}^n \xi_j \mathbf{x}_j + \sum_{j=1}^m \upsilon_j \mathbf{y}_j = 0\]

Rearrange it:

\[\sum_{j=1}^k \lambda_j \mathbf{i}_j + \sum_{j=1}^n \xi_j \mathbf{x}_j =- \sum_{j=1}^m \upsilon_j \mathbf{y}_j\]

The left hand side is in \(X\) and the right hand side is in \(Y\). So both sides are in \(X \cap Y\), in particular, the right hand side is in \(X\cap Y\). Since \(\mathcal{I}\) is a basis of \(X\cap Y\), there are scalars \(\gamma_j\) such that

\[\sum_{j=1}^k \gamma_j \mathbf{i}_j = - \sum_{j=1}^m \upsilon_j \mathbf{y}_j\]

But this is a linear dependence relation on the elements of the basis \(\mathcal{Y}\), so all the \(\upsilon_j\) are 0. Substituting \(\upsilon_j = 0\) back into the original equation gives a linear dependence relation on the basis \(\mathcal{X}\), so all the \(\lambda_j\) and \(\xi_j\) are 0 too, and we have linear independence.
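With the theorem in hand, the dimension of an intersection can be computed from three ranks. A sketch with NumPy, where the two subspaces are made-up examples given by spanning vectors as rows:

```python
import numpy as np

# Two made-up subspaces of R^4, spanning vectors as rows.
X = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)   # the e1-e2 plane
Y = np.array([[0, 1, 0, 0], [0, 0, 1, 0]], dtype=float)   # the e2-e3 plane

dim_X = np.linalg.matrix_rank(X)
dim_Y = np.linalg.matrix_rank(Y)
dim_sum = np.linalg.matrix_rank(np.vstack([X, Y]))   # dim(X + Y)

# The theorem rearranged: dim(X ∩ Y) = dim X + dim Y - dim(X + Y).
dim_intersection = dim_X + dim_Y - dim_sum
assert dim_intersection == 1   # X ∩ Y is the line spanned by e2
```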

We’re going to look at the following problem:

Given \(\mathbf{v}_1,\ldots, \mathbf{v}_n \in \mathbb{F}^m\), find a basis for \(S = \operatorname{span}(\mathbf{v}_1,\ldots, \mathbf{v}_n)\).

The sequence \(\mathbf{v}_1, \ldots, \mathbf{v}_n\) might not be a basis because it might be linearly dependent.

Here’s a systematic method.

- Write the vectors as rows, for convenience. (Turning them into rows lets us use row ops. We could keep them as columns and use column ops, but I don’t want to spend the time defining column ops when we already have row ops.). Put them into a matrix \(A\).
- Do row ops to \(A\) until you obtain a matrix in RREF.
- The nonzero rows are then a basis for \(S\).

Why does it work?

Observe that doing a row op to a matrix \(A\) doesn’t change the span of its rows - the **row space** of \(A\):

- \(r_i \leftrightarrow r_j\) - the order of the rows doesn’t affect what their span is.
- \(r_i \mapsto \lambda r_i\) - every element of the span of \(r_1, \ldots, r_n\) is an element of the span of \(r_1, \ldots, \lambda r_i, \ldots, r_n\) and vice versa because \[\begin{multline*} a_1 r_1 + \cdots + a_i r_i + \cdots + a_n r_n \\= a_1 r_1 + \cdots + (a_i \lambda^{-1})\lambda r_i + \cdots + a_nr_n.\end{multline*}\]
- \(r_i \mapsto r_i + \lambda r_j\) - this will be on a problem sheet.

Using this repeatedly, the span of the rows of \(A\), which is the span we wanted a basis for, equals the span of the rows (“row space”) of the RREF of \(A\).

BUT: the nonzero rows of a RREF matrix are clearly linearly independent (consider the columns with leading entries). Therefore they’re a basis for the row space of \(A\).

Find a basis for the span \(S\) of \(\begin{pmatrix}1\\ 2\\ 3\end{pmatrix}, \begin{pmatrix}4\\ 5\\ 6\end{pmatrix}, \begin{pmatrix}7\\ 8\\ 9\end{pmatrix}\).

Following the procedure above,…

- the matrix with these vectors as **rows** is \(A = \begin{pmatrix} 1&2&3\\4&5&6\\7&8&9 \end{pmatrix}\).
- we do row ops to \(A\) until we get a matrix in RREF: you get \(\begin{pmatrix} 1&0&-1 \\ 0&1&2 \\0&0&0 \end{pmatrix}\).
- the nonzero rows, thought of as column vectors, are a basis of \(S\).

So \(\dim S = 2\) and a basis of \(S\) is \(\mathbf{u}_1 = \begin{pmatrix} 1\\0\\-1 \end{pmatrix},\mathbf{u}_2 = \begin{pmatrix} 0\\1\\2 \end{pmatrix}\). You’ll notice that it is “obvious” that these two vectors are linearly independent, because if \(a\mathbf{u}_1+b\mathbf{u}_2= \mathbf{0}\) then considering the first coordinate of \(a\mathbf{u}_1+b\mathbf{u}_2\) gives \(a=0\), and considering the second coordinate gives \(b=0\).
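The whole computation can be reproduced with SymPy's `rref`, which returns the RREF together with the pivot column indices; the nonzero rows are then read off:

```python
from sympy import Matrix

# The matrix from the example, with the spanning vectors as rows.
A = Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

R, pivots = A.rref()   # rref returns (RREF matrix, pivot column indices)
assert R == Matrix([[1, 0, -1], [0, 1, 2], [0, 0, 0]])

# The nonzero rows of the RREF are a basis of the row space, so dim S = 2.
basis = [R.row(i) for i in range(len(pivots))]
assert len(basis) == 2
```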

Let’s record the definition we already gave: given an \(m\times n\) matrix \(A\),

- the **row space** of \(A\) is the span of the rows of \(A\)
- the **column space** of \(A\) is the span of the columns of \(A\)

Notice that the row space is a subspace of the space of width \(n\) row vectors, and the column space is a subspace of \(\mathbb{F}^m\).

Suppose that you have two finite sets \(X\) and \(Y\) and a function \(f:X \to Y\). If you know that \(f\) is onto then you get some information about \(X\) and \(Y\): you know that \(X\) must be at least as large as \(Y\).

But an arbitrary function between two vector spaces doesn’t necessarily give you any information about their relationship as vector spaces. To get such information, we need to restrict to functions that respect the vector space structure - that is, the scalar multiplication and the vector addition.

Functions with this property, which we’re going to define shortly, are called linear maps. They allow us to do something similar to the finite set example above: for example, if you have a surjective linear map from a vector space \(X\) to another vector space \(Y\), it is true that \(\dim X \geq \dim Y\).

**Definition:** let \(V\) and \(W\) be vector spaces over the same field \(\mathbb{F}\). A function \(T:V \to W\) is called a **linear map** or a **linear transformation** if

- for all \(\lambda \in \mathbb{F}\) and all \(\mathbf{v} \in V\) we have \(T(\lambda\mathbf{v})=\lambda T(\mathbf{v})\), and
- for all \(\mathbf{v},\mathbf{w} \in V\) we have \(T(\mathbf{v}+\mathbf{w})=T(\mathbf{v}) + T(\mathbf{w})\).

Point 1 is what it means to say “\(T\) respects scalar multiplication” and point 2 is what it means to say “\(T\) respects vector addition.”

This concept is so common that it has many names. For us,

- “\(T\) is a linear map”
- “\(T\) is a linear function”
- “\(T\) is a linear transformation”
- “\(T\) is linear”

…all mean exactly the same thing, namely that \(T\) satisfies the definition we’ve just seen.

- For any vector space \(V\), the identity map \(\operatorname{id}:V \to V\) and the zero map \(z : V \to V\) given by \(z(\mathbf{v}) = \mathbf{0}_V\) for all \(\mathbf{v} \in V\) are linear.
- Let \(A\) be an \(m \times n\) matrix with entries in a field \(\mathbb{F}\). Then \(T_A : \mathbb{F}^n \to \mathbb{F}^m\) given by \(T_A(\mathbf{x}) = A\mathbf{x}\) is linear.
- \(T : M_{n \times n} \to M_{n\times n}, T(A)=A^2\) is **not** linear.
- \(T: \mathbb{R}^n \to \mathbb{R}, T\begin{pmatrix}x_1\\ \vdots \\x_n\end{pmatrix} = \sum_{i=1}^n x_i\) is linear.
- \(D : \mathbb{R}_{\leq n}[x] \to \mathbb{R}_{\leq n}[x]\) given by \(D(f) = \frac{df}{dx}\) is linear.

Let’s look at why some of these are true, starting with example 2. To show that \(T_A\) is a linear map we have to check the two parts of the definition of being a linear map. Both of these are going to follow from properties of matrix multiplication and addition that you learned in the previous section.

- Let \(\mathbf{x} \in \mathbb{F}^n\) and \(\lambda \in \mathbb{F}\). Then \[ \begin{align*} T_A( \lambda \mathbf{x} ) &= A(\lambda \mathbf{x}) & \text{definition of } T_A \\ &= \lambda A \mathbf{x} & \text{matrix mult properties} \\ &= \lambda T_A(\mathbf{x}) & \text{definition of } T_A\end{align*} \]
- Let \(\mathbf{x}, \mathbf{y} \in \mathbb{F}^n\). Then \[ \begin{align*} T_A( \mathbf{x}+ \mathbf{y} ) &= A( \mathbf{x} + \mathbf{y} ) & \text{definition of } T_A \\ &= A \mathbf{x} + A \mathbf{y} & \text{matrix mult properties} \\ &= T_A( \mathbf{x} ) + T_A ( \mathbf{y} ) & \text{definition of } T_A \end{align*}\]

The properties of matrix multiplication used can be found in the proposition in the video *03 Matrix multiplication 2* from the previous section of the course.

Similarly, the fact that the differentiation map \(D\) of example 5 is linear follows from standard properties of derivatives: you know, for example, that for any two differentiable functions (not just polynomials) \(f\) and \(g\) we have \(\frac{d}{dx}(f + g) = \frac{df}{dx} + \frac{dg}{dx}\), which shows that \(D\) satisfies the second part of the linearity definition.

As an example where the linearity of a map doesn’t just come from “standard facts” you already know, consider

\[T : \mathbb{R}^2 \to \mathbb{R}\;\;\; T\begin{pmatrix} x\\y \end{pmatrix} = 2x - y\]

To show \(T\) is linear we have to show that it has properties 1 and 2 from the definition.

**1.**

\[\begin{align*}T\left( \lambda \begin{pmatrix} x \\y \end{pmatrix} \right) &= T \begin{pmatrix} \lambda x \\ \lambda y \end{pmatrix} \\ & = 2\lambda x - \lambda y \\ &= \lambda (2x-y) \\ & = \lambda T \begin{pmatrix} x\\y \end{pmatrix} \end{align*}\]

**2.**

\[\begin{align*}T \left( \begin{pmatrix} x_1\\y_1 \end{pmatrix} + \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} \right) & = T \begin{pmatrix} x_1+x_2\\y_1+y_2 \end{pmatrix} \\ & = 2(x_1+x_2) - (y_1+y_2) \\ &= (2x_1-y_1) + (2x_2-y_2) \\ &= T \begin{pmatrix} x_1\\y_1 \end{pmatrix} + T \begin{pmatrix} x_2\\y_2 \end{pmatrix}. \end{align*}\]
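These two calculations can also be spot-checked numerically on random inputs. A quick sketch (my own illustration; a numerical check is evidence for linearity, not a proof of it):

```python
import numpy as np

def T(v):
    """The map from the text: T(x, y) = 2x - y."""
    x, y = v
    return 2 * x - y

rng = np.random.default_rng(1)
for _ in range(100):
    v, w = rng.normal(size=2), rng.normal(size=2)
    lam = rng.normal()
    assert np.isclose(T(lam * v), lam * T(v))   # property 1: scalars
    assert np.isclose(T(v + w), T(v) + T(w))    # property 2: addition
```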

Here are some examples of things which are **not** linear maps:

- \(T: \mathbb{R} \to \mathbb{R}, T(x) = |x|\) isn’t linear. It doesn’t satisfy either linearity property. \(T(-2\cdot 3) \neq -2 \cdot T(3)\), and \(T(-1 + 1) \neq T(-1) + T(1)\).
- \(T: \mathbb{R}^3 \to \mathbb{R}^3, T( \mathbf{x} ) = \mathbf{x} + \begin{pmatrix} 1\\0\\0 \end{pmatrix}\). Again it doesn’t satisfy either part of the definition - you should check that.

**Lemma:** Let \(T: V \to W\) be a linear map. Then \(T(\mathbf{0}_V) = \mathbf{0}_W\).

*Proof.*

\[\begin{align*} T(\mathbf{0}_V) & = T(\mathbf{0}_V+\mathbf{0}_V)\\ &= T(\mathbf{0}_V)+T(\mathbf{0}_V)\end{align*}\]

by the second part of the definition of linearity. Now add \(-T(\mathbf{0}_V)\) to both sides:

\[\begin{align*} T(\mathbf{0}_V) - T(\mathbf{0}_V) &= T(\mathbf{0}_V) + T(\mathbf{0}_V) - T(\mathbf{0}_V) \\ \mathbf{0}_W &= T(\mathbf{0}_V) \end{align*}\]

as required.

To every linear transformation we associate two important subspaces.

**Definition:** let \(T:V \to W\) be linear.

- the **kernel** of \(T\) is \(\ker(T) = \{\mathbf{v} \in V : T(\mathbf{v}) = \mathbf{0}_W\}\)
- the **image** of \(T\) is \(\operatorname{im} (T) = \{T(\mathbf{v}) : \mathbf{v} \in V\}\)

In other words, the image is what we normally mean by the image of a function.

**Lemma:** \(\ker T \leq V, \operatorname{im} T \leq W\).

*Proof:* Remember that to show something is a subspace you check the three conditions: it contains the zero vector, it is closed under addition, it is closed under scalar multiplication.

\(\ker T \leq V\): To show that the kernel contains \(\mathbf{0}_V\), we must show that \(T(\mathbf{0}_V) = \mathbf{0}_W\). That’s exactly the lemma we just did.

Now closure under addition: if \(\mathbf{v}, \mathbf{w} \in \ker T\) then \(T(\mathbf{v}+\mathbf{w}) = T(\mathbf{v})+T(\mathbf{w}) = \mathbf{0}_W+\mathbf{0}_W=\mathbf{0}_W\), so \(\mathbf{v}+\mathbf{w} \in \ker T\).

Now closure under scalar multiplication: if \(\mathbf{v} \in \ker T\) and \(\lambda \in \mathbb{F}\) then \(T(\lambda \mathbf{v}) = \lambda T(\mathbf{v})\) by the first part of the definition of linearity, and this is \(\lambda \mathbf{0}_W\), which equals \(\mathbf{0}_W\). Since \(T(\lambda \mathbf{v}) = \mathbf{0}_W\), we have \(\lambda \mathbf{v} \in \ker T\).

\(\operatorname{im} T \leq W\): We know from the lemma that \(T(\mathbf{0}_V)=\mathbf{0}_W\), so \(\mathbf{0}_W \in \operatorname{im} T\).

Now closure under addition: any two elements of \(\operatorname{im} T\) have the form \(T(\mathbf{u}), T(\mathbf{v})\) for some \(\mathbf{u}, \mathbf{v} \in V\). Then \(T(\mathbf{u})+T(\mathbf{v}) = T(\mathbf{u}+\mathbf{v})\) (linearity definition part 2), which is an element of \(\operatorname{im}T\), so \(\operatorname{im}T\) is closed under addition.

Now closure under scalar multiplication: if \(T(\mathbf{u}) \in \operatorname{im}T\) and \(\lambda \in \mathbb{F}\) then \(\lambda T(\mathbf{u}) = T(\lambda \mathbf{u})\) by part 1 of the definition of linearity, and this is an element of \(\operatorname{im} T\) as it is \(T\) applied to something, so \(\operatorname{im}T\) is closed under scalar multiplication.

Let \(A = \begin{pmatrix} 0&1\\0&0 \end{pmatrix}\) so that we have a linear map \(T_A : \mathbb{R}^2 \to \mathbb{R}^2\) given by \(T_A( \mathbf{x} ) = A \mathbf{x}\). What are \(\operatorname{im}T_A\) and \(\ker T_A\)?

\[\begin{align*} \operatorname{im} T_A &= \left\{ T_A \begin{pmatrix} x\\y \end{pmatrix} : x, y \in \mathbb{R} \right\} \\ &= \left\{ \begin{pmatrix} 0&1\\0&0 \end{pmatrix} \begin{pmatrix} x\\y \end{pmatrix} : x, y \in \mathbb{R} \right\} \\ &= \left\{ \begin{pmatrix} y\\0 \end{pmatrix} : y \in \mathbb{R}\right\}\end{align*}\]

Another way to write this is that \(\operatorname{im}T_A = \operatorname{span} \begin{pmatrix} 1\\0 \end{pmatrix}\), and so \(\dim \operatorname{im} T_A = 1\).

Now we’ll do the kernel.

\[\begin{align*} \ker T_A &= \left\{ \begin{pmatrix} x\\y \end{pmatrix} \in \mathbb{R}^2 : T_A \begin{pmatrix} x\\y \end{pmatrix} = \begin{pmatrix} 0\\0 \end{pmatrix} \right\} \\ &= \left\{ \begin{pmatrix} x\\y \end{pmatrix} \in \mathbb{R}^2: \begin{pmatrix} 0&1\\0&0 \end{pmatrix} \begin{pmatrix} x\\y \end{pmatrix} = \begin{pmatrix} 0\\0 \end{pmatrix} \right\} \\ &= \left\{ \begin{pmatrix} x\\y \end{pmatrix} \in \mathbb{R}^2 : \begin{pmatrix} y\\0 \end{pmatrix} = \begin{pmatrix} 0\\0 \end{pmatrix} \right\} \\ &= \left\{ \begin{pmatrix} x \\ 0 \end{pmatrix} : x \in \mathbb{R}\right\} \end{align*}\]

Again we could write this as \(\ker T_A = \operatorname{span} \begin{pmatrix} 1\\0 \end{pmatrix}\). The kernel and image are equal in this case.
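SymPy can compute both subspaces directly: `nullspace` returns a basis of the kernel and `columnspace` a basis of the image (the column space of \(A\) is exactly \(\operatorname{im} T_A\)):

```python
from sympy import Matrix

A = Matrix([[0, 1], [0, 0]])   # the matrix from the example

kernel_basis = A.nullspace()     # basis of ker T_A
image_basis = A.columnspace()    # basis of im T_A

# Both come out as the single vector (1, 0): kernel and image coincide here.
assert kernel_basis == [Matrix([1, 0])]
assert image_basis == [Matrix([1, 0])]
```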

Let \(D: \mathbb{R}_{\leq n} [x] \to \mathbb{R}_{\leq n}[x]\) be \(D(f) = \frac{df}{dx}\). Describe \(\ker D\) and \(\operatorname{im} D\).

A polynomial has derivative zero if and only if it is constant, so \(\ker D\) is the set of all constant polynomials, which we could write as \(\operatorname{span}(1)\) where 1 denotes the constant polynomial with value 1.

What about the image? Let \(S \leq \mathbb{R}_{\leq n}[x]\) be the subspace spanned by \(1, x, \ldots, x^{n-1}\), that is, the subspace consisting of all polynomials of degree at most \(n-1\). Certainly \(\operatorname{im} D \leq S\): when you differentiate a polynomial of degree at most \(n\), you get a polynomial of degree at most \(n-1\). But if \(s(x) \in S\) then \(s(x)\) has an antiderivative \(t(x)\) in \(\mathbb{R}_{\leq n}[x]\) with \(D(t) = s\), so every \(s \in S\) is in \(\operatorname{im}D\), so \(\operatorname{im}D = S\).

Let \(T: V \to W\) be a linear map.

- the **nullity** of \(T\), written \(\operatorname{null}T\), is \(\dim \ker T\)
- the **rank** of \(T\), written \(\operatorname{rank}T\), is \(\dim \operatorname{im} T\).

Returning to the differentiation example from the end of the last lecture, \(D : \mathbb{R}_{\leq n}[x] \to \mathbb{R}_{\leq n}[x]\) has nullity 1 (since its kernel was one-dimensional, spanned by the constant polynomial 1) and rank \(n\), since its image had a basis \(1, x, \ldots, x^{n-1}\) of size \(n\). Notice that \(\operatorname{rank}(D) + \operatorname{null}(D) = \dim \mathbb{R}_{\leq n}[x]\); this isn’t a coincidence.

**Theorem:** (the *rank-nullity theorem*) for any linear map \(T:V \to W\),

\[\operatorname{rank} T + \operatorname{null} T = \dim V.\]

*Proof*. We’ll assume \(V\) and \(W\) are finite-dimensional, though in fact the argument only uses that \(V\) is. Here is an outline of how the proof is going to work.

- choose a basis \(\mathcal{K} = \mathbf{k}_1,\ldots, \mathbf{k}_m\) of \(\ker T\)
- extend it to a basis \(\mathcal{B} = \mathbf{k}_1,\ldots, \mathbf{k}_m, \mathbf{v}_1, \ldots, \mathbf{v}_n\) of \(V\) (so \(\dim V = m+n\) and we need only show \(\dim \operatorname{im} T = n\))
- show that \(T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)\) is a basis of \(\operatorname{im} T\).

The only part needing elaboration is the last part. First, I claim that \(T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)\) span \(\operatorname{im} T\). Any element of the image is \(T(\mathbf{v})\) for some \(\mathbf{v} \in V\). We have to show any such \(T(\mathbf{v})\) lies in the span of the \(T(\mathbf{v}_i)\)s.

Since \(\mathcal{B}\) is a basis of \(V\) we may write \(\mathbf{v}\) as \(\sum_{i=1}^m \lambda_i \mathbf{k}_i + \sum_{i=1}^n \mu_i \mathbf{v}_i\) for some scalars \(\lambda_i, \mu_i\). Then

\[\begin{align*} T(\mathbf{v}) &= T\left( \sum_{i=1}^m \lambda_i \mathbf{k}_i + \sum_{i=1}^n \mu_i \mathbf{v}_i \right) \\ &= \sum_{i=1}^m \lambda_i T(\mathbf{k}_i) + \sum_{i=1}^n \mu_i T(\mathbf{v}_i) & \text{linearity} \\ &= \sum_{i=1}^n \mu_i T(\mathbf{v}_i) & \text{as } \mathbf{k}_i \in \ker T \\ & \in \operatorname{span}(T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)) \end{align*}\]

as required.

Now I claim \(T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)\) is linearly independent. Suppose

\[\sum_{i=1}^n \mu _i T(\mathbf{v}_i) = 0,\]

so that we need to show the \(\mu_i\) are all 0.

Using linearity,

\[T\left( \sum_{i=1}^n \mu_i \mathbf{v}_i \right) = 0\]

which means \(\sum_{i=1}^n \mu_i \mathbf{v}_i \in \ker T\). As \(\mathcal{K}\) is a basis for \(\ker T\), we can write

\[\sum_{i=1}^n \mu_i \mathbf{v}_i = \sum_{i=1}^m\lambda_i \mathbf{k}_i\]

for some scalars \(\lambda_i\). Rearranging, \(\sum_{i=1}^n \mu_i \mathbf{v}_i - \sum_{i=1}^m \lambda_i \mathbf{k}_i = \mathbf{0}\) is a linear dependence relation on the **basis** \(\mathcal{B}\), so all the scalars are 0; in particular all the \(\mu_i\) are 0. This completes the proof.

Now that we know the rank-nullity theorem, finding bases of kernels and images gets slightly easier: as soon as you know the dimension of the kernel or the image, you can work out the dimension of the other by rearranging the rank-nullity theorem to say

\[\begin{align*} \dim \ker T &= \dim V - \dim \operatorname{im} T, &\text{or} \\ \dim \operatorname{im} T &= \dim V - \dim \ker T. \end{align*}\]

Let \(A\) be an \(m \times n\) matrix. Recall \(T_A:\mathbb{R}^n \to \mathbb{R}^m\) is \(T_A( \mathbf{x}) = A \mathbf{x}\).

Then \(\ker T_A\) is the set of solutions to the matrix equation \(A \mathbf{x}= \mathbf{0}\).

For example, let \(A = \begin{pmatrix} 1&2&3\\4&5&6\\7&8&9\end{pmatrix}\). Doing row operations to put this into RREF gives \(\begin{pmatrix}1&0&-1\\0&1&2\\0&0&0\end{pmatrix}\). The solutions to \(A \mathbf{x} = \mathbf{0}\) are \(\begin{pmatrix}z\\-2z\\z\end{pmatrix} = z \begin{pmatrix} 1\\-2\\1\end{pmatrix}\) for \(z \in \mathbb{R}\). So the elements of \(\ker T_A\) are exactly the scalar multiples of \(\begin{pmatrix}1\\-2\\1\end{pmatrix}\), and \(\ker T_A = \operatorname{span}\begin{pmatrix}1\\-2\\1\end{pmatrix}\) has dimension 1.

(In general, \(\dim \ker T_A\) is the number of columns with no leading entry in the RREF matrix you can get by doing row operations to \(A\).)
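The kernel computation above is easy to check numerically: every multiple of \(\begin{pmatrix}1\\-2\\1\end{pmatrix}\) should be sent to the zero vector. A minimal Python check, with the matrix copied from the example:

```python
# The 3x3 example matrix, as a list of rows.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Every scalar multiple of (1, -2, 1) lies in ker T_A:
for z in [1, -3, 10]:
    print(matvec(A, [z, -2 * z, z]))   # [0, 0, 0] each time
```
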

We now know that \(\dim \operatorname{im} T_A = \dim \mathbb{R}^3 - \dim \ker T_A =3-1=2\). Can we find a basis?

Let’s start with a general result:

**Proposition:** \(\operatorname{im} T_A\) equals the column space of \(A\), that is, \(\operatorname{span}(\mathbf{c}_1,\ldots, \mathbf{c}_n)\).

Here \(\mathbf{c}_j\) denotes the \(j\)th column of \(A\). Notice that if \(\mathbf{e}_j\) is the column vector with a 1 in position \(j\) and 0 elsewhere, we have \(A \mathbf{e}_j = \mathbf{c}_j\). An easy way to see this is to note that the \(\mathbf{e}_j\) are the columns of the identity matrix \(I\). We then compute \(AI\) in two different ways. Firstly, \(I = ( \mathbf{e}_1 \; \cdots \; \mathbf{e}_n )\) and so, because matrix multiplication works columnwise, \(AI = ( A \mathbf{e}_1 \; \cdots \; A \mathbf{e}_n )\). Secondly, \(AI=A\) so \(AI = ( \mathbf{c}_1 \; \cdots \; \mathbf{c}_n )\). Comparing these two descriptions of \(AI\) we get \(A \mathbf{e}_j = \mathbf{c}_j\).

*Proof:* Let \(\mathbf{e}_1,\ldots, \mathbf{e}_n\) be the standard basis. Given \(\mathbf{v}= \begin{pmatrix} v_1\\ \vdots \\ v_n \end{pmatrix} \in \mathbb{R}^n\) we can write \(\mathbf{v} = \sum_{j=1}^n v_j \mathbf{e}_j\)

so \(T_A(\mathbf{v}) = A\sum_{j=1}^n v_j \mathbf{e}_j = \sum_{j=1}^n v_j A\mathbf{e}_j = \sum_{j=1}^n v_j \mathbf{c}_j\).

This shows that every element of \(\operatorname{im}T_A\) is in the span of the columns of \(A\). Conversely, each column is in the image, since \(\mathbf{c}_j = A\mathbf{e}_j = T_A(\mathbf{e}_j)\), so every element of the column space is in \(\operatorname{im} T_A\), finishing the proof.

In our example, \(\mathbf{c}_1 = \begin{pmatrix} 1\\4\\7 \end{pmatrix}\), \(\mathbf{c}_2 = \begin{pmatrix} 2\\5\\8 \end{pmatrix}\), \(\mathbf{c}_3 = \begin{pmatrix} 3\\6\\9 \end{pmatrix}\). We know that the rank is 2, so any two linearly independent vectors in the image are a basis. Noting that \(\mathbf{c}_1\) and \(\mathbf{c}_2\) are linearly independent (check this!), they must form a basis of the image of \(T_A\).
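In fact \(\mathbf{c}_3 = 2\mathbf{c}_2 - \mathbf{c}_1\), which confirms that the third column adds nothing to the span. A quick Python check of this arithmetic:

```python
# The three columns of the example matrix A.
c1, c2, c3 = [1, 4, 7], [2, 5, 8], [3, 6, 9]

# c3 is a linear combination of c1 and c2, so the column space is
# spanned by c1 and c2 alone:
combo = [2 * b - a for a, b in zip(c1, c2)]   # 2*c2 - c1, componentwise
print(combo == c3)                             # True

# c1 and c2 are linearly independent: a*c1 + b*c2 = 0 forces
# a + 2b = 0 and 4a + 5b = 0, whose only solution is a = b = 0.
```
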

Linear transformations are abstractly defined things. We’d like to make them concrete. We do this by making the following observation: once you know what a linear transformation does on a basis, you know what it does everywhere.

Here’s what that means exactly. Let \(T:V \to W\) be linear and let \(\mathbf{v}_1,\ldots, \mathbf{v}_n\) be a basis of \(V\). Then we can write any \(\mathbf{v} \in V\) as \(\sum_{i=1}^n \lambda_i \mathbf{v}_i\) for some scalars \(\lambda_i\), and so by linearity \(T(\mathbf{v}) = T\left(\sum_{i=1}^n \lambda_i \mathbf{v}_i\right) = \sum_{i=1}^n \lambda_i T(\mathbf{v}_i)\).

So if I want to communicate a linear map, I can just say what it does on a basis \(\mathbf{v}_1, \ldots, \mathbf{v}_n\). You can then work out \(T(\mathbf{v})\) for any \(\mathbf{v} \in V\) just by knowing the \(T(\mathbf{v}_i)\).

We can record “what \(T\) does” to each \(\mathbf{v}_i\) by giving the coefficients needed to write the \(T(\mathbf{v}_i)\) in terms of some fixed basis of \(W\).

**Definition:** let \(T:V \to W\) be linear, and let

- \(\mathcal{B} = \mathbf{v}_1,\ldots,\mathbf{v}_n\) be a basis of \(V\)
- \(\mathcal{C} = \mathbf{w}_1,\ldots, \mathbf{w}_m\) be a basis of \(W\).

Define scalars \(a_{ij}\) by \(T(\mathbf{v}_j) = \sum_{i=1}^m a_{ij} \mathbf{w}_i\). Then **the matrix of T with respect to initial basis \(\mathcal{B}\) and final basis \(\mathcal{C}\)**, written \([T]_{\mathcal{C}}^{\mathcal{B}}\), is the \(m \times n\) matrix \((a_{ij})\).

We know we can do this because the \(\mathbf{w}_i\) are a basis of \(W\), and we know the \(a_{ij}\) are uniquely determined.

Another way to think of this definition is that the \(j\)th column of \([T]_{\mathcal{C}}^{\mathcal{B}}\) records the image of the \(j\)th basis element from \(\mathcal{B}\) under \(T\), in the sense that the entries are the coefficients used in expressing \(T(\mathbf{v}_j)\) as a linear combination of \(\mathcal{C}\).

When we have a linear map from a vector space to itself, we sometimes use slightly different terminology. If \(T:V \to V\) and \(\mathcal{B}\) is a basis of \(V\), the **matrix of \(T\) with respect to \(\mathcal{B}\)** means \([T]_\mathcal{B}^\mathcal{B}\).

Notice that the order of the basis matters in this definition. If you order the basis elements differently, you change the order of the columns/rows in the matrix. That’s why our bases are sequences, not sets.

Let \(T: \mathbb{R}^2 \to \mathbb{R}^3\) be defined by \(T\begin{pmatrix}a\\ b\end{pmatrix} = \begin{pmatrix}a+b\\ a-b\\ 2a+b\end{pmatrix}\). This is linear.

Let’s find the matrix of \(T\) with respect to the two standard bases:

- \(\mathcal{E} = \mathbf{e}_1,\mathbf{e}_2\) for \(\mathbb{R}^2\)
- \(\mathcal{E}' = \mathbf{e}_1', \mathbf{e}_2', \mathbf{e}_3'\) for \(\mathbb{R}^3\)

We have

\[T(\mathbf{e}_1) = \begin{pmatrix}1\\1\\2\end{pmatrix} = 1\mathbf{e}_1' + 1\mathbf{e}_2' + 2\mathbf{e}_3'\]

\[T(\mathbf{e}_2) = \begin{pmatrix}1\\ -1\\ 1\end{pmatrix} = 1\mathbf{e}_1' - 1\mathbf{e}_2' + 1\mathbf{e}_3'\]

so the matrix \([T]_{\mathcal{E}'}^{\mathcal{E}}\) is \(\begin{pmatrix} 1&1 \\ 1&-1 \\ 2 & 1\end{pmatrix}\).
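With respect to the standard bases, multiplying by this matrix agrees with applying \(T\) directly, since the \(j\)th column is exactly \(T(\mathbf{e}_j)\). A small Python check of both facts (`apply_matrix` is our helper, not notation from the notes):

```python
def T(v):
    """The map T(a, b) = (a+b, a-b, 2a+b) from the example."""
    a, b = v
    return [a + b, a - b, 2 * a + b]

# The columns of [T] are the images of the standard basis vectors:
cols = [T([1, 0]), T([0, 1])]
print(cols)   # [[1, 1, 2], [1, -1, 1]]

def apply_matrix(cols, v):
    """Multiply by the matrix with the given columns: sum of v_j * (column j)."""
    return [sum(c[i] * x for c, x in zip(cols, v)) for i in range(len(cols[0]))]

# Matrix multiplication reproduces T on an arbitrary vector:
print(apply_matrix(cols, [3, -2]) == T([3, -2]))   # True
```
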

Here’s another example. Let

- \(V\) be the vector space of all polynomials with real coefficients of degree \(\leq 3\) in one variable \(x\)
- \(D:V \to V\) be the differentiation map
- \(\mathcal{B}\) be the basis \(1, x, x^2, x^3\) of \(V\).

\(D\) is a linear map, so let’s find the matrix \([D]_{\mathcal{B}}^\mathcal{B}\) of \(D\) with respect to \(\mathcal{B}\). We have

\[\begin{align*} D(1) &= 0 = 0\times 1 + 0\times x + 0\times x^2 + 0\times x^3 \\ D(x) &= 1 = 1\times 1 + 0\times x + 0\times x^2 + 0\times x^3\\ D(x^2) &= 2x= 0\times 1 + 2\times x + 0\times x^2 + 0\times x^3 \\ D(x^3) &= 3x^2= 0\times 1 + 0\times x + 3\times x^2 + 0\times x^3 \end{align*}\]

and so

\[[D]_ \mathcal{B}^ \mathcal{B} = \begin{pmatrix} 0&1&0&0\\ 0&0&2&0\\ 0&0&0&3\\ 0&0&0&0 \end{pmatrix}\]
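This matrix really does differentiate: multiplying it by the coordinates of a polynomial with respect to \(1, x, x^2, x^3\) gives the coordinates of its derivative. A minimal Python check with one sample polynomial:

```python
# [D] with respect to the basis 1, x, x^2, x^3, copied from above.
D = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3],
     [0, 0, 0, 0]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# p(x) = 7 + 5x - x^2 + 4x^3, as coordinates with respect to 1, x, x^2, x^3:
p = [7, 5, -1, 4]

# [D] p should be the coordinates of p'(x) = 5 - 2x + 12x^2:
print(matvec(D, p))   # [5, -2, 12, 0]
```
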

Let \(\operatorname{id} : V \to V\) be the identity map \(\operatorname{id}(\mathbf{v}) = \mathbf{v}\). Let \(\mathcal{B} = \mathbf{b}_1, \ldots, \mathbf{b}_n\) be any basis for \(V\). We’re going to work out \([ \operatorname{id} ]_ \mathcal{B} ^ \mathcal{B}\). For any \(j\),

\[ \operatorname{id}(\mathbf{b}_j) = \mathbf{b}_j = 0\times \mathbf{b}_1 + \cdots + 1\times \mathbf{b}_j + \cdots + 0 \times \mathbf{b}_n.\]

This means the \(j\)th column of \([ \operatorname{id} ]^ \mathcal{B} _ \mathcal{B}\) is all 0s, except a 1 in position \(j\). In other words, \([ \operatorname{id} ]^ \mathcal{B} _ \mathcal{B} = I_n\), the \(n\times n\) identity matrix.

This shows that the matrix of the identity map is the identity matrix, *so long as the initial basis and the final basis are the same*.

On the other hand, if \(\mathcal{C}=\mathbf{c}_1,\ldots, \mathbf{c}_n\) is a different basis of \(V\) then \([ \operatorname{id} ]^ \mathcal{C} _ \mathcal{B}\) will **not** be the identity matrix. To figure out what goes in the \(j\)th column of this matrix we have to express \(\operatorname{id}(\mathbf{b}_j)\), which is just \(\mathbf{b}_j\), as a linear combination of the \(\mathbf{c}_i\)s; the coefficients we have to use, whatever they are, make up this column of the matrix.

As a concrete example, consider two bases for \(\mathbb{R}^2\)

\[\begin{align*} \mathcal{B}: & \;\; \mathbf{e}_1 = \begin{pmatrix} 1\\0 \end{pmatrix}, \mathbf{e}_2 = \begin{pmatrix} 0\\1 \end{pmatrix} \\ \mathcal{C}: & \;\; \mathbf{c}_1 = \begin{pmatrix} 1\\1 \end{pmatrix}, \mathbf{c}_2 = \begin{pmatrix} 1\\ -1 \end{pmatrix} \end{align*}\]

Both \([ \operatorname{id} ] ^ \mathcal{B} _ \mathcal{B}\) and \([ \operatorname{id} ] ^ \mathcal{C} _ \mathcal{C}\) will be the identity matrix \(I_2\). Let’s work out \([ \operatorname{id} ]^ \mathcal{C} _ \mathcal{B}\). To do that, we have to express \(\operatorname{id}(\mathbf{c}_j)\) as a linear combination of the \(\mathbf{e}_i\) for \(j=1,2\):

\[\begin{align*} \operatorname{id}(\mathbf{c}_1) &= \mathbf{c}_1 = \mathbf{e}_1 + \mathbf{e}_2 \\ \operatorname{id}(\mathbf{c}_2) &= \mathbf{c}_2 = \mathbf{e}_1 - \mathbf{e}_2 \end{align*}\]

and so \([ \operatorname{id} ]^ \mathcal{C} _ \mathcal{B} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\).
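A useful way to read \([ \operatorname{id} ]^ \mathcal{C} _ \mathcal{B}\) is as a coordinate converter: multiplying it by the coordinates of a vector with respect to \(\mathcal{C}\) produces that vector's coordinates with respect to \(\mathcal{B}\). A quick Python sketch of this reading, using the matrix just computed:

```python
# [id]^C_B from the example above.
M = [[1, 1],
     [1, -1]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# The vector v = 2*c1 + 3*c2 has C-coordinates (2, 3).
# Directly: v = 2*(1,1) + 3*(1,-1) = (5, -1), i.e. 5*e1 + (-1)*e2.
print(matvec(M, [2, 3]))   # [5, -1]: the B-coordinates of v
```
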

Let \(A\) be an \(m \times n\) matrix, and \(T_A : \mathbb{R}^n \to \mathbb{R}^m\) be the linear map \(T_A(\mathbf{x}) = A\mathbf{x}\). Then the matrix of \(T_A\) with respect to the standard bases of \(\mathbb{R}^n\) and \(\mathbb{R}^m\) is \(A\).

Suppose we have two composable linear maps, \(S\) and \(T\). The composition \(T\circ S\) is still linear, as you can check. How does the matrix for \(T \circ S\) relate to the matrices for \(T\) and \(S\)?

**Theorem:** let \(S:U \to V, T:V \to W\) be linear maps. Let

- \(\mathcal{B}= \mathbf{b}_1,\ldots, \mathbf{b}_l\) be a basis of \(U\)
- \(\mathcal{C}= \mathbf{c}_1, \ldots, \mathbf{c}_m\) be a basis of \(V\)
- \(\mathcal{D} = \mathbf{d}_1, \ldots, \mathbf{d}_n\) be a basis of \(W\)

Then \([T\circ S]_\mathcal{D}^\mathcal{B} = [T]^\mathcal{C}_\mathcal{D}[S]_\mathcal{C}^\mathcal{B}\)

Here is a picture of this situation:

\[\begin{array}{lllll} \mathcal{B} & & \mathcal{C} & & \mathcal{D} \\ U & \stackrel S \to & V & \stackrel T \to & W \end{array}\]

This theorem provides some justification for our definition of matrix multiplication: composition of linear maps corresponds to multiplication of matrices.

*Proof*: Let \([T]_\mathcal{D}^\mathcal{C} = (t_{ij})\) and \([S]_\mathcal{C}^\mathcal{B} = (s_{ij})\).

We will work out \([T\circ S]_\mathcal{D}^ \mathcal{B}\) using the definition of the matrix of a linear map. For any \(j\),

\[\begin{align*} (T\circ S)(\mathbf{b}_j) &= T(S(\mathbf{b}_j)) \\ &= T\left(\sum_{k=1}^m s_{kj} \mathbf{c}_k\right) & \text{as } [S]_\mathcal{C}^\mathcal{B} = (s_{ij}) \\ &= \sum_{k=1}^m s_{kj} T(\mathbf{c}_k) & \text{linearity of } T \\ &= \sum_{k=1}^m s_{kj} \sum_{i=1}^n t_{ik} \mathbf{d}_i & \text{as } [T]_\mathcal{D}^\mathcal{C} = (t_{ij}) \\ &= \sum_{i=1}^n \left(\sum_{k=1}^m t_{ik} s_{kj} \right) \mathbf{d}_i & \text{for finite sums, } \sum_k\sum_i = \sum_i \sum_k \end{align*}\]

so the \(i,j\) entry of \([T\circ S]_\mathcal{D} ^\mathcal{B}\) is \(\sum_{k=1}^m t_{ik} s_{kj}\), which is the \(i,j\) entry of \([T]_\mathcal{D}^\mathcal{C} [S]_\mathcal{C}^\mathcal{B}\).
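As a concrete instance of the theorem, we can reuse the differentiation example: the second-derivative map is \(D \circ D\), so its matrix with respect to the basis \(1, x, x^2, x^3\) should be the square of \([D]^\mathcal{B}_\mathcal{B}\). A minimal Python check (matrix entries copied from the earlier example):

```python
# [D] with respect to 1, x, x^2, x^3.
D = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3],
     [0, 0, 0, 0]]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# [D o D] = [D][D]; its columns are the second derivatives of 1, x, x^2, x^3,
# namely 0, 0, 2, 6x, written in coordinates.
D2 = matmul(D, D)
print(D2)   # [[0, 0, 2, 0], [0, 0, 0, 6], [0, 0, 0, 0], [0, 0, 0, 0]]
```
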

Suppose we have a linear map \(T\) from \(V\) to \(W\) and two different bases for \(V\) and two different bases for \(W\). We can form the matrix of \(T\) with respect to the first initial and final bases, and the second. These record the same information (the linear map \(T\)) in different ways, so they should be related in some way.

Notation: let

- \(T: V \to W\) be a linear map
- \(\mathcal{B}, \mathcal{B}'\) be bases of \(V\), and
- \(\mathcal{C}, \mathcal{C}'\) be bases of \(W\).

Now make the following observation:

\[T = \operatorname{id}_W \circ T \circ \operatorname{id}_V\]

which holds purely because composing with an identity map doesn’t change anything.

Now apply the theorem from the previous video twice: you get the **change of basis formula**:

\[[T]^{\mathcal{B}'}_{\mathcal{C}'} = [\operatorname{id}]^\mathcal{C}_{\mathcal{C}'} [T]^\mathcal{B}_\mathcal{C} [\operatorname{id}]^{\mathcal{B}'}_{\mathcal{B}}\]

In this subsection we’re going to work an example of computing matrices of linear maps using the change of basis formula. On the way we’ll see the significance of the matrix of the identity map with respect to different bases.

Let \(T: \mathbb{R}^2 \to \mathbb{R}^2\) be the linear map

\[T \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -16x+6y \\ -45x+17y \end{pmatrix}\]

Let \(\mathcal{E}\) be the standard basis \(\mathbf{e}_1, \mathbf{e}_2\) of \(\mathbb{R}^2\) and let \(\mathcal{F}\) be the basis \(\mathbf{f}_1 = \begin{pmatrix} 2 \\ 5\end{pmatrix}, \mathbf{f}_2 = \begin{pmatrix} 1 \\ 3 \end{pmatrix}\).

The matrix of \(T\) with respect to \(\mathcal{E}\) is easy to find:

\[\begin{align*} T( \mathbf{e}_1) &= T \begin{pmatrix}1\\0\end{pmatrix} \\ &= \begin{pmatrix} -16 \\ -45 \end{pmatrix} \\ &= -16 \mathbf{e}_1 -45 \mathbf{e}_2 \\ T( \mathbf{e}_2) &= T \begin{pmatrix}0\\1\end{pmatrix} \\ &= \begin{pmatrix} 6 \\ 17 \end{pmatrix} \\ &= 6 \mathbf{e}_1 +17 \mathbf{e}_2 \end{align*}\]

so \([T]^{\mathcal{E}}_{\mathcal{E}} = \begin{pmatrix} -16 & 6 \\ -45 & 17 \end{pmatrix}\).

\([ \operatorname{id}]^{\mathcal{F}}_{\mathcal{E}}\) is also easy: it’s the matrix which tells us how to express the elements of \(\mathcal{F}\) in terms of the standard basis. \[\begin{align*} \operatorname{id}(\mathbf{f}_1) &= \mathbf{f}_1 = 2 \mathbf{e}_1 + 5 \mathbf{e}_2 \\ \operatorname{id}(\mathbf{f}_2) &= \mathbf{f}_2 = 1 \mathbf{e}_1 + 3 \mathbf{e}_2 \end{align*}\] and so \([ \operatorname{id}]^{\mathcal{F}}_{\mathcal{E}} = \begin{pmatrix} 2&1\\5&3\end{pmatrix}\).

How to express the \(\mathbf{e}_i\) in terms of the \(\mathbf{f}_i\) isn’t so obvious, so on the face of it computing \([ \operatorname{id}]^{\mathcal{E}}_{\mathcal{F}}\) is harder. But we can avoid that by using the Theorem from the previous video *Matrix of a composition*, which tells us that

\[\begin{align*}[ \operatorname{id}]^{\mathcal{E}}_{\mathcal{F}} [\operatorname{id}]^{\mathcal{F}}_{\mathcal{E}} & = [ \operatorname{id}\circ \operatorname{id}]^{\mathcal{F}}_{\mathcal{F}} &\text{by the theorem of section 62} \\ &= [ \operatorname{id} ] ^{\mathcal{F}}_{\mathcal{F}} & \text{as } \operatorname{id}\circ \operatorname{id} = \operatorname{id} \\ &= I_2\end{align*}\]

so

\[\begin{align*} [ \operatorname{id}]_{\mathcal{F}}^{\mathcal{E}} &= ([ \operatorname{id}]^{\mathcal{F}}_{\mathcal{E}}) ^{-1} \\ &= \begin{pmatrix} 2&1\\5&3\end{pmatrix}^{-1} \\ &= \begin{pmatrix} 3&-1\\-5&2\end{pmatrix} \end{align*}\]

We could work out \([T]^{\mathcal{F}}_{\mathcal{F}}\) directly using the definition, but instead we are going to practise using the change of basis formula. It says

\[\begin{align*} [T]^{\mathcal{F}}_{\mathcal{F}} &= [ \operatorname{id}]^{\mathcal{E}}_{\mathcal{F}} [T]^{\mathcal{E}}_{\mathcal{E}} [ \operatorname{id}]^{\mathcal{F}}_{\mathcal{E}} \\ &= \begin{pmatrix} 3&-1 \\ -5 & 2 \end{pmatrix} \begin{pmatrix} -16 & 6 \\ -45 & 17 \end{pmatrix} \begin{pmatrix} 2&1 \\5&3\end{pmatrix} \\ &= \begin{pmatrix} -1 & 0 \\ 0 & 2\end{pmatrix} \end{align*}\]
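The three-matrix product above is easy to get wrong by hand, so here is a quick Python verification, with the matrices copied from the example:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P_inv = [[3, -1], [-5, 2]]     # [id]^E_F
T_EE  = [[-16, 6], [-45, 17]]  # [T]^E_E
P     = [[2, 1], [5, 3]]       # [id]^F_E

# Change of basis formula: [T]^F_F = [id]^E_F [T]^E_E [id]^F_E.
T_FF = matmul(matmul(P_inv, T_EE), P)
print(T_FF)   # [[-1, 0], [0, 2]]
```
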

What about \([T]^{\mathcal{E}}_{\mathcal{F}}\)? Again we could do this directly from the definition by computing \(T(\mathbf{e}_1)\) and \(T(\mathbf{e}_2)\) and expressing them in terms of the \(\mathbf{f}_i\)s. But we already have the information we need: by the theorem on the matrix of a composition, from the last video:

\[\begin{align*} [T]^{\mathcal{E}}_{\mathcal{F}} &= [T\circ \operatorname{id}]^{\mathcal{E}}_{\mathcal{F}} \\ &= [T]^{\mathcal{F}}_{\mathcal{F}} [ \operatorname{id}]^{\mathcal{E}}_{\mathcal{F}} \\ &= \begin{pmatrix} -1&0\\0&2\end{pmatrix} \begin{pmatrix} 3&-1\\-5&2\end{pmatrix} \\ &= \begin{pmatrix} -3&1 \\ -10 & 4 \end{pmatrix} \end{align*}\]

To check our answer we compute \(T(\mathbf{e}_1)\), which is \(\begin{pmatrix} -16 \\ -45 \end{pmatrix}\). If the matrix is correct this should be the same as \(-3 \mathbf{f}_1 - 10 \mathbf{f}_2\), and you can check that it really is.

We already saw, when we first met bases of vector spaces, that different bases of vector spaces give us a different perspective on their elements - recall the example about two-pixel images. The same idea applies to linear maps.

A linear transformation which looks complicated with respect to one basis can become much easier to understand when you choose the right basis.

**Example:** \(A = \begin{pmatrix} 4&-3&-3\\3&-2&-3\\-1&1&2 \end{pmatrix}\). Consider \(T_A : \mathbb{R}^3 \to \mathbb{R}^3\), so the matrix of \(T_A\) with respect to the standard basis of \(\mathbb{R}^3\) is \(A\). There is no obvious structure to \(T_A\).

Now consider a new basis \(\mathcal{B}\) of \(\mathbb{R}^3\): \(\mathbf{b}_1 = \begin{pmatrix} 1\\1\\0 \end{pmatrix}, \mathbf{b}_2 = \begin{pmatrix} 1\\0\\1 \end{pmatrix}, \mathbf{b}_3 = \begin{pmatrix} -3\\-3\\1 \end{pmatrix}\). (You should check it really is a basis.) Let’s find the matrix of \(T_A\) with respect to \(\mathcal{B}\), that is, \([T_A]^ \mathcal{B} _ \mathcal{B}\).

You can check that \(T_A(\mathbf{b}_1) = \mathbf{b}_1, T_A(\mathbf{b}_2)=\mathbf{b}_2, T_A(\mathbf{b}_3) = 2\mathbf{b}_3\). Thus \([T_A]^ \mathcal{B} _ \mathcal{B} = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&2 \end{pmatrix}\)
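Those three claims about \(T_A\) on the basis vectors are quick to verify in Python, with \(A\) and the \(\mathbf{b}_i\) copied from the example:

```python
A = [[4, -3, -3],
     [3, -2, -3],
     [-1, 1, 2]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

b1, b2, b3 = [1, 1, 0], [1, 0, 1], [-3, -3, 1]

print(matvec(A, b1) == b1)                    # True: T_A(b1) = b1
print(matvec(A, b2) == b2)                    # True: T_A(b2) = b2
print(matvec(A, b3) == [2 * x for x in b3])   # True: T_A(b3) = 2*b3
```
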

Suddenly the behaviour of \(T_A\) is clear. Vectors in the direction \(\begin{pmatrix} 1\\1\\0 \end{pmatrix}\) or \(\begin{pmatrix} 1\\0\\1 \end{pmatrix}\) are unchanged by \(T_A\), while vectors in the direction \(\mathbf{b}_3\) are scaled by a factor of 2.

This technique is called **diagonalisation**; you will learn more about it when you study eigenvalues and eigenvectors in Algebra 2.

Consider the real vector space of all height 3 column vectors, and the real vector space of all height 3 row vectors. These are different vector spaces but they are essentially the same: there’s no vector space property which one has but the other doesn’t.

A way to make this precise is to observe that there is a bijective linear map - the *transpose*, sending \(\begin{pmatrix} x\\y\\z \end{pmatrix}\) to \((x\; y\; z)\) - between them. Linear maps which are bijections are called **vector space isomorphisms**, or just isomorphisms. If there is an isomorphism \(U \to V\), we say that \(U\) and \(V\) are **isomorphic** and write \(U \cong V\).

Notice that the existence of a linear map between two vector spaces requires that they have the same field of scalars.

**Lemma:** a linear map \(T: U \to V\) is one-to-one if and only if \(\ker T = \{\mathbf{0}_U\}\).

*Proof*: Suppose \(T\) is one-to-one. We know \(T(\mathbf{0}_U)=\mathbf{0}_V\). Suppose \(\mathbf{x} \in \ker T\). Then \(T(\mathbf{x})=\mathbf{0}_V=T(\mathbf{0}_U)\), so \(\mathbf{x}=\mathbf{0}_U\). Thus \(\ker T = \{\mathbf{0}_U\}\).

Suppose \(\ker T = \{\mathbf{0}_U\}\) and that \(T(\mathbf{x})=T(\mathbf{y})\). Then \(T(\mathbf{x}-\mathbf{y})=\mathbf{0}_V\) by linearity, so \(\mathbf{x}-\mathbf{y} \in \ker T\), so \(\mathbf{x}-\mathbf{y}=0_U\), so \(\mathbf{x}=\mathbf{y}\) and \(T\) is one-to-one.

**Lemma:** if \(U \cong V\) then \(\dim U = \dim V\).

*Proof*: there’s a linear bijection \(T:U \to V\). By the previous lemma, \(\ker T = \{\mathbf{0}_U\}\), so \(\operatorname{null} T = 0\) and the rank-nullity theorem gives \(\dim U = \dim \operatorname{im} T\). But \(T\) is onto, so \(\operatorname{im} T = V\), and hence \(\dim U = \dim V\).

Perhaps more surprisingly, the converse is true. If two vector spaces over the same field have the same dimension, they’re isomorphic. The first step to proving this is:

**Proposition**: let \(V\) be an \(\mathbb{F}\)-vector space with dimension \(n\). Then \(V \cong \mathbb{F}^n\).

*Proof*. Let \(\mathbf{b}_1,\ldots, \mathbf{b}_n\) be a basis of \(V\). Every \(\mathbf{v} \in V\) can be written *uniquely* as a linear combination of the \(\mathbf{b}_i\). So we can write

\[\mathbf{v}= \sum_{i=1}^n f_i(\mathbf{v}) \mathbf{b}_i\]

with \(f_i(\mathbf{v}) \in \mathbb{F}\). We then define \(T : V \to \mathbb{F}^n\) by \(T(\mathbf{v}) = \begin{pmatrix}f_1(\mathbf{v})\\ \vdots\\ f_n(\mathbf{v})\end{pmatrix}\).

To show that this is an isomorphism we have to show that it is linear, one-to-one, and onto.

Linearity: If \(\mathbf{v}, \mathbf{w} \in V\) then \(\mathbf{v} = \sum_{i=1}^n f_i(\mathbf{v})\mathbf{b}_i\) and \(\mathbf{w} = \sum_{i=1}^n f_i(\mathbf{w}) \mathbf{b}_i\). Then \(\mathbf{v} + \mathbf{w} = \sum_{i=1}^n (f_i(\mathbf{v}) + f_i(\mathbf{w}) ) \mathbf{b}_i\). But also \(\mathbf{v} + \mathbf{w} = \sum_{i=1}^n f_i(\mathbf{v}+\mathbf{w})\mathbf{b}_i\). Because the expression for \(\mathbf{v}+\mathbf{w}\) as a linear combination of the \(\mathbf{b}_i\) is unique, the scalars must be the same both times: \(f_i(\mathbf{v}+\mathbf{w}) = f_i(\mathbf{v}) + f_i(\mathbf{w})\) for each \(i\). Now

\[\begin{align*} T(\mathbf{v}+\mathbf{w}) &= (f_1(\mathbf{v}+\mathbf{w}), \ldots, f_n(\mathbf{v}+\mathbf{w})) \\ &= (f_1(\mathbf{v}) + f_1(\mathbf{w}), \ldots, f_n(\mathbf{v}) + f_n(\mathbf{w})) \\ &= (f_1(\mathbf{v}), \ldots, f_n(\mathbf{v})) + (f_1(\mathbf{w}), \ldots, f_n(\mathbf{w})) \\ &= T(\mathbf{v}) + T(\mathbf{w}) \end{align*}\]

If \(\lambda \in \mathbb{F}\) then we can play the same game: \(\mathbf{v} = \sum_{i=1}^n f_i(\mathbf{v}) \mathbf{b}_i\), so \(\lambda \mathbf{v} = \lambda \sum_{i=1}^n f_i(\mathbf{v})\mathbf{b}_i = \sum_{i=1}^n \lambda f_i(\mathbf{v}) \mathbf{b}_i\). But \(\lambda \mathbf{v} = \sum_{i=1}^n f_i(\lambda \mathbf{v}) \mathbf{b}_i\). Comparing coefficients, \(\lambda f_i(\mathbf{v}) = f_i(\lambda \mathbf{v})\). It follows just as above that \(T(\lambda \mathbf{v}) = \lambda T(\mathbf{v})\), so \(T\) is linear.

One-to-one: If \(T(\mathbf{v}) = \begin{pmatrix}0\\ \vdots\\ 0\end{pmatrix}\) then \(\mathbf{v} = \sum_{i=1}^n 0\times \mathbf{b}_i = \mathbf{0}_V\), so \(\ker T = \{\mathbf{0}_V\}\) and \(T\) is one-to-one by the earlier lemma.

Onto: Since \(\ker T = \{\mathbf{0}_V\}\), rank-nullity gives \(\dim \operatorname{im} T = n\). So \(\operatorname{im} T\) is a subspace of \(\mathbb{F}^n\) of dimension \(n\), and is therefore all of \(\mathbb{F}^n\).
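For a concrete instance of the coordinate map \(T\), take \(V = \mathbb{R}^2\) with the basis \(\mathbf{c}_1 = (1,1)\), \(\mathbf{c}_2 = (1,-1)\) used earlier. The explicit formulas for \(f_1\) and \(f_2\) below were solved by hand for this particular basis (an illustrative sketch):

```python
# Coordinates with respect to the basis c1 = (1, 1), c2 = (1, -1) of R^2.
# Solving v = a*c1 + b*c2, i.e. x = a + b and y = a - b, gives:
#   a = (x + y) / 2,   b = (x - y) / 2.

def coords(v):
    """Return (f1(v), f2(v)): the coordinates of v with respect to c1, c2."""
    x, y = v
    return [(x + y) / 2, (x - y) / 2]

v = [5, -1]
a, b = coords(v)
print([a, b])                                # [2.0, 3.0]

# Reconstructing v from its coordinates: a*c1 + b*c2 recovers v.
print([a * 1 + b * 1, a * 1 + b * (-1)])     # [5.0, -1.0]
```
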

Now given any two \(\mathbb{F}\)-vector spaces \(U\) and \(V\) with dimension \(n\) we have isomorphisms \(f\) and \(g\):

\[U \xrightarrow{\;f\;} \mathbb{F}^n \xleftarrow{\;g\;} V\]

We’d like to get a linear bijection from \(U\) to \(V\), and from the above, it seems like the right thing is \(g^{-1} \circ f\). We know that the inverse of a bijection is a bijection, so \(g^{-1}\) is a bijection, and we know that the composition of two bijections is a bijection, so \(g^{-1} \circ f\) is a bijection. But is it linear?

**Lemma.** Let \(X\) and \(Y\) be \(\mathbb{F}\)-vector spaces and \(T : X \to Y\) be a linear bijection. Then \(T^{-1} : Y \to X\) is linear.

*Proof.* We check the two parts of the definition of linearity.

- Let \(\mathbf{y} \in Y\) and \(\lambda \in \mathbb{F}\). Since \(T\) is onto, \(\mathbf{y} = T(\mathbf{x})\) for some \(\mathbf{x} \in X\), and \(T^{-1}(\mathbf{y})=\mathbf{x}\). Now multiplying \(\mathbf{y}=T(\mathbf{x})\) by \(\lambda\) gives \(\lambda \mathbf{y} = \lambda T(\mathbf{x}) = T(\lambda \mathbf{x})\) since \(T\) is linear, and so \(T^{-1} (\lambda \mathbf{y}) = \lambda \mathbf{x} = \lambda T^{-1}(\mathbf{y})\).
- Let \(\mathbf{y}_1, \mathbf{y}_2 \in Y\). Again, as \(T\) is onto, there exist \(\mathbf{x}_1, \mathbf{x}_2 \in X\) such that \(T(\mathbf{x}_i) = \mathbf{y}_i\), and consequently \(T^{-1}(\mathbf{y}_i) = \mathbf{x}_i\). Now \(\mathbf{y}_1 + \mathbf{y}_2 = T(\mathbf{x}_1) + T(\mathbf{x}_2) = T(\mathbf{x}_1+\mathbf{x}_2)\) as \(T\) is linear, so \(T^{-1}(\mathbf{y}_1+\mathbf{y}_2) = \mathbf{x}_1+\mathbf{x}_2 = T^{-1}(\mathbf{y}_1) + T^{-1}(\mathbf{y}_2)\).

**Theorem.** Any two vector spaces of the same finite dimension over the same field \(\mathbb{F}\) are isomorphic.

*Proof.* We have already seen that if \(\dim U = \dim V = n\) then there are isomorphisms \(f : U \to \mathbb{F}^n\) and \(g : V \to \mathbb{F}^n\). The map \(g^{-1}\circ f\) is a bijection because it is the composition of two bijections, and is linear by the previous lemma. Therefore it is a vector space isomorphism, and \(U \cong V\).