3.3 Linear maps

3.3.1 Definitions

Definition 3.9 Let U and V be \(\FF\)-vector spaces. Then \(f:U \to V\) is called a linear function or a linear map if

  • \(f(\mathbf{x}+\mathbf{y})=f(\mathbf{x})+f(\mathbf{y})\) for all \(\mathbf{x},\mathbf{y} \in U\), and
  • \(f(\lambda \mathbf{x})= \lambda f(\mathbf{x})\) for all \(\mathbf{x} \in U\) and all \(\lambda\in \FF\).

By using these properties repeatedly, \[ f(\lambda_1 \mathbf{x}_1 + \cdots + \lambda_n \mathbf{x}_n) = \lambda_1 f(\mathbf{x}_1) + \cdots + \lambda_n f(\mathbf{x}_n) \] for any linear map f, any \(n \geq 1\), any scalars \(\lambda_i \in \FF\) and any vectors \(\mathbf{x}_i \in U\).

Example 3.10

  • Let U and V be \(\FF\)-vector spaces. The identity map \(\id_U :U \to U\) defined by \(\id_U(\mathbf{u}) = \mathbf{u}\) for all \(\mathbf{u}\in U\) is linear, as is the zero map \(U \to V\) that sends every element of U to \(\mathbf{0}_V\).
  • Important! Let A be an m✕n matrix with entries in \(\FF\) and define \(T_A: \FF^n \to \FF^m\) by \(T_A(\mathbf{v})=A \mathbf{v}\). Then \(T_A\) is linear; there is a numerical check of this in the sketch after this list. Here \(\FF^n\) is the vector space of column vectors of height n with entries from \(\FF\). This example is important because it shows every matrix gives rise to a linear map. We will see later that every linear map can be represented by a matrix, linking the concrete linear algebra we did in the previous part of these notes with what we’re doing now.
  • Let \(f: \rr^3 \to \rr\) be \(f \begin{pmatrix} a\\b\\c \end{pmatrix} = a+b-2c\). Then f is linear. On the other hand, if \(f \begin{pmatrix} a\\b\\c \end{pmatrix} =\sqrt{a^2+b^2+c^2}\) then f is not linear: for example, for any nonzero \(\mathbf{x}\) we have \(f(- \mathbf{x} ) \neq -f( \mathbf{x})\).
  • Let V be the vector space of polynomials in one variable x over \(\FF\). Let \(D:V \to V\) be defined by \(D(f) = \frac{df}{dx}\). Then D is linear.
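Here is a minimal numerical sketch of the second example above: the matrix A, the vectors and the scalar are arbitrary choices (not from the notes), and the check is done in Python with numpy.

```python
import numpy as np

# An arbitrary 2x3 real matrix; T_A(v) = A @ v is then a map R^3 -> R^2.
A = np.array([[1.0, 2.0, 3.0],
              [0.0, -1.0, 4.0]])

def T_A(v):
    return A @ v

x = np.array([1.0, 0.0, 2.0])
y = np.array([-3.0, 5.0, 1.0])
lam = 2.5

# The two conditions of Definition 3.9:
print(np.allclose(T_A(x + y), T_A(x) + T_A(y)))  # additivity
print(np.allclose(T_A(lam * x), lam * T_A(x)))   # homogeneity
```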

If \(f,g :U \to V\) are linear maps and \(\lambda\) is a scalar, then \(\lambda f\) is defined by \((\lambda f)(\mathbf{u})=\lambda f(\mathbf{u})\) and \(f+g\) is defined by \((f+g)(\mathbf{u}) = f(\mathbf{u})+g(\mathbf{u})\).

Lemma 3.10

  • Let \(f:U \to V\) and \(g:V \to W\) be linear maps. Then \(g\circ f:U \to W\) is a linear map.
  • Let \(f,g: U \to V\) be linear maps and \(\lambda,\mu \in \ff\). Then \(\lambda f+ \mu g\) is linear.

Proof.

  • For any \(\mathbf{x} , \mathbf{y} \in U\), \[\begin{align*} g(f( \mathbf{x}+ \mathbf{y}))&= g( f( \mathbf{x}) + f( \mathbf{y})) & \text{as } f \text{ is linear} \\ &= g(f( \mathbf{x} ))+ g(f( \mathbf{y} )) & \text{as } g \text{ is linear} \end{align*}\] so \(g\circ f\) meets the first condition for being linear. The second one is similar.
  • \(\lambda f + \mu g\) is the map that sends \(\mathbf{u} \in U\) to \(\lambda f(\mathbf{u}) + \mu g(\mathbf{u})\). For any \(\mathbf{x} , \mathbf{y} \in U\), \[\begin{align*} (\lambda f + \mu g)( \mathbf{x} + \mathbf{y} ) &= \lambda f( \mathbf{x} + \mathbf{y} ) + \mu g ( \mathbf{x} + \mathbf{y}) \\ &= \lambda f( \mathbf{x} ) + \lambda f( \mathbf{y} ) + \mu g( \mathbf{x} ) + \mu g( \mathbf{y} ) \\ &= \lambda f( \mathbf{x} ) + \mu g( \mathbf{x} ) + \lambda f ( \mathbf{y} )+ \mu g( \mathbf{y} ) \\ &= (\lambda f + \mu g)( \mathbf{x} ) + (\lambda f + \mu g) ( \mathbf{y} ) \end{align*}\] so \(\lambda f + \mu g\) meets the first condition for being linear, and the second condition is similar.

It turns out that these definitions of adding two linear maps and multiplying a linear map by a scalar make the set of linear maps from U to V into a vector space.


Lemma 3.11 If \(f:U \to V\) is linear then \(f(\mathbf{0}_U)=\mathbf{0}_V\).
Proof. \(f(\mathbf{0}_U)=f(\mathbf{0}_U+\mathbf{0}_U)=f(\mathbf{0}_U)+f(\mathbf{0}_U)\) so adding the additive inverse of \(f(\mathbf{0}_U)\) to both sides we get \(\mathbf{0}_V = f(\mathbf{0}_U)\).

Definition 3.10 Let \(f:U \to V\) be a linear map between \(\FF\)-vector spaces U and V.

  • The kernel of f, written \(\ker f\), is defined to be \[\begin{equation*} \{ \mathbf{u} \in U : f(\mathbf{u})=\mathbf{0}_V \}. \end{equation*}\]
  • The image of f, written \(\im f\), is defined to be \[\begin{equation*} \{f(\mathbf{u}) : \mathbf{u} \in U\}. \end{equation*}\]

Lemma 3.12 A linear map \(f:U \to V\) is one-to-one if and only if \(\ker f = \{\mathbf{0}_U\}\).

Proof. Suppose f is one-to-one and \(\mathbf{u} \in \ker f\). Then \(f(\mathbf{u})=\mathbf{0}_V=f(\mathbf{0}_U)\), so \(\mathbf{u}=\mathbf{0}_U\). Since \(\mathbf{0}_U \in \ker f\) by Lemma 3.11, this shows \(\ker f = \{\mathbf{0}_U\}\).

Suppose \(\ker f=\{\mathbf{0}_U\}\) and \(f(\mathbf{x})=f(\mathbf{y})\). Then \(\mathbf{0}_V=f(\mathbf{x})-f(\mathbf{y})=f(\mathbf{x}-\mathbf{y})\) so \(\mathbf{x}-\mathbf{y} \in \ker f\) and therefore \(\mathbf{x}-\mathbf{y}=\mathbf{0}_U\), that is, \(\mathbf{x}=\mathbf{y}\).

Lemma 3.13 Let \(f:U \to V\) be a linear map between \(\FF\)-vector spaces. Then \(\ker f\) is a subspace of U and \(\im f\) is a subspace of V.

Proof. \(\mathbf{0}_U \in \ker f\) by Lemma 3.11 and \(\mathbf{0}_V = f(\mathbf{0}_U)\in \im f\) so both contain the appropriate zero vector.

Let \(\mathbf{x},\mathbf{y} \in \ker f\) and \(\lambda \in \FF\). Then \(f(\mathbf{x}+\mathbf{y})=f(\mathbf{x})+f(\mathbf{y})=\mathbf{0}_V+\mathbf{0}_V=\mathbf{0}_V\) and \(f(\lambda \mathbf{x})=\lambda f(\mathbf{x})=\lambda \mathbf{0}_V = \mathbf{0}_V\), so \(\ker f\) is closed under addition and under scalar multiplication.

Let \(f(\mathbf{x}),f(\mathbf{y}) \in \im f\) and \(\lambda \in \FF\). Then \(f(\mathbf{x})+f(\mathbf{y})=f(\mathbf{x}+\mathbf{y}) \in \im f\) and \(\lambda f(\mathbf{x}) = f(\lambda \mathbf{x}) \in \im f\) so \(\im f\) is closed under addition and scalar multiplication.
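For a matrix map \(T_A\), the kernel and image can be computed numerically from a singular value decomposition: the left singular vectors with nonzero singular value span \(\im T_A\), and the remaining right singular vectors span \(\ker T_A\). Here is a minimal Python/numpy sketch; the matrix and the tolerance are illustrative choices, not from the notes.

```python
import numpy as np

# T_A : R^4 -> R^3 for an illustrative rank-2 matrix (third row = row 1 + row 2).
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 1.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
tol = 1e-10
r = int(np.sum(s > tol))        # number of nonzero singular values

im_basis = U[:, :r]             # columns: orthonormal basis of im T_A
ker_basis = Vt[r:, :].T         # columns: orthonormal basis of ker T_A

print("dim im T_A  =", im_basis.shape[1])    # 2
print("dim ker T_A =", ker_basis.shape[1])   # 2
print(np.allclose(A @ ker_basis, 0))         # kernel vectors really map to 0
```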

3.3.2 The matrix of a linear map

We’ll assume from now on that the vector spaces we work with are nonzero and finite-dimensional.

Let \(f: U \to V\) be a linear map and \(\mathcal{B} = \mathbf{b}_1,\ldots, \mathbf{b}_n\) a basis of U. Imagine that you want to communicate f to someone else. You only have to tell them \(f(\mathbf{b}_1), \ldots, f(\mathbf{b}_n)\), because they can then work out \(f(\mathbf{u})\) for any \(\mathbf{u} \in U\) by first expressing \(\mathbf{u}\) as a linear combination of the \(\mathbf{b}_i\) \[ \mathbf{u} = \sum_{i=1}^n \lambda_i \mathbf{b}_i\] and then using the linearity property to get \[ f(\mathbf{u}) = \sum_{i=1}^n \lambda_i f(\mathbf{b}_i).\] If the two of you agree a basis \(\mathcal{C} =\mathbf{c}_1,\ldots,\mathbf{c}_m\) of V then you can express each \(f(\mathbf{b}_j)\) as a linear combination of the \(\mathbf{c}_i\)s, so that you may as well only tell them the scalars involved in these linear combinations.

What this means is that a linear map \(f:U \to V\) is completely determined by the data of a basis \(\mathcal{B}\) of U, a basis \(\mathcal{C}\) of V, and the scalars needed to express the \(f(\mathbf{b}_i)\) in terms of the \(\mathbf{c}_j\). Organising these mn scalars into a rectangular grid gives a matrix representing f which depends on the choice of bases \(\mathcal{B}\) and \(\mathcal{C}\).

Definition 3.11 Let

  • \(f: U \to V\) be linear
  • \(\mathcal{B} = \mathbf{b}_1, \ldots, \mathbf{b}_n\) be a basis of U
  • \(\mathcal{C} = \mathbf{c}_1,\ldots, \mathbf{c}_m\) be a basis of V

and define scalars \(a_{ij}\) by \(f(\mathbf{b}_j) = \sum_{i=1}^m a_{ij} \mathbf{c}_i\). Then the matrix of f with respect to initial basis \(\mathcal{B}\) and final basis \(\mathcal{C}\), written \([f]_{\mathcal{C}}^{\mathcal{B}}\), is the m × n matrix \((a_{ij})\).

This means that the \(j\)th column of this m × n matrix records the coefficients you need to express \(f(\mathbf{b}_j)\) as a linear combination of the \(\mathbf{c}_i\).
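Definition 3.11 translates directly into a computation: put the vectors of \(\mathcal{C}\) into the columns of a matrix and solve a linear system for the \(\mathcal{C}\)-coordinates of each \(f(\mathbf{b}_j)\). The sketch below (Python/numpy) does this for a map of the form \(T_A\); the particular A and the bases are made-up examples, not taken from the notes.

```python
import numpy as np

# f = T_A : R^2 -> R^2 for an illustrative matrix A.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
f = lambda v: A @ v

# Initial basis B and final basis C of R^2, stored as the columns of each array.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C = np.array([[2.0, 1.0],
              [1.0, 1.0]])

# Column j of [f]_C^B holds the C-coordinates of f(b_j),
# i.e. the solution of  C @ coords = f(b_j).
M = np.column_stack([np.linalg.solve(C, f(B[:, j])) for j in range(B.shape[1])])
print(M)

# Sanity check: expressing the columns of M back in terms of C recovers f(b_j).
print(np.allclose(C @ M, A @ B))
```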

When we’re talking about a linear map from a vector space to itself we say the matrix of f with respect to \(\mathcal{B}\) to mean the matrix with respect to initial basis \(\mathcal{B}\) and final basis \(\mathcal{B}\).

Example 3.11

  • Let \(D: \RR[x]_{\leq 2} \to \RR[x]_{\leq 2}\) be the differentiation map. The vector space \(\RR[x]_{\leq 2}\) has a basis \(\mathcal{B} = 1,x,x^2\). We find the matrix of D with respect to initial and final bases \(\mathcal{B}\). Here is what D does to the elements of \(\mathcal{B}\): \[\begin{align*} D(1)&= 0 = 0\times 1 + 0 \times x + 0 \times x^2 \\ D(x) &= 1 = 1\times 1 + 0 \times x + 0 \times x^2\\ D(x^2) &= 2x = 0\times 1 + 2 \times x + 0 \times x^2 \end{align*}\] The first of these tells us the first column of \([D]^{\mathcal{B}}_{\mathcal{B}}\), and so on. We get \[\begin{equation*} [D]^{\mathcal{B}}_{\mathcal{B}}= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \end{equation*}\]
  • Let A be any m✕n matrix. Let \(T_A: \rr^n \to \rr^m\) be \(T_A( \mathbf{v} ) = A \mathbf{v}\). Let \(\mathcal{B}\) and \(\mathcal{C}\) be the standard bases of \(\rr^n\) and \(\rr^m\). Then \([ T_A]_ {\mathcal{C}} ^ {\mathcal{B}} = A\). This is because the jth column of \([T_A]^ {\mathcal{B}} _ {\mathcal{C}}\) is the representation of \(T_A( \mathbf{e} _j)\) with respect to the basis \(\mathcal{C}\), and \(T_A( \mathbf{e} _j) = A \mathbf{e} _j\) which equals the jth column of A.
  • Let \(T : \rr^3 \to \rr^3\) be the linear map defined by \[\begin{equation*} T \begin{pmatrix} x\\y\\z \end{pmatrix} = \begin{pmatrix} x+2y+3z \\ 4y+5z \\ 5y+6z \end{pmatrix} \end{equation*}\] We’ll find \([T]_ {\mathcal{B}} ^ {\mathcal{B}}\), where \(\mathcal{B}=( \mathbf{e} _1, \mathbf{e} _2, \mathbf{e} _3)\) is the standard basis of \(\rr^3\). To do that we have to work out how to express \(T( \mathbf{e} _j)\) as a linear combination of the standard basis elements, for each j. We have \[\begin{align*} T( \mathbf{e} _1) &= \begin{pmatrix} 1\\0\\0 \end{pmatrix} = \mathbf{e} _1 + 0\mathbf{e}_2 + 0\mathbf{e}_3 \\ T( \mathbf{e} _2) &= \begin{pmatrix} 2\\4\\5 \end{pmatrix} =2 \mathbf{e}_1 + 4 \mathbf{e} _2 + 5 \mathbf{e}_3 \\ T( \mathbf{e} _3) &= \begin{pmatrix} 3\\5\\6 \end{pmatrix} = 3 \mathbf{e} _1 + 5 \mathbf{e} _2 + 6 \mathbf{e}_3 \end{align*}\] and so \([T]_ {\mathcal{B}} ^ {\mathcal{B}} = \begin{pmatrix} 1&2&3\\0&4&5 \\0&5&6 \end{pmatrix}\).
Example 3.12 The matrix of the identity map \(\id : V \to V\) is the identity matrix if the initial and final bases chosen are the same, and the matrix of the zero map from U to V with respect to any initial and final bases is the appropriately-sized zero matrix.

Matrix multiplication is defined the way it is so that it corresponds to composition of linear maps.

Proposition 3.3 Let \(f:U \to V\) and \(g:V \to W\) be linear maps, let \(\mathcal{B}=(\mathbf{b}_1,\ldots,\mathbf{b}_l)\) be a basis of U, let \(\mathcal{C}=(\mathbf{c}_1,\ldots,\mathbf{c}_m)\) a basis of V and let \(\mathcal{D}=(\mathbf{d}_1,\ldots,\mathbf{d}_n)\) be a basis of W. Then \[[g\circ f]^{\mathcal{B}}_{\mathcal{D}} = [g]^{\mathcal{C}}_{\mathcal{D}}[f]^{\mathcal{B}}_{\mathcal{C}}.\]
Proof. To work out the matrix of \(g\circ f\) on the left hand side, we must find \(g(f(\mathbf{b}_j))\) as a linear combination of the \(\mathbf{d}_i\): the coefficient of \(\mathbf{d}_i\) in \(g(f(\mathbf{b}_j))\) is the i, j entry of the matrix. So let’s do that: \[\begin{equation*} g(f(\mathbf{b}_j)) = g\left( \sum_{k=1}^m f_{kj} \mathbf{c}_k \right) = \sum_{k=1}^m f_{kj}g(\mathbf{c}_k) =\sum_{k=1}^m \sum_{i=1}^n f_{kj}g_{ik}\mathbf{d}_i \end{equation*}\] where \(f_{kj}\) is the k, j entry of \([f]^{\mathcal{B}}_{\mathcal{C}}\) and \(g_{ik}\) is the i, k entry of \([g]^{\mathcal{C}}_{\mathcal{D}}\). So the i, j entry of \([g\circ f]^{\mathcal{B}}_{\mathcal{D}}\), that is, the coefficient of \(\mathbf{d}_i\) in the above, is \(\sum_{k=1}^m g_{ik}f_{kj}\) which is, by definition of matrix multiplication, the i, j entry of \([g]^{\mathcal{C}}_{\mathcal{D}}[f]^{\mathcal{B}}_{\mathcal{C}}\).
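A quick numerical illustration of Proposition 3.3, using the same "solve for coordinates" idea as the previous sketch; the maps and bases below are random choices (almost surely invertible), purely for illustration.

```python
import numpy as np

def matrix_of(f, B, C):
    """[f]_C^B: column j is the C-coordinate vector of f(b_j)."""
    return np.column_stack([np.linalg.solve(C, f(B[:, j])) for j in range(B.shape[1])])

rng = np.random.default_rng(0)
F = rng.standard_normal((3, 2))   # f = T_F : R^2 -> R^3
G = rng.standard_normal((4, 3))   # g = T_G : R^3 -> R^4
f = lambda v: F @ v
g = lambda v: G @ v

# Random bases of R^2, R^3 and R^4 (as columns), almost surely invertible.
Bb = rng.standard_normal((2, 2))
Cc = rng.standard_normal((3, 3))
Dd = rng.standard_normal((4, 4))

lhs = matrix_of(lambda v: g(f(v)), Bb, Dd)         # [g o f]_D^B
rhs = matrix_of(g, Cc, Dd) @ matrix_of(f, Bb, Cc)  # [g]_D^C [f]_C^B
print(np.allclose(lhs, rhs))                        # True
```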

We can take the matrix of a linear map f with respect to different choices of initial and final bases. The next result, called the change of basis formula, tells you how the resulting matrices are related.

Corollary 3.3 Let \(f:U \to V\) be a linear map and let \(\mathcal{B}, \mathcal{B}'\) be bases of U and \(\mathcal{C}, \mathcal{C}'\) be bases of V. Then \[\begin{equation*} [f]^{\mathcal{B}'}_{\mathcal{C}'} = [\id_V]^{\mathcal{C}}_{\mathcal{C}'} [f]^{\mathcal{B}}_ {\mathcal{C}} [\id_U]^{\mathcal{B'}}_{\mathcal{B}} \end{equation*}\]
Proof. \(f = \id_V \circ f \circ \id_U\), so using Proposition 3.3 twice: \[\begin{equation*} [f]^{\mathcal{B'}}_{\mathcal{C}'} = [\id_V \circ f \circ \id_U]^{\mathcal{B'}}_{\mathcal{C}'} = [\id_V]^{\mathcal{C}}_{\mathcal{C}'}[f\circ \id_U]^{\mathcal{B}'}_{\mathcal{C}} = [\id_V]^{\mathcal{C}}_{\mathcal{C}'} [f]^{\mathcal{B}}_{\mathcal{C}} [\id_U]^{\mathcal{B}'}_{\mathcal{B}}. \end{equation*}\]

Example 3.13 Let \(T: \mathbb{R}^2 \to \mathbb{R}^2\) be the linear map \[\begin{equation*} T \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -16x+6y \\ -45x+17y \end{pmatrix} \end{equation*}\] Let \(\mathcal{B}\) be the standard basis \(\mathbf{e}_1, \mathbf{e}_2\) of \(\mathbb{R}^2\) and let \(\mathcal{C}\) be the basis \(\mathbf{c}_1 = \begin{pmatrix} 2 \\ 5\end{pmatrix}, \mathbf{c}_2 = \begin{pmatrix} 1 \\ 3 \end{pmatrix}\).

The matrix of T with respect to \(\mathcal{B}\) is easy to find: \[\begin{align*} T( \mathbf{e}_1) &= T \begin{pmatrix}1\\0\end{pmatrix} \\ &= \begin{pmatrix} -16 \\ -45 \end{pmatrix} \\ &= -16 \mathbf{e}_1 -45 \mathbf{e}_2 \\ T( \mathbf{e}_2) &= T \begin{pmatrix}0\\1\end{pmatrix} \\ &= \begin{pmatrix} 6 \\ 17 \end{pmatrix} \\ &= 6 \mathbf{e}_1 +17 \mathbf{e}_2 \end{align*}\] so \([T]^{\mathcal{B}}_{\mathcal{B}} = \begin{pmatrix} -16 & 6 \\ -45 & 17 \end{pmatrix}\).

\([\id]^{\mathcal{C}}_{\mathcal{B}}\) is also easy: it’s the matrix which tells us how to express the elements of \(\mathcal{C}\) in terms of the standard basis. \[\begin{align*} \mathbf{c}_1 &= 2 \mathbf{e}_1 + 5 \mathbf{e}_2 \\ \mathbf{c}_2 &= 1 \mathbf{e}_1 + 3 \mathbf{e}_2 \end{align*}\] and so \([\id]^{\mathcal{C}}_{\mathcal{B}} = \begin{pmatrix} 2&1\\5&3\end{pmatrix}\).

How to express the \(\mathbf{e}_i\) in terms of the \(\mathbf{c}_i\) isn’t so obvious, so on the face of it computing \([\id]^{\mathcal{B}}_{\mathcal{C}}\) is harder. But we can avoid that by using Proposition 3.3: \([\id]^{\mathcal{B}}_{\mathcal{C}} [\id]^{\mathcal{C}}_{\mathcal{B}} = [\id \circ \id]^{\mathcal{C}}_{\mathcal{C}} = [\id]^{\mathcal{C}}_{\mathcal{C}}=I_2\), so \[\begin{align*} [\id]_{\mathcal{C}}^{\mathcal{B}} &= ([\id]^{\mathcal{C}}_{\mathcal{B}}) ^{-1} \\ &= \begin{pmatrix} 2&1\\5&3\end{pmatrix}^{-1} \\ &= \begin{pmatrix} 3&-1\\-5&2\end{pmatrix} \end{align*}\]

We could work out \([T]^{\mathcal{C}}_{\mathcal{C}}\) directly using the definition, but instead we are going to practise using the change of basis formula. It says \[\begin{align*} [T]^{\mathcal{C}}_{\mathcal{C}} &= [\id]^{\mathcal{B}}_{\mathcal{C}} [T]^{\mathcal{B}}_{\mathcal{B}} [\id]^{\mathcal{C}}_{\mathcal{B}} \\ &= \begin{pmatrix} 3&-1 \\ -5 & 2 \end{pmatrix} \begin{pmatrix} -16 & 6 \\ -45 & 17 \end{pmatrix} \begin{pmatrix} 2&1 \\5&3\end{pmatrix} \\ &= \begin{pmatrix} -1 & 0 \\ 0 & 2\end{pmatrix} \end{align*}\]

What about \([T]^{\mathcal{B}}_{\mathcal{C}}\)? Again we could do this directly from the definition by computing \(T(\mathbf{e}_1)\) and \(T(\mathbf{e}_2)\) and expressing them in terms of the \(\mathbf{c}_i\)s. But we already have the information we need: by Proposition 3.3 \[\begin{align*} [T]^{\mathcal{B}}_{\mathcal{C}} &= [T\circ \id]^{\mathcal{B}}_{\mathcal{C}} \\ &= [T]^{\mathcal{C}}_{\mathcal{C}} [\id]^{\mathcal{B}}_{\mathcal{C}} \\ &= \begin{pmatrix} -1&0\\0&2\end{pmatrix} \begin{pmatrix} 3&-1\\-5&2\end{pmatrix} \\ &= \begin{pmatrix} -3&1 \\ -10 & 4 \end{pmatrix} \end{align*}\] To check our answer we compute \(T(\mathbf{e}_1)\), which is \(\begin{pmatrix} -16 \\ -45 \end{pmatrix}\). If the matrix is correct this should be the same as \(-3 \mathbf{c}_1 - 10 \mathbf{c}_2\), and you can check that it really is.
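The arithmetic in Example 3.13 is easy to replay with numpy; this sketch simply re-does the matrix computations above.

```python
import numpy as np

T_BB = np.array([[-16.0, 6.0],
                 [-45.0, 17.0]])   # [T]_B^B (standard basis)
P = np.array([[2.0, 1.0],
              [5.0, 3.0]])         # [id]_B^C: columns are c_1, c_2
P_inv = np.linalg.inv(P)           # [id]_C^B

print(P_inv)                        # [[ 3. -1.] [-5.  2.]]
T_CC = P_inv @ T_BB @ P
print(T_CC)                         # [T]_C^C = [[-1.  0.] [ 0.  2.]]
print(T_CC @ P_inv)                 # [T]_C^B = [[ -3.  1.] [-10.  4.]]
```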


Here are some more properties of the matrix of a linear map.

Proposition 3.4 Let \(f,g: U \to V\) be linear maps and \(\mu,\lambda \in \ff\). Let \(\mathcal{B}\) be a basis of U and \(\mathcal{C}\) a basis of V.

  • \([\lambda f + \mu g] ^ {\mathcal{B}} _ {\mathcal{C}} = \lambda [f] ^ {\mathcal{B}} _ {\mathcal{C}} + \mu [g]^ {\mathcal{B}} _ {\mathcal{C}}\).
  • If f is invertible then \([f^{-1}]^ {\mathcal{C}} _ {\mathcal{B}} = ( [f]^ {\mathcal{B}} _ {\mathcal{C}} )^{-1}\).

Proof.

  • Let \(\mathcal{B}\) be \(\mathbf{b}_1,\ldots, \mathbf{b}_n\) and write \([\mathbf{v}]_{\mathcal{C}}\) for the column vector of coordinates of \(\mathbf{v}\) with respect to \(\mathcal{C}\). This follows because the jth column of \([\lambda f + \mu g]_{\mathcal{C}}^{\mathcal{B}}\) is \([(\lambda f+\mu g)(\mathbf{b}_j)]_{\mathcal{C}} = [\lambda f(\mathbf{b}_j) + \mu g(\mathbf{b}_j)]_{\mathcal{C}} = \lambda [f(\mathbf{b}_j)]_{\mathcal{C}} + \mu [g(\mathbf{b}_j)]_{\mathcal{C}}\), which is the same as the jth column of \(\lambda [f]^{\mathcal{B}}_{\mathcal{C}} + \mu [g]^{\mathcal{B}}_{\mathcal{C}}\).
  • By the previous proposition, \([\id_V]^{\mathcal{C}}_{\mathcal{C}}= [f \circ f^{-1}]^{\mathcal{C}}_{\mathcal{C}} = [f]^{\mathcal{B}}_{\mathcal{C}} [f^{-1}]^{\mathcal{C}}_{\mathcal{B}}\). But \([\id_V]^{\mathcal{C}}_{\mathcal{C}} =I_n\), where \(n = \dim V\). So \([f]^{\mathcal{B}}_{\mathcal{C}} [f^{-1}]^{\mathcal{C}}_{\mathcal{B}}= I_n\), and the same argument applied to \(f^{-1}\circ f\) gives \([f^{-1}]^{\mathcal{C}}_{\mathcal{B}} [f]^{\mathcal{B}}_{\mathcal{C}}= I_n\), which gives the result we want. (The sketch after this list checks this numerically.)
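Here is a numerical check of the second part of Proposition 3.4, using the same "matrix of a map" helper as the earlier sketch (restated so the snippet is self-contained); the invertible map and the bases are arbitrary illustrative choices.

```python
import numpy as np

def matrix_of(f, B, C):
    """[f]_C^B: column j is the C-coordinate vector of f(b_j)."""
    return np.column_stack([np.linalg.solve(C, f(B[:, j])) for j in range(B.shape[1])])

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])                # invertible, so T_A is invertible
f = lambda v: A @ v
f_inv = lambda v: np.linalg.solve(A, v)   # the inverse map of T_A

B = np.array([[1.0, 1.0],                 # basis of the domain (columns)
              [0.0, 1.0]])
C = np.array([[1.0, 2.0],                 # basis of the codomain (columns)
              [1.0, 3.0]])

lhs = matrix_of(f_inv, C, B)              # [f^{-1}]_B^C
rhs = np.linalg.inv(matrix_of(f, B, C))   # ([f]_C^B)^{-1}
print(np.allclose(lhs, rhs))              # True
```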

3.3.3 The rank-nullity theorem

Definition 3.12 Let U and V be finite-dimensional \(\FF\)-vector spaces and \(f:U \to V\) be a linear map. Then

  • the rank of f, written \(\rk f\), is \(\dim \im f\)
  • the nullity of f, written \(\nul f\), is \(\dim \ker f\).
Theorem 3.2 (Rank-nullity theorem) Let U and V be finite-dimensional \(\FF\)-vector spaces and \(f:U \to V\) be a linear map. Then \[\dim \im f + \dim \ker f = \dim U.\]

Proof. Let \(\mathbf{k} _1,\ldots, \mathbf{k} _m\) be a basis of \(\ker f\). Using Proposition 3.1, extend it to a basis \(\mathbf{k} _1,\ldots, \mathbf{k} _m, \mathbf{u} _1,\ldots, \mathbf{u}_n\) of U. Then \(\dim \ker f= m\), \(\dim U = m+n\) and so we need to prove that \(\dim \im f = n\). To do this we show that \(f(\mathbf{u}_1),\ldots,f(\mathbf{u}_n)\) is a basis of \(\im f\).

First we show that \(f(\mathbf{u}_1),\ldots,f(\mathbf{u}_n)\) spans \(\im f\). If \(\mathbf{x} \in U\) then we can write \[\mathbf{x} = \sum_{i=1}^m \lambda_i \mathbf{k} _i + \sum_{i=1}^n \mu_i \mathbf{u} _i \] for some scalars \(\lambda_i,\mu_i\) since \(\mathbf{k} _1,\ldots, \mathbf{k} _m, \mathbf{u} _1,\ldots , \mathbf{u} _n\) is a basis of U. Then \[\begin{multline*} f(\mathbf{x})=f\left( \sum_{i=1}^m \lambda_i \mathbf{k} _i + \sum_{i=1}^n \mu_i \mathbf{u}_i \right) =\sum_{i=1}^m \lambda_i f(\mathbf{k} _i) + \sum_{i=1}^n \mu_i f(\mathbf{u}_i) =\sum_{i=1}^n \mu_i f(\mathbf{u}_i) \end{multline*}\] because \(f(\mathbf{k}_i)=\mathbf{0}_V\) for all i. This shows that any element \(f(\mathbf{x})\) of the image of f is in the span of the \(f(\mathbf{u}_i)\).

Now we show \(f(\mathbf{u}_1),\ldots,f(\mathbf{u}_n)\) is linearly independent. Suppose \[\sum_{i=1}^n \lambda_i f(\mathbf{u}_i)=\mathbf{0}_V\] for some scalars \(\lambda_i\); we want to prove all these scalars must be zero. By linearity \(f\left(\sum_{i=1}^n \lambda_i\mathbf{u}_i\right)=\mathbf{0}_V\) and so \(\sum _{i=1}^n \lambda_i \mathbf{u}_i \in \ker f\). But \(\mathbf{k} _1,\ldots, \mathbf{k} _m\) is a basis of the kernel of f, so there exist scalars \(\mu_i\) such that \[\begin{equation*} \sum_{i=1}^n \lambda_i \mathbf{u}_i = \sum_{i=1}^m \mu_i \mathbf{k}_i \end{equation*}\] or equivalently \[\begin{equation*} \sum_{i=1}^m \mu_i \mathbf{k}_i-\sum_{i=1}^n \lambda_i \mathbf{u}_i=\mathbf{0}_U. \end{equation*}\] This expresses \(\mathbf{0}_U\) as a linear combination of \(\mathbf{k} _1,\ldots, \mathbf{k}_m, \mathbf{u} _1,\ldots, \mathbf{u}_n\), but those vectors are a basis of U hence linearly independent. Thus all the \(\lambda_i\) and \(\mu_i\) are zero, completing the proof that \(f(\mathbf{u}_1),\ldots,f(\mathbf{u}_n)\) is linearly independent.
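For matrix maps the theorem can be spot-checked numerically: count the nonzero singular values of A to get \(\dim \im T_A\), count a basis of the null space to get \(\dim \ker T_A\), and compare with n. The matrix in this sketch is a random illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 7
A = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))   # rank at most 3

U, s, Vt = np.linalg.svd(A)
tol = 1e-10
rank = int(np.sum(s > tol))       # dim im T_A
ker_basis = Vt[rank:, :].T        # columns span ker T_A
nullity = ker_basis.shape[1]      # dim ker T_A

print(rank, nullity, rank + nullity == n)   # 3 4 True
```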

You should compare the next result about finite-dimensional vector spaces to the pigeonhole principle for finite sets.


Corollary 3.4 Let U and V be finite-dimensional \(\FF\)-vector spaces with \(\dim U = \dim V\), and let \(f:U \to V\) be a linear map. Then the following are equivalent:

  • f is a bijection.
  • f is one-to-one.
  • f is onto.

Proof. Suppose f is one-to-one. Then \(\ker f = \{\mathbf{0}_U\}\), so by the rank-nullity theorem \(\dim \im f = \dim V\). Thus \(\im f\) is a subspace of V with the same dimension as V, so by Proposition 3.2 part 2, \(\im f = V\) and f is onto.

Suppose f is onto, so \(\im f = V\). Then \(\dim \im f = \dim V = \dim U\) so by rank-nullity, \(\dim \ker f = 0\) and \(\ker f = \{\mathbf{0}_U\}\). So f is one-to-one and is therefore a bijection.

Lastly, f being a bijection certainly implies it is one-to-one. We’ve shown 1) implies 2) implies 3) implies 1) therefore they are all equivalent.

A linear map \(f:U \to V\) which is a bijection is called an isomorphism of vector spaces or just an isomorphism. If there exists an isomorphism from U to V we say that U and V are isomorphic and write \(U \cong V\).

3.3.4 Eigenvalues and eigenvectors

We defined eigenvalues and eigenvectors for square matrices in the last part of these notes. Now we will define them for linear maps from a vector space to itself. Because we know how to turn an n✕n matrix into a linear map \(\ff^n \to \ff^n\) (Example 3.10 part 2), this will generalize the old definition for matrices.

Definition 3.13 Let \(f:V \to V\) be a linear map. An eigenvector for f is a nonzero \(\mathbf{v} \in V\) such that there exists \(\lambda \in \ff\) with \(f( \mathbf{v} )= \lambda \mathbf{v}\). In this case we say that \(\mathbf{v}\) is a \(\lambda\)-eigenvector of f, and that \(\lambda\) is an eigenvalue of f.


Example 3.14

  • Let A be an n✕n matrix. Let \(T_A: \ff^n \to \ff^n\) be \(T_A( \mathbf{v} )= A \mathbf{v}\). Then a column vector \(\mathbf{v}\) is an eigenvector for \(T_A\) in the sense defined above if and only if it is an eigenvector for A in the sense defined in the previous part of these notes. (The sketch after this list finds eigenvectors of a matrix map numerically.)
  • Let V be the real vector space of all polynomials in one variable x with degree at most n with real coefficients, and let \(D:V \to V\) be \(D(f)= \frac{df}{dx}\), the differentiation map. Then D is a linear map. Any nonzero constant polynomial c is a 0-eigenvector of D, because \[\begin{equation*} D(c) = 0 = 0 \times c \end{equation*}\] No nonconstant polynomial f is an eigenvector of D, because \(D(f)\) is nonzero and has strictly smaller degree than f, so it cannot be a scalar multiple of f.
  • Let V be the real vector space of all differentiable functions \(\rr \to \rr\), and let \(D:V \to V\) be \(D(f)=f'\) be the differentiation map again. The function \(f(x)=e^{\lambda x}\) is a \(\lambda\)-eigenvector of D.
  • Let \(M_{2\times 2}(\rr)\) be the real vector space of all 2✕2 matrices with real entries. Let \(\operatorname{tr}:M_{2\times 2}(\rr) \to M_{2\times 2}(\rr)\) be the map \(\operatorname{tr}(A) = A^T\), the transpose of A. The matrices \[\begin{pmatrix} 0&1\\1&0 \end{pmatrix} \text{ and } \begin{pmatrix} 0&1 \\ -1 & 0 \end{pmatrix} \] are a 1-eigenvector and a \(-1\)-eigenvector of \(\operatorname{tr}\) respectively. Can you find some more 1-eigenvectors?
  • Let \(\id_V:V \to V\) be the identity map. Then any nonzero \(\mathbf{v}\in V\) is an eigenvector for \(\id_V\) with eigenvalue 1.
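Numerically, eigenvalues and eigenvectors of a matrix map \(T_A\) can be found with numpy; here is a minimal sketch, using the matrix that will reappear in Example 3.16.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

evals, evecs = np.linalg.eig(A)   # columns of evecs are eigenvectors of A
print(evals)                       # [2. 0.]  (order may differ)

# Check A v = lambda v for each eigenpair.
for lam, v in zip(evals, evecs.T):
    print(np.allclose(A @ v, lam * v))   # True, True
```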

Proposition 3.5 Let \(T:V \to V\) be linear and let \(\mathbf{v} _1,\ldots, \mathbf{v} _n\) be eigenvectors of T with distinct eigenvalues \(\lambda_1,\ldots, \lambda _n\). Then \(\mathbf{v} _1,\ldots, \mathbf{v}_n\) are linearly independent.

Slogan version: eigenvectors with different eigenvalues are linearly independent.


Proof. By contradiction. Choose a counterexample with n as small as possible. Note that n must be bigger than 1 in this counterexample, as eigenvectors are nonzero so an eigenvector \(\mathbf{v} _1\) on its own is linearly independent.

A counterexample is a linear dependence relation \[\begin{equation} \tag{3.3} \sum _{i=1}^n a_i \mathbf{v} _i= \mathbf{0}_V \end{equation}\] with not all \(a_i=0\). Apply the linear map \(T-\lambda_n \id_V\) to both sides. We have \[(T-\lambda _n\id_V) \mathbf{v} _i = T( \mathbf{v} _i)-\lambda _n \mathbf{v}_i = \lambda _i \mathbf{v} _i - \lambda _n \mathbf{v} _i = (\lambda _i-\lambda _n) \mathbf{v} _i\] and so (3.3) becomes \[\sum_{i=1}^n a_i (\lambda_i-\lambda_n) \mathbf{v} _i = \mathbf{0} _V.\] The \(i=n\) term is 0, so we can rewrite this as \[\begin{equation*} \sum _{i=1}^{n-1} a_i (\lambda_i-\lambda_n) \mathbf{v} _i = \mathbf{0}_V. \end{equation*}\] By minimality of n, the eigenvectors \(\mathbf{v}_1,\ldots,\mathbf{v}_{n-1}\) (which have distinct eigenvalues) are linearly independent, so all the coefficients \(a_i(\lambda_i-\lambda_n)\) must be 0. Since \(\lambda_i-\lambda _n \neq 0\) for \(i<n\) we must have \(a_i=0\) for \(i<n\). Then (3.3) just says \(a_n \mathbf{v} _n= \mathbf{0}_V\), so \(a_n=0\) too. We’ve shown all the \(a_i\) in (3.3) were zero, a contradiction.


Definition 3.14

  • A linear map \(T:V \to V\) is called diagonalizable if and only if there is a basis of V consisting of eigenvectors of T.
  • An n✕n matrix A is called diagonalizable if and only if there is a basis of \(\ff^n\) consisting of eigenvectors of A.

This means A is diagonalizable if and only if \(T_A:\ff^n \to \ff^n\) is diagonalizable.

Diagonalizable linear maps \(T:V \to V\) are the easiest linear maps to understand, because they behave in a simple way: V has a basis \(\mathbf{v} _1,\ldots, \mathbf{v}_n\) say, consisting of eigenvectors of T, and all T does is to multiply each \(\mathbf{v} _i\) by a certain scalar.

Example 3.15

  • The zero map \(V \to V\) is diagonalizable. Any basis of V is a basis consisting of 0-eigenvectors for this map.
  • The identity map \(V \to V\) is diagonalizable. Any basis of V is a basis consisting of 1-eigenvectors for this map. For the same sort of reason, any scalar multiple \(\lambda \id_V\) is diagonalizable too.
  • Let \(n>1\) and let V be the space of polynomials of degree at most n in one variable x over the real numbers. Let \(D:V \to V\) be the differentiation map. We’ve already seen that every eigenvector of D is a constant polynomial, and the constants span only a one-dimensional subspace of V, so there cannot be a basis of V consisting of eigenvectors of D. Therefore D is not diagonalizable.
  • The transpose map \(\operatorname{tr}: M_{2\times 2}(\rr) \to M_{2\times 2}(\rr)\) is diagonalizable. First note that we have \(\tr (\tr(A))=A\). It follows that if \(\lambda\) is an eigenvalue of \(\tr\) then \(\lambda^2=1\): if A is a \(\lambda\)-eigenvector then \(A= \tr(\tr(A)) = \tr(\lambda A) = \lambda \tr(A) = \lambda ^2 A\). So \(\lambda=\pm 1\). A 1-eigenvector is a matrix which is its own transpose, e.g. \[\begin{pmatrix} 1&0\\0&0 \end{pmatrix} , \begin{pmatrix} 0&0\\0&1 \end{pmatrix} , \begin{pmatrix} 0&1 \\1&0 \end{pmatrix}.\] The matrix \[\begin{pmatrix} 0& 1 \\ -1 & 0 \end{pmatrix}\] is a \(-1\)-eigenvector for \(\tr\). You can check that these four eigenvectors are linearly independent (see the sketch after this list), so they form a basis of the four-dimensional vector space \(M_{2\times 2}(\rr)\), so \(\tr\) is diagonalizable.
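The linear-independence claim at the end of the last example can be checked by flattening each 2✕2 matrix into a vector in \(\rr^4\) and computing a rank; here is a minimal numpy sketch.

```python
import numpy as np

# The four eigenvectors of the transpose map, flattened to the rows of a 4x4 array.
eigs = [np.array([[1, 0], [0, 0]]),
        np.array([[0, 0], [0, 1]]),
        np.array([[0, 1], [1, 0]]),
        np.array([[0, 1], [-1, 0]])]
M = np.array([E.flatten() for E in eigs])

print(np.linalg.matrix_rank(M))   # 4, so the four matrices are linearly independent
```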

A diagonal matrix is a square matrix of the form \[\begin{equation*} \begin{pmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{pmatrix} \end{equation*}\] that is, a matrix \((d_{ij})\) where \(d_{ij}=0\) if \(i \neq j\). We write \(\diag( a_1,\ldots,a_n)\) for the n✕n diagonal matrix whose \((1,1),(2,2),\ldots,(n,n)\) entries are \(a_1,a_2,\ldots,a_n\).

If \(T:V \to V\) is diagonalizable and \(\mathcal{B} = \mathbf{v} _1,\ldots, \mathbf{v}_n\) is a basis of V consisting of eigenvectors of T, then the matrix of T with respect to \(\mathcal{B}\) is diagonal: \[\begin{equation*} [T]_ {\mathcal{B}} ^ {\mathcal{B}} = \diag ( \lambda_1,\ldots,\lambda_n) \end{equation*}\] where \(\lambda_i\) is the eigenvalue of \(\mathbf{v} _i\).

Proposition 3.6 If \(\dim V=n\) and \(T:V \to V\) has n distinct eigenvalues then T is diagonalizable.
Proof. Let the distinct eigenvalues be \(\lambda_1,\ldots,\lambda_n\). Let \(\mathbf{v}_i\) be a \(\lambda_i\)-eigenvector. Then the \(\mathbf{v} _i\) form a basis of V, since they are linearly independent by Proposition 3.5 and there are \(n=\dim V\) of them, so they span a subspace of V with dimension \(\dim V\), which is therefore all of V.

It is important to remember that the converse of this result is false. A diagonalizable linear map does not have to have \(\dim V\) distinct eigenvalues. Consider the identity map \(V \to V\): it only has one eigenvalue, namely 1, but it is diagonalizable: any basis of V is a basis consisting of 1-eigenvectors of the identity map.

Proposition 3.7 Let A be a diagonalizable n✕n matrix. Let \(\mathbf{v} _1,\ldots, \mathbf{v}_n\) be a basis of \(\ff^n\) consisting of eigenvectors of A. Let P be the matrix whose columns are the \(\mathbf{v}_i\). Then \[\begin{equation*} P^{-1}AP =D \end{equation*}\] where D is the diagonal matrix whose ith diagonal entry is the eigenvalue of \(\mathbf{v} _i\).
Proof. Let \(\mathcal{B}\) be the standard basis of \(\ff^n\) and \(\mathcal{C}\) the basis \(\mathbf{v}_1,\ldots, \mathbf{v} _n\). P is the matrix of the identity map with respect to initial basis \(\mathcal{C}\) and final basis \(\mathcal{B}\), that is, \(P=[\id]^{\mathcal{C}}_{\mathcal{B}}\). By Proposition 3.3, \[\begin{equation*} I_n= [\id]_ {\mathcal{B}} ^ {\mathcal{B}} = [\id]_ {\mathcal{B}} ^ {\mathcal{C}} [\id]_ {\mathcal{C}} ^ {\mathcal{B}} = P [\id]_ {\mathcal{C}}^ {\mathcal{B}} . \end{equation*}\] So P is invertible with inverse \(P^{-1}=[\id]_ {\mathcal{C}} ^ {\mathcal{B}}\). Applying Proposition 3.3 to \(T_A \circ \id = \id \circ T_A\), \[\begin{equation*} [T_A]_ {\mathcal{B}} ^ {\mathcal{B}} [\id] _ {\mathcal{B}} ^ {\mathcal{C}} = [\id]^ {\mathcal{C}} _ {\mathcal{B}} [T_A]^ {\mathcal{C}} _ {\mathcal{C}} \end{equation*}\] Since \([T_A]^ {\mathcal{B}} _ {\mathcal{B}} =A\) and \([T_A]^ {\mathcal{C}} _ {\mathcal{C}}=D\) (because \(\mathcal{C}\) consists of eigenvectors of A), this says \(AP=PD\), which rearranges to \(P^{-1}AP=D\), as required.

One reason this is useful is in finding powers of matrices. It’s very easy to find powers of diagonal matrices, for example, \[\begin{equation*} \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} ^n = \begin{pmatrix} a^n & 0 \\ 0 & b^n \end{pmatrix} . \end{equation*}\] If we have a diagonalizable matrix A there is a basis of \(\ff^n\) consisting of eigenvectors of A, and we can form a matrix P whose columns are the elements of this basis. Then \(P^{-1}AP=D\) for a diagonal matrix D, so \(A=PDP^{-1}\), so \[\begin{align*} A^n &= (PDP^{-1})^n \\ &= PDP^{-1}PDP^{-1}\cdots PDP^{-1}PDP^{-1} \\ &= PDI_n D I_n \cdots I_n D I_n P^{-1} \\ &= PDD\cdots D P^{-1}\\ &= PD^n P^{-1} \end{align*}\] Since \(D^n\) is easy to find, this gives a method of computing a closed formula for \(A^n\).

Example 3.16 Let \(A = \begin{pmatrix} 1 & 1 \\ 1&1 \end{pmatrix}\). We’ve seen that the characteristic polynomial of A is \(x(x-2)\), so A has two distinct eigenvalues 0 and 2, so A is diagonalizable. If we find a 0-eigenvector and a 2-eigenvector they must be linearly independent (as they are eigenvectors with different eigenvalues), so they will form a basis of \(\rr^2\). It turns out that \(\begin{pmatrix} 1\\1 \end{pmatrix}\) is a 2-eigenvector and \(\begin{pmatrix} 1\\-1 \end{pmatrix}\) is a 0-eigenvector. Let P be the matrix with these eigenvectors as columns, so \(P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\). Then \(P^{-1}AP\) is a diagonal matrix with the eigenvalues of A on the diagonal: \[\begin{equation*} P^{-1}AP = \begin{pmatrix} 2&0 \\ 0 & 0 \end{pmatrix} . \end{equation*}\]
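Finally, here is a short numpy check of Example 3.16 together with the \(A^n = PD^nP^{-1}\) method described above; the exponent 5 is an arbitrary choice.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])       # columns: a 2-eigenvector and a 0-eigenvector
D = np.linalg.inv(P) @ A @ P
print(D)                           # [[2. 0.] [0. 0.]]

n = 5
A_power = P @ np.diag(np.diag(D) ** n) @ np.linalg.inv(P)
print(np.allclose(A_power, np.linalg.matrix_power(A, n)))   # True
```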