2.3 Linear equations and row operations

A system of linear equations in the variables \(x_1,\ldots,x_m\) is a list of simultaneous equations \[\begin{align} \tag{2.1} a_{11}x_1 + a_{12} x_2 + \cdots + a_{1m}x_m &= b_1 \\ a_{21}x_1 + a_{22} x_2 + \cdots + a_{2m}x_m &= b_2 \\ \vdots \;\;\;\;\;\;\;\;\; &\;\; \vdots \\ a_{n1}x_1 + a_{n2} x_2 + \cdots + a_{nm}x_m &= b_n \end{align}\] where the \(a_{ij}\) and \(b_i\) are numbers.

If we let A be the matrix \((a_{ij})\), \(\mathbf{b}\) the column vector \(\begin{pmatrix} b_1\\ \vdots \\ b_n \end{pmatrix}\), and \(\mathbf{x}\) the column vector \(\begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}\) then we can express (2.1) by saying

\[ \tag{2.2} A \mathbf{x} = \mathbf{b}. \]

A solution of this system is a list of values for the \(x_i\)s such that when we substitute them into (2.2), the equation holds. A system of linear equations is consistent if it has a solution, otherwise it is inconsistent.
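For example, the system \(x_1+x_2=0\), \(x_1+x_2=1\) is inconsistent: no values of \(x_1\) and \(x_2\) can make \(x_1+x_2\) equal to both 0 and 1. On the other hand, the system consisting of the single equation \(x_1+x_2=0\) is consistent, with \(x_1=x_2=0\) as one solution.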

When we want to find solutions to a system of linear equations, we usually go about it by adding or subtracting multiples of one equation from another in order to eliminate variables one by one. These operations on equations correspond to row operations on the matrix form (2.2) of the equations.


Definition 2.8 A row operation, or row op, is one of the following procedures we can apply to a matrix.

  • multiply each entry in row i by the number \(\lambda \neq 0\), written \(r_i \mapsto \lambda r_i\).
  • swap rows i and j, written \(r_i \leftrightarrow r_j\).
  • add \(\lambda\) times row i to row j, where \(i \neq j\), written \(r_j \mapsto r_j + \lambda r_i\).

Sometimes these are called elementary row operations, but we’ll just call them row operations. If r is a row operation and A a matrix we write r(A) for the result of applying r to A.


Example 2.1 Let A be the matrix \(\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\). Then if r is \(r_1 \mapsto 2 r_1\), s is \(r_1 \leftrightarrow r_2\), and t is \(r_2 \mapsto r_2 - 3 r_1\), \[\begin{align*} r (A) &= \begin{pmatrix} 2 & 4 \\ 3 & 4 \end{pmatrix}\\ s(A) &= \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} \\ t(A) &= \begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix}.\end{align*}\]

Lemma 2.2 All row operations are invertible. That is, if r is a row operation then there is another row operation s such that \(r(s(A))=s(r(A))=A\) for all matrices A.
Proof. A row swap \(r_i \leftrightarrow r_j\) is its own inverse, \(r_i \mapsto \lambda r_i\) has inverse \(r_i\mapsto \lambda^{-1} r_i\) and \(r_i \mapsto r_i+\lambda r_j\) has inverse \(r_i \mapsto r_i - \lambda r_j\).
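For instance, with A and t as in Example 2.1, the inverse of t is \(r_2 \mapsto r_2 + 3r_1\), and applying it to \(t(A)\) recovers A: \[ \begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 2 \\ 0 + 3\times 1 & -2 + 3\times 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = A. \]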

Definition 2.9 The augmented matrix associated to a system of linear equations written in matrix form \(A \mathbf{x}=\mathbf{b}\) is the matrix obtained by adding \(\mathbf{b}\) as an extra column on the right of A.

The point of introducing the augmented matrix is that doing a row operation to the augmented matrix of a system corresponds to manipulating the system of linear equations \(A \mathbf{x} = \mathbf{b}\) in a way that doesn’t change the solutions.

Often people write \((A | \mathbf{b})\) or put a dotted line before the last column of an augmented matrix to emphasise where it came from.


Example 2.2 The augmented matrix associated to the system \[\begin{align*} 2x+3y &= 0 \\ 4x-7y &= 2 \end{align*}\] is \[ \begin{pmatrix} 2 & 3 & 0 \\ 4 & - 7 & 2 \end{pmatrix} \]
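To see the correspondence concretely, doing \(r_2 \mapsto r_2 - 2r_1\) to this augmented matrix gives \[ \begin{pmatrix} 2 & 3 & 0 \\ 0 & -13 & 2 \end{pmatrix}, \] which is the augmented matrix of the system \(2x+3y=0\), \(-13y=2\), obtained by subtracting twice the first equation from the second.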

Suppose we have a matrix equation \(A \mathbf{x}=\mathbf{b}\), where \(\mathbf{x}\) is a matrix of indeterminates. A solution \(\mathbf{v}\) of this matrix equation is defined to be a column vector of numbers such that \(A \mathbf{v}=\mathbf{b}\). We want to show that doing row ops to an augmented matrix leaves the set of solutions unchanged - to do this we need to link row ops and matrix multiplication.


Definition 2.10 An elementary matrix is one that results from doing a single row operation to an identity matrix.

The next proposition shows that doing a row operation to a matrix has the same effect as multiplying it by an elementary matrix.


Proposition 2.3 Let r be a row operation and A an m✕n matrix. Then \(r(A)=r(I_m)A\).

While reading the proof it helps to keep some example elementary matrices in mind. If r is \(r_2 \mapsto 3r_2\), s is \(r_2 \leftrightarrow r_3\), and t is \(r_3 \mapsto r_3 + 4r_2\) then \[\begin{align*} r(I_4)&=\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \\ s(I_4) &=\begin{pmatrix} 1&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&1 \end{pmatrix} \\ t(I_4) &= \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&4&1&0\\ 0&0&0&1\end{pmatrix} \end{align*}\]
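As a sanity check, multiplying \(t(I_4)\) by a column vector does exactly what t does to that vector: \[ \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&4&1&0\\ 0&0&0&1\end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 + 4a_2 \\ a_4 \end{pmatrix}, \] which is the result of adding 4 times row 2 to row 3. The proof below verifies this in general.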


Proof. It’s enough to check the case when \(A=\begin{pmatrix} a_1\\ \vdots \\ a_m \end{pmatrix}\) is a column vector because matrix multiplication works column-by-column. There are three cases according to the three types of row operation. In each case we will work out \(r(I_m)A\) and show that it is the same as \(r(A)\).

For any l, the lth entry of \(r(I_m)A\) is

\[\begin{equation} \tag{2.3} \sum_{k=1}^m s_{lk}a_k \end{equation}\]

where \(s_{lk}\) is the l, k entry of \(r(I_m)\).

  • r is \(r_i\mapsto \lambda r_i\). For \(l\neq i\), the lth row of \(r(I_m)\) is the same as the lth row of \(I_m\), so \(s_{lk}=0\) unless \(k=l\) in which case it is 1. So from (2.3), the lth entry of \(r(I_m)A\) is \(a_l\). If \(l=i\) then \(s_{lk}=\lambda\) if \(k=l\) and 0 otherwise, so by (2.3) the ith entry of \(r(I_m)A\) is \(\lambda a_i\). Therefore \(r(I_m)A\) is the same as A, except that the ith entry is multiplied by \(\lambda\). This is \(r(A)\).
  • r is \(r_i \leftrightarrow r_j\). There are three cases for l.
    • \(l\neq i,j\). In this case \(s_{lk}=0\) unless \(k=l\) in which case it is 1, so from (2.3) the sum equals \(a_l\).
    • \(l=i\). Since \(r(I_m)\) is \(I_m\) with rows i and j swapped, \(s_{ik}=1\) if \(k=j\) and 0 otherwise. So the sum is \(a_j\).
    • \(l=j\). Similar to the last case, the sum equals \(a_i\). Therefore \(r(I_m)A\) is the same as A, except in row i where \(a_j\) appears and row j where \(a_i\) appears. This is \(r(A)\).
  • r is \(r_i \mapsto r_i + \lambda r_j\). This time the rows of \(r(I_m)\) are the same as the rows of \(I_m\), except the ith which has i, i entry 1 and i, j entry \(\lambda\), so \(s_{ii}=1, s_{ij}=\lambda\), and all other \(s_{ik}=0\). From (2.3), every entry of \(r(I_m)A\) is the same as the corresponding entry of A except the ith, where we get \(a_i+\lambda a_j\), and again, \(r(I_m)A\) is the same as \(r(A)\).

Corollary 2.1 Elementary matrices are invertible.

Proof. Each row operation r has an inverse \(r^{-1}\) by Lemma 2.2. Then

\[ r(I_m) r^{-1}(I_m) = r(r^{-1}(I_m)) \]

by Proposition 2.3, and this equals \(I_m\). Similarly \(r^{-1}(I_m)r(I_m)=I_m\), so the elementary matrix \(r(I_m)\) is invertible with inverse \(r^{-1}(I_m)\).
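For example, if r is \(r_2 \mapsto 3r_2\) then \[ r(I_2)\, r^{-1}(I_2) = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1/3 \end{pmatrix} = I_2. \]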

Now we can show that doing row ops to an augmented matrix doesn’t change the set of solutions of the corresponding system of linear equations.


Proposition 2.4 Let \(A \mathbf{x} = \mathbf{b}\) be a system of linear equations in matrix form. Let r be a row op and let \((A' | \mathbf{b}')\) be the result of applying r to the augmented matrix \((A | \mathbf{b})\). Then a vector \(\mathbf{v}\) is a solution of \(A \mathbf{x}=\mathbf{b}\) if and only if it is a solution of \(A' \mathbf{x}= \mathbf{b}'\).

Proof. Doing the row op r to a matrix is equivalent, by Proposition 2.3, to left-multiplying by an invertible (Corollary 2.1) matrix E.

The result of doing \(r\) to the augmented matrix \((A | \mathbf{b})\) is \(E (A | \mathbf{b})\), which equals \((EA | E\mathbf{b})\) because matrix multiplication works columnwise. Therefore \(A'=EA\) and \(\mathbf{b}' = E\mathbf{b}\).

Let \(\mathbf{v}\) be a solution of \(A\mathbf{x}=\mathbf{b}\). Then \(A \mathbf{v} = \mathbf{b}\), so left-multiplying by E gives \(EA\mathbf{v} = E\mathbf{b}\), which is \(A' \mathbf{v} = \mathbf{b}'\), and \(\mathbf{v}\) is a solution of \(A'\mathbf{x}=\mathbf{b}'\).

Let \(\mathbf{v}\) be a solution of \(A' \mathbf{x} = \mathbf{b}'\). We know E is invertible, so we can left-multiply \(A'\mathbf{v} = \mathbf{b}'\) by the inverse of E to get \(E^{-1}A'\mathbf{v} = E^{-1}\mathbf{b}'\). Since \(E^{-1}A'=A\) and \(E^{-1}\mathbf{b}'=\mathbf{b}\) this says \(A\mathbf{v}=\mathbf{b}\), and \(\mathbf{v}\) is a solution of \(A \mathbf{x}=\mathbf{b}\).

2.3.1 Row reduced echelon form

Because doing row operations to the augmented matrix \((A|\mathbf{b})\) doesn’t change the solutions of the matrix equation \(A\mathbf{x}=\mathbf{b}\) (Proposition 2.4), we can solve systems of linear equations by doing row ops until the equations are so simple that the solutions can be easily read off. The ‘simple’ form we are looking for is called row reduced echelon form.


Definition 2.11

  • A zero row of a matrix is one in which every entry is 0.
  • The leading entry in a row of a matrix which isn’t all zeroes is the first nonzero entry starting from the left.
  • A matrix is in row reduced echelon form, or RRE form or RREF for short, if:
    • all leading entries are 1,
    • any leading entry is strictly to the right of any leading entry in the row above it,
    • if a column contains a leading entry, every other entry in that column is 0, and
    • any zero rows are below any non-zero rows.

Example 2.3

  • \(\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\) isn’t in RRE form: the zero row is at the top.
  • \(\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}\) isn’t in RRE form: there is a row in which the left-most non-zero entry is not 1.
  • \(\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\) isn’t in RRE form: the left-most 1 in row 2 is not to the right of the left-most 1 in the row above it.
  • \(\begin{pmatrix} 1 &\alpha &\beta & 3 \\ 0 &0 & 1 & -2 \end{pmatrix}\) is not in RRE unless \(\beta = 0\): the left-most 1 in row 2 is in column 3, but it is not the only non-zero entry in column 3 unless \(\beta = 0\).
  • \(\begin{pmatrix} 1 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\) is in RRE form.

If \((A|\mathbf{b})\) is the augmented matrix of a system of linear equations and A is in RRE form, we can easily read off the solutions (if any) of the system. For example, if we think of the fourth example above with \(\alpha=1, \beta=0\) as the augmented matrix of a system of linear equations, those equations are

\[\begin{align*} x_1 + x_2&= 3 \\ x_3 &= -2 \end{align*}\]

and we can see that the general solution is \(x_3=-2\), \(x_2\) can be anything, and \(x_1 = 3-x_2\). Similarly if we had a system whose augmented matrix was \[ \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 5 \\ 0 & 0 & 2 \end{pmatrix} \] then we can see that the system has no solutions, since the last equation says \(0 = 2\) which is impossible.


Proposition 2.5 Any matrix can be put into RRE form by performing a sequence of row operations.

Proof. The proof is by induction on the number of columns. It is easy to get a one-column matrix into RRE form: if every entry is zero then the matrix is already in RRE form; if not, swap rows so that the zero rows are at the bottom, multiply each non-zero row by the reciprocal of its entry, then subtract the top row from each of the non-zero rows below it. The matrix then looks like \(\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}\), which is in RRE form.

Now suppose A is a matrix with \(n>1\) columns. By induction there is a sequence of row operations that puts the matrix formed by the first n-1 columns of A into RRE form. Let B be the result of doing those row operations to A, so that the first n-1 columns of B are RRE form, but B itself may not be in RRE form because of its final column.

Suppose that the RREF matrix formed by the first n-1 columns of B has k rows of zeros at the bottom. If B has a 0 in the final column of each of these k rows, then B itself is already in RRE form.

Otherwise at least one of these k rows has a nonzero entry in its final column. Multiply that row by the reciprocal of this entry so that its leading entry becomes 1, swap it to the top of these k rows, and subtract multiples of it from the other rows of B to eliminate every other nonzero entry in the final column of B. The resulting matrix is in RRE form.
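For example, here is a reduction of a \(2\times 3\) matrix to RRE form: \[ \begin{pmatrix} 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 1 & 3 \\ 0 & 2 & 4 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 1 & 3 \\ 0 & 1 & 2 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \end{pmatrix} \] using the row operations \(r_1 \leftrightarrow r_2\), then \(r_2 \mapsto \tfrac{1}{2} r_2\), then \(r_1 \mapsto r_1 - r_2\); the final matrix is in RRE form.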

Here is the key result on RRE form and the solution of linear equations:


Proposition 2.6 Let \((A' | \mathbf{b}')\) be the result of putting \((A|\mathbf{b})\) into RRE form. Then a vector \(\mathbf{v}\) is a solution of \(A\mathbf{x}=\mathbf{b}\) if and only if it is a solution of \(A' \mathbf{x} = \mathbf{b}'\).

Proof. There is a sequence \(r_1,\ldots,r_N\) of row operations that takes \((A|\mathbf{b})\) to \((A'|\mathbf{b}')\), so this follows by applying Proposition 2.4 repeatedly.

2.3.2 Solving systems in RRE form

Our method for solving linear systems whose matrix form is \(A\mathbf{x}=\mathbf{b}\) is then:

  • Form the augmented matrix \((A|\mathbf{b})\), and do row operations until it is in RRE form \((A'| \mathbf{b}')\). This new system has the same solutions as the old system, by Proposition 2.6.
  • Read off the solutions, if there are any.

Example 2.4

  • Suppose the augmented matrix is \[ \begin{pmatrix} 1 & 0 & 2 & 2 \\ 0 & 1 & 3 & -1 \end{pmatrix}.\] This corresponds to the system \[\begin{align*} x+2z&= 2 \\ y+3z & =-1. \end{align*}\] z can be chosen freely, and then the solution is completely determined: \(x=2-2z\) and \(y=-1-3z\). We can write these solutions in vector form as \[\begin{pmatrix}x\\y\\z\end{pmatrix}= \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} + z \begin{pmatrix} -2 \\ -3 \\ 1 \end{pmatrix}.\]
  • Suppose the augmented matrix is \[ \begin{pmatrix} 1&0&0&a \\ 0&1&0&b \\ 0&0&1&c \\ 0&0&0&d \end{pmatrix}. \] If \(d\neq 0\) then the final equation says \(0=d\) which is false, so this system has no solutions. On the other hand if \(d=0\) then the equations say \(x=a,y=b,z=c\) so there is a unique solution to this system.

We can obtain the solutions of a set of equations whose augmented matrix is in row reduced echelon form as follows. Recall that the final column corresponds to the constant term of the equations, and each column before that corresponds to a variable.

  • If the final column has a leading entry, there are no solutions.
  • Otherwise, variables which correspond to columns with no leading entry (if any) can be chosen freely, and the remaining variables are uniquely determined in terms of these.

The first bullet point is illustrated by what happens in the second example above when \(d\neq 0\). In the first example the z column has no leading entry, so z could be chosen freely, giving infinitely many solutions. In the second, each of the x, y, z columns has a leading entry, so there is no free choice and, when \(d=0\), the equations have exactly one solution.
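Here is the whole method run on one more small system. To solve \(x+2y=4\), \(3x+4y=10\), form the augmented matrix and row reduce: \[ \begin{pmatrix} 1 & 2 & 4 \\ 3 & 4 & 10 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 2 & 4 \\ 0 & -2 & -2 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 2 & 4 \\ 0 & 1 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \end{pmatrix} \] using \(r_2 \mapsto r_2 - 3r_1\), then \(r_2 \mapsto -\tfrac{1}{2}r_2\), then \(r_1 \mapsto r_1 - 2r_2\). Every column except the last has a leading entry, so there is a unique solution, \(x=2\) and \(y=1\).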

Theorem 2.1 The RRE form of a matrix is unique. That is, if B and C are matrices in RRE form obtained by doing row ops to a matrix A, then B=C.

Proof. (not examinable, taken from Yuster, Mathematics Magazine 57 no.2 1984 pp.93-94). The proof is by induction on the number m of columns of A; for m=1 the only possible RREF for A is the zero vector if A is itself zero, or \(\begin{pmatrix}1 \\ 0 \\ \vdots \\ 0\end{pmatrix}\) otherwise.

Now suppose A has \(m>1\) columns and that B and C are different matrices in RRE form obtained by doing row operations to A. The first m-1 columns of B and C are RRE forms for the first m-1 columns of A, so by the inductive hypothesis B and C can only differ in their last columns. Suppose they differ in row j.

Let \(\mathbf{u}\) be any solution of \(B \mathbf{x}=\mathbf{0}\). By Proposition 2.6, \(B \mathbf{u} = \mathbf{0}\) iff \(A \mathbf{u} = \mathbf{0}\) iff \(C \mathbf{u} = \mathbf{0}\), so we have \((B-C)\mathbf{u}=\mathbf{0}\). As the first \(m-1\) columns of B and C are the same, this last equation is equivalent to \((b_{im}-c_{im})u_m = 0\) for all i. As \(b_{jm}\neq c_{jm}\) we must have \(u_m=0\).

This means the last column of each of B and C contains a leading 1: otherwise the variable \(x_m\) corresponding to the last column could be chosen freely when solving \(B \mathbf{x}=\mathbf{0}\) or \(C \mathbf{x}= \mathbf{0}\), giving a solution with \(u_m\neq 0\). Since B and C are in RRE form, every other entry of this last column is zero. But the first \(m-1\) columns of B and C determine the row in which the leading 1 in column m must appear if the matrix is to be in RRE form. Since the first \(m-1\) columns of B and C are equal, it follows that B and C are equal, a contradiction.


This last result means that we can talk about the row reduced echelon form of a matrix, because it tells us that given any matrix A there is one and only one matrix in RRE form that can be obtained from A by doing row operations.

2.3.3 Invertibility and RRE form

We want to prove that a matrix is invertible if and only if its RRE form is the identity, but we need two preliminary lemmas.


Lemma 2.3 A product of invertible matrices is invertible.
Proof. Let \(A_1,\ldots,A_m\) be invertible matrices of the same size. Then you can easily check that \[ A_m^{-1}\cdots A_1^{-1} \] is an inverse to \(A_1\cdots A_m\).
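For two matrices the check is \[ (A_1A_2)(A_2^{-1}A_1^{-1}) = A_1(A_2A_2^{-1})A_1^{-1} = A_1A_1^{-1} = I, \] and similarly \((A_2^{-1}A_1^{-1})(A_1A_2)=I\); the general case works the same way, cancelling factors from the inside out.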

Lemma 2.4 Let A be an n✕n matrix in RRE form. Either \(A=I_n\) or A has a column with no leading entries.

Proof. Suppose every column of A contains a leading entry, so there are n leading entries. We must show that \(A=I_n\).

Since there’s at most one leading entry in each row and there are n rows, every row has a leading entry. Let \(c_i\) be the number of the column containing the leading entry of row i, so that the numbers \(c_1,\ldots, c_n\) are the numbers \(1, 2, \ldots, n\) in some order. Because A is in RREF, \(c_1<c_2<\cdots <c_n\), and therefore \(c_i=i\) for all i.

Columns of a RREF matrix which contain a leading entry have all entries zero apart from the leading entry, which is 1. Thus the ith column of A, which contains a leading entry in row i since \(c_i=i\), is \((0, \ldots, 0, 1, 0, \ldots, 0)^T\) where the 1 is in position i. It follows that \(A=I_n\).
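For example, \(\begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix}\) is a square matrix in RRE form which is not the identity; as the lemma predicts, its second column contains no leading entry (the only leading entry is the 1 in row 1, column 1).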

Theorem 2.2 A matrix is invertible if and only if its RRE form is the identity matrix.

Proof. Suppose the RRE form of A is \(I_n\), so that there are row ops \(r_1,\ldots,r_m\) with \[ r_1(r_2(\cdots (r_m (A)))\cdots ) = I_n.\] By Proposition 2.3, \[ r_1(I_n)r_2(I_n)\cdots r_m(I_n) A = I_n.\] By Corollary 2.1, the \(r_i(I_n)\) are invertible, and multiplying the above equation on the left by the inverse of \(r_1(I_n)\), then the inverse of \(r_2(I_n)\), and so on, \[ A = r_m(I_n)^{-1}r_{m-1}(I_n)^{-1} \cdots r_1(I_n)^{-1}.\] It follows that A is a product of invertible matrices, so by Lemma 2.3 A is invertible.

Conversely suppose the RRE form \(A'\) of A is not \(I_n\), so that by Lemma 2.4 this RRE form has a column with no leading entry, say column j. The matrix equation \(A'\mathbf{x}=\mathbf{0}\) has infinitely many solutions because we can choose the value of the variable corresponding to column j freely, and so the matrix equation \(A\mathbf{x} = \mathbf{0}\) has infinitely many solutions too (Proposition 2.6). But if A is invertible, \(A\mathbf{x}=\mathbf{0}\) implies \(\mathbf{x}=A^{-1}\mathbf{0}= \mathbf{0}\) so there is only one solution, a contradiction.

Corollary 2.2 A square matrix A is invertible if and only if the only solution to \(A\mathbf{x}=\mathbf{0}\) is \(\mathbf{x}=\mathbf{0}\).
Proof. If A is invertible and \(A\mathbf{x}=\mathbf{0}\) then multiplying on the left by \(A^{-1}\) we get \(\mathbf{x}=A^{-1}\mathbf{0}=\mathbf{0}\). Conversely if A is not invertible then its RRE form is not the identity. When we solve \(A\mathbf{x}=\mathbf{0}\) by putting the augmented matrix \((A\, \mathbf{0})\) into RRE form we get \((B\, \mathbf{0})\) where B is the RRE form of A, which has a column with no leading entry since B is not the identity, and we can choose the value of the variable corresponding to that column freely. This means there are infinitely many solutions of \(A\mathbf{x}=\mathbf{0}\).

If A is invertible and \(\mathbf{c}_j\) is the jth column of \(A^{-1}\), the fact that

\[ A A^{-1}=I_n\]

and the fact that matrix multiplication works column-by-column means that \(A \mathbf{c}_j=\mathbf{e}_j\), where \(\mathbf{e}_j\) is the jth column of \(I_n\) (that is, the column vector with zeroes everywhere except for a 1 in row j). This means \(\mathbf{c}_j\) is the solution of the equation \(A\mathbf{x}=\mathbf{e}_j\). But we can solve this by putting \((A \, \mathbf{e}_j)\) into RRE form: we know that the RRE form of A is \(I_n\), so the result will be \((I_n \, \mathbf{c}_j)\).

Rather than do this n times to work out the n columns of \(A^{-1}\), we can simply start with the matrix \((A\,\,\, I_n)\) and put it into RRE form — the result will be \((I_n\,\,\, A^{-1})\).


Example 2.5 To compute the inverse of \(A=\begin{pmatrix} 1&2\\3&4 \end{pmatrix}\) we put \((A\,\,\, I_2)\) into RRE form: \[\begin{multline} \begin{pmatrix} 1&2 & 1 & 0 \\ 3&4 & 0 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1&2&1&0\\ 0&-2&-3&1 \end{pmatrix} \mapsto \begin{pmatrix} 1&2&1&0\\ 0&1&3/2 & -1/2 \end{pmatrix} \\ \mapsto \begin{pmatrix} 1&0&-2 & 1 \\ 0&1&3/2&-1/2 \end{pmatrix} \end{multline}\] This is in RRE form, so the inverse of A is \[ \begin{pmatrix} -2&1 \\ 3/2 & -1/2 \end{pmatrix}\] as you can check.

We can use this method as a test of the invertibility of A which produces at the same time the inverse of A, if it exists: start with \((A\,\,\, I_n)\) and put it into RRE form, getting \((C\,\,\, D)\) say. If \(C\neq I_n\) then A isn’t invertible, by Theorem 2.2. If \(C=I_n\), then A is invertible and the remaining part D of this matrix is \(A^{-1}\).
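For example, applying the test to \(A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}\): \[ \begin{pmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 0 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 2 & 1 & 0 \\ 0 & 0 & -2 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & -1/2 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 2 & 0 & 1/2 \\ 0 & 0 & 1 & -1/2 \end{pmatrix}. \] The left-hand block is \(\begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix} \neq I_2\), so A is not invertible.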