Kernel of a linear map – Serlo

The kernel of a linear map intuitively contains the information that is "deleted" when applying the linear map. Further, the kernel can be used to characterize the injectivity of linear maps. It also plays a central role in solving systems of linear equations.

Introduction

We have learned about special mappings between vector spaces, called linear maps. Those are structure-preserving; that is, they are compatible with addition and scalar multiplication of a vector space. We can therefore think of a linear map from $V$ to $W$ as something that transports the vector space structure from $V$ to $W$ .

Introductory examples

We consider two accounts with balances $x$ and $y$ , respectively. We can describe this information by a vector $(x,y)^{T}\in \mathbb {R} ^{2}$ . The total account balance is the sum of the two account balances. We can calculate it using the map

\mathbb {R} ^{2}\to \mathbb {R} ,\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto x+y

This map is linear and therefore transports the vector space structure from $\mathbb {R} ^{2}$ to $\mathbb {R}$ . In the process, information is lost: one no longer knows how the money is distributed among the accounts. For example, one can no longer distinguish the individual account balances $(500,0)^{T}$ and $(200,300)^{T}$ because they both map to the same total account balance $500+0=200+300=500$ . In particular, the mapping is not injective. However, we get the information about how much money is in the accounts in total.

Rotation of the real plane by 90° counterclockwise

Next, we consider the map

\mathbb {R} ^{2}\to \mathbb {R} ^{2},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}-y\\x\end{pmatrix}}.

Visually, this corresponds to a counterclockwise rotation of $\mathbb {R} ^{2}$ by $90$ degrees. By undoing this rotation, one can recover the original vector from any rotated vector in $\mathbb {R} ^{2}$ . Formally speaking, this mapping is an isomorphism and no information is lost. In particular, the image of linearly independent vectors is linearly independent again (because an isomorphism is injective, see the article monomorphism) and the image of a generator of $\mathbb {R} ^{2}$ is again a generator of $\mathbb {R} ^{2}$ (because an isomorphism is surjective, see the article epimorphism).

Finally, we consider the rotation again, but this time we embed the rotated plane into $\mathbb {R} ^{3}$ :

\mathbb {R} ^{2}\to \mathbb {R} ^{3},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}-y\\x\\0\end{pmatrix}}.

Although this mapping is no longer bijective, no information is lost when transporting the vector space structure of $\mathbb {R} ^{2}$ into $\mathbb {R} ^{3}$ : as in the previous example, different vectors in $\mathbb {R} ^{2}$ are mapped to different vectors in $\mathbb {R} ^{3}$ , because the map is injective. Linear independence of vectors is also preserved. However, a generator of $\mathbb {R} ^{2}$ is not mapped to a generator of $\mathbb {R} ^{3}$ . For example, the linear map sends the standard basis $\{(1,0)^{T},(0,1)^{T}\}$ to $\{(0,1,0)^{T},(-1,0,0)^{T}\}$ , which is not a generator of $\mathbb {R} ^{3}$ . Whether a set of vectors is a generator depends on the ambient space. This is not the case for linear independence; it is an "intrinsic" property of sets of vectors.
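To make the last example concrete, here is a minimal Python sketch (the function name `rotate_embed` is our own) that applies the map to the standard basis of $\mathbb {R} ^{2}$ , uses a nonzero $2\times 2$ minor to confirm that the two images are linearly independent, and checks that both images have third coordinate $0$ , so that $(0,0,1)^{T}$ lies outside their span.

```python
def rotate_embed(v):
    """The example map R^2 -> R^3, (x, y)^T -> (-y, x, 0)^T."""
    x, y = v
    return (-y, x, 0)

# Images of the standard basis vectors (1, 0)^T and (0, 1)^T.
a = rotate_embed((1, 0))
b = rotate_embed((0, 1))
print(a, b)  # (0, 1, 0) (-1, 0, 0)

# A nonzero 2x2 minor (first two coordinates) shows a and b are linearly independent.
minor = a[0] * b[1] - a[1] * b[0]
print(minor)  # 1

# Every combination s*a + t*b has third coordinate 0, so (0, 0, 1)^T is not
# in span{a, b}: the images do not generate R^3.
print(a[2] == 0 and b[2] == 0)  # True
```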

Derivation

We have seen various examples of linear maps that transport a $K$ -vector space into another $K$ -vector space while preserving the structure. In the process, varying amounts of "intrinsic" information from the original vector space (such as differences of vectors or linear independence) were lost. The last example suggests that injective mappings preserve such intrinsic properties. On the other hand, if $f\colon V\to W$ is not injective, then there are vectors $v\neq v'$ in $V$ with $f(v)=f(v')$ . So in that case, $f$ "eliminates" the difference $v-v'$ of $v$ and $v'$ . The difference $v-v'$ is again an element of $V$ . Since $f$ is linear, we can reformulate:

f(v)=f(v')\iff 0=f(v)-f(v')=f(v-v').

Intuitively, $f$ is injective if and only if it eliminates no nonzero difference $v-v'$ of vectors (i.e., maps no such difference to zero). Because $f$ is structure-preserving, for all $v,v'\in V$ and $\lambda \in K$ , the equation $f(v-v')=0$ implies

f(\lambda v-\lambda v')=f(\lambda (v-v'))=\lambda f(v-v')=\lambda \cdot 0=0.

If the difference of $v$ and $v'$ is eliminated under $f$ , so is that of $\lambda v$ and $\lambda v'$ . Similarly, for $v,v',w,w'\in V$ with $f(v-v')=0$ and $f(w-w')=0$ , we also have

f((v+w)-(v'+w'))=f((v-v')+(w-w'))=f(v-v')+f(w-w')=0+0=0.

So the difference of $v+w$ and $v'+w'$ is also eliminated. The differences eliminated by $f$ are themselves vectors in $V$ . They are sent by $f$ to the zero element $0_{W}$ of $W$ , so the eliminated differences lie in the preimage $f^{-1}(\{0_{W}\})$ . Conversely, any vector $v\in f^{-1}(\{0_{W}\})$ can be written as a difference $v=v-0$ ; that is, the difference $v-0$ between $v$ and the zero vector is eliminated by $f$ . Hence the preimage $f^{-1}(\{0_{W}\})$ measures exactly which differences of vectors (how much "information") are lost in the transport from $V$ to $W$ . Our considerations show that $f^{-1}(\{0_{W}\})$ is even a subspace of $V$ . We give a name to this subspace: the kernel of $f$ .
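The derivation can be traced numerically for the account example. The following Python sketch (the helper names `total`, `sub` and `add` are hypothetical, and the second pair of balances is made up for illustration) checks that the two balances from the introduction have the same image, that their difference is sent to $0$ , and that sums and scalar multiples of eliminated differences are eliminated too.

```python
def total(v):
    """The linear map R^2 -> R, (x, y)^T -> x + y (total account balance)."""
    x, y = v
    return x + y

def sub(v, w):
    """Componentwise difference v - w."""
    return tuple(a - b for a, b in zip(v, w))

def add(v, w):
    """Componentwise sum v + w."""
    return tuple(a + b for a, b in zip(v, w))

v, v_prime = (500, 0), (200, 300)
d = sub(v, v_prime)
print(total(v) == total(v_prime))  # True: f(v) = f(v')
print(d, total(d))                 # (300, -300) 0: the difference is eliminated

# A second pair with equal totals (made-up values for illustration).
w, w_prime = (10, 20), (25, 5)
e = sub(w, w_prime)
print(total(tuple(3 * c for c in d)))  # 0: scalar multiples stay eliminated
print(total(add(d, e)))                # 0: sums stay eliminated
```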

Definition

The kernel of a linear map intuitively measures how much "intrinsic" information about vectors from $V$ (differences of vectors or linear independence) is lost when applying the map. Mathematically, the kernel is the preimage of the zero vector.

Definition (Kernel of a linear map)

Let $V$ and $W$ be two $K$ -vector spaces and $f\colon V\rightarrow W$ linear. Then we call $\ker f:=f^{-1}(0_{W})=\lbrace v\in V\mid f(v)=0_{W}\rbrace$ the kernel of $f$ .

In the derivation we claimed that the kernel of a linear map from $V$ to $W$ is a subspace of $V$ . We will now prove this in detail.

Theorem (The kernel is a vector space)

Let $f\colon V\to W$ be a linear map between the $K$ -vector spaces $V$ and $W$ . Then $\ker f$ is a subspace of $V$ .

Proof (The kernel is a vector space)

To verify the claim, we need to show four things:

$\ker f\subseteq V$
$\ker f\neq \emptyset$
For all $v_{1},v_{2}\in \ker f$ we have that $v_{1}+v_{2}\in \ker f$ .
For all $v\in \ker f$ and all $\lambda \in K$ we have that $\lambda \cdot v\in \ker f$ .

Proof step: $\ker f\subseteq V$

The first assertion follows directly from the definition.

Proof step: $\ker f\neq \emptyset$

Since $f$ is linear, we know that $f(0_{V})=0_{W}$ holds, so $0_{V}\in \ker f$ . In particular, $\ker f\neq \emptyset$ .

Proof step: For all $v_{1},v_{2}\in \ker f$ , we have that $v_{1}+v_{2}\in \ker f$ .

Now we show the third point: for all $v_{1},v_{2}\in \ker f\subseteq V$ it holds that

{\begin{aligned}f(v_{1}+v_{2})&\\[0.3em]&\ {\color {OliveGreen}\left\downarrow f{\text{ is linear (i.e., additive)}}\right.}\\[0.3em]&=\ f(v_{1})+f(v_{2})\\[0.3em]&\ {\color {OliveGreen}\left\downarrow v_{1},v_{2}\in \ker f\right.}\\[0.3em]&=\ 0_{W}+0_{W}\\[0.3em]&=\ 0_{W}\end{aligned}}

So also $v_{1}+v_{2}$ is in the kernel of $f$ .

Proof step: For all $v\in \ker f$ and all $\lambda \in K$ we have that $\lambda \cdot v\in \ker f$ .

The fourth step works analogously to the third step: For all $v\in \ker f$ and all $\lambda \in K$ it is true that

{\begin{aligned}f(\lambda \cdot v)&\\[0.3em]&\ {\color {OliveGreen}\left\downarrow f{\text{ is linear (i.e., homogeneous)}}\right.}\\[0.3em]&=\ \lambda \cdot f(v)\\[0.3em]&\ {\color {OliveGreen}\left\downarrow v\in \ker f\right.}\\[0.3em]&=\ \lambda \cdot 0_{W}\\[0.3em]&=\ 0_{W}\end{aligned}}

Thus, $\lambda \cdot v\in \ker f$ .

Examples

We determine the kernel of the examples from the introduction.

Vector is mapped to the sum of entries

We consider the mapping

f\colon \mathbb {R} ^{2}\to \mathbb {R} ,\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto x+y.

The kernel of $f$ consists of the vectors $(x,y)^{T}\in \mathbb {R} ^{2}$ with $0=f((x,y)^{T})=x+y$ , that is, $y=-x$ . In other words,

\ker f=\left\{{\begin{pmatrix}x\\-x\end{pmatrix}}\mid x\in \mathbb {R} \right\}=\operatorname {span} \left\{{\begin{pmatrix}1\\-1\end{pmatrix}}\right\}.

Thus the kernel of $f$ is a one-dimensional subspace of $\mathbb {R} ^{2}$ . More generally, for $n\in \mathbb {N}$ we can consider the mapping

g\colon \mathbb {R} ^{n}\to \mathbb {R} ,\quad {\begin{pmatrix}x_{1}\\\vdots \\x_{n}\end{pmatrix}}\mapsto x_{1}+\cdots +x_{n}

Again, by definition, a vector $(x_{1},\ldots ,x_{n})^{T}$ lies in the kernel of $g$ if and only if $0=g((x_{1},\ldots ,x_{n})^{T})=x_{1}+\cdots +x_{n}$ holds. So we can freely choose $x_{1},\ldots ,x_{n-1}\in \mathbb {R}$ and then set $x_{n}=-x_{1}-\cdots -x_{n-1}$ . Thus

\ker g=\left\{{\begin{pmatrix}x_{1}\\\vdots \\x_{n-1}\\-x_{1}-\cdots -x_{n-1}\end{pmatrix}}\mid x_{1},\ldots ,x_{n-1}\in \mathbb {R} \right\}=\operatorname {span} \left\{{\begin{pmatrix}1\\0\\\vdots \\0\\-1\end{pmatrix}},{\begin{pmatrix}0\\1\\\vdots \\0\\-1\end{pmatrix}},\ldots ,{\begin{pmatrix}0\\0\\\vdots \\1\\-1\end{pmatrix}}\right\}.

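Hence, the kernel of $g$ is an $(n-1)$ -dimensional subspace of $\mathbb {R} ^{n}$ : the $n-1$ spanning vectors are linearly independent, since the first $n-1$ entries of any linear combination of them are exactly its coefficients. Such a subspace is called a hyperplane in $\mathbb {R} ^{n}$ .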
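The description of $\ker g$ can be checked with a short Python sketch. The helper names below are our own: `kernel_basis(n)` builds the spanning vectors $e_{i}-e_{n}$ from above, and `coordinates_in_kernel_basis` recovers the coefficients $x_{1},\ldots ,x_{n-1}$ of a kernel vector, which is exactly why the spanning vectors are linearly independent.

```python
def g(x):
    """The linear map R^n -> R, (x_1, ..., x_n)^T -> x_1 + ... + x_n."""
    return sum(x)

def kernel_basis(n):
    """The n - 1 spanning vectors e_i - e_n (i = 1, ..., n - 1) of ker g."""
    basis = []
    for i in range(n - 1):
        v = [0] * n
        v[i] = 1
        v[-1] = -1
        basis.append(v)
    return basis

def coordinates_in_kernel_basis(x):
    """For x in ker g, return the coefficients x_1, ..., x_{n-1} and the
    recombined vector sum_i x_i (e_i - e_n), which should equal x."""
    n = len(x)
    coeffs = x[: n - 1]
    basis = kernel_basis(n)
    combo = [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(n)]
    return coeffs, combo

B = kernel_basis(4)
print(B)                  # [[1, 0, 0, -1], [0, 1, 0, -1], [0, 0, 1, -1]]
print([g(b) for b in B])  # [0, 0, 0]

x = [2, -5, 1, 2]         # g(x) = 0, so x lies in ker g
coeffs, combo = coordinates_in_kernel_basis(x)
print(coeffs, combo == x)  # [2, -5, 1] True
```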

Rotation in $\mathbb {R} ^{2}$

We consider the rotation

f\colon \mathbb {R} ^{2}\to \mathbb {R} ^{2},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}-y\\x\end{pmatrix}}.

Suppose $(x,y)^{T}$ lies in the kernel of $f$ , i.e. it holds that

{\begin{pmatrix}0\\0\end{pmatrix}}=f\left({\begin{pmatrix}x\\y\end{pmatrix}}\right)={\begin{pmatrix}-y\\x\end{pmatrix}}.

From this we obtain $x=y=0$ . So only the zero vector lies in the kernel of $f$ and we have that $\ker f=\{(0,0)^{T}\}$ .

$\mathbb {R} ^{2}$ is rotated and embedded into $\mathbb {R} ^{3}$

Next we consider

f\colon \mathbb {R} ^{2}\to \mathbb {R} ^{3},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}-y\\x\\0\end{pmatrix}}.

As in the previous example, we determine the kernel by choosing any vector $(x,y)^{T}\in \ker f$ . Thus it holds that

{\begin{pmatrix}0\\0\\0\end{pmatrix}}=f\left({\begin{pmatrix}x\\y\end{pmatrix}}\right)={\begin{pmatrix}-y\\x\\0\end{pmatrix}}.

Again it follows that $x=y=0$ , so that also for this mapping $\ker f=\{(0,0)^{T}\}$ holds.
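Both kernels can also be computed mechanically. The sketch below implements Gauss-Jordan elimination with exact fractions in a helper `nullspace` (our own name, not a library function) and applies it to the matrices whose columns are the images of the standard basis vectors. An empty result means the kernel is $\{0\}$ ; the sum map from the first example is included for contrast.

```python
from fractions import Fraction

def nullspace(A):
    """Return a basis of {x : A x = 0} via Gauss-Jordan elimination with
    exact fractions. A is a non-empty list of rows of equal length."""
    rows = [[Fraction(a) for a in row] for row in A]
    m, n = len(rows), len(rows[0])
    pivots = []  # pivot column of each pivot row, in order
    r = 0        # index of the next pivot row
    for c in range(n):
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c: x_c is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [a / pivot for a in rows[r]]
        for i in range(m):  # clear column c in every other row
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)  # one free variable set to 1, the others to 0
        for row, pc in zip(rows, pivots):
            x[pc] = -row[free]  # solve each pivot row for its pivot variable
        basis.append(x)
    return basis

R = [[0, -1], [1, 0]]          # rotation (x, y)^T -> (-y, x)^T
E = [[0, -1], [1, 0], [0, 0]]  # rotation followed by embedding into R^3
S = [[1, 1]]                   # sum map (x, y)^T -> x + y, for contrast

print(nullspace(R), nullspace(E))                   # [] []: both kernels are {0}
print([[int(a) for a in v] for v in nullspace(S)])  # [[-1, 1]]
```

The basis vector $(-1,1)^{T}$ returned for the sum map spans the same line as $(1,-1)^{T}$ in the text.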

Derivatives of polynomials

Finally, we consider a linear map that did not appear in the introduction:

f\colon \mathbb {R} [X]\to \mathbb {R} [X],\quad p\mapsto p',

which maps a real polynomial to its derivative. That is, a polynomial

p=a_{0}+a_{1}X+a_{2}X^{2}+\cdots +a_{n}X^{n}

with coefficients $a_{0},\ldots ,a_{n}\in \mathbb {R}$ is mapped to the polynomial

p'=a_{1}+2a_{2}X+\cdots +na_{n}X^{n-1}

Graphically, we associate with $p$ the polynomial $p'$ , which gives the slope of $p$ at each point. From this information, we still learn the shape of the graph of $p$ (just as if we were given a stencil). However, we no longer know where it is positioned along the $y$ -axis, because the information about the constant term of the polynomial is lost when taking the derivative. Polynomials that differ only by a shift along the $y$ -axis can no longer be distinguished after differentiation. For example, both $p=X^{2}-X+1$ and $q=X^{2}-X+42$ have the derivative $p'=q'=2X-1$ . So $f$ maps them to the same polynomial.

The kernel of $f$ thus contains exactly the constant polynomials:

\ker f=\{p\in \mathbb {R} [X]\mid p=c{\text{ for some }}c\in \mathbb {R} \}

The inclusion " $\supseteq$ " is clear, because the derivative of a constant polynomial is always the zero polynomial. For the converse inclusion " $\subseteq$ ", we consider any polynomial $p\in \ker f$ and show that it is constant. We can always write such a polynomial as $p=\sum _{i=0}^{n}a_{i}X^{i}$ for some $n\in \mathbb {N}$ and coefficients $a_{0},\ldots ,a_{n}\in \mathbb {R}$ . Because $p\in \ker f$ , it holds that

0=f(p)=p'=\sum _{i=1}^{n}i\,a_{i}X^{i-1}

and comparing coefficients gives $i\,a_{i}=0$ , hence $a_{i}=0$ , for all $i=1,\ldots ,n$ . So $p=a_{0}$ is constant.
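As a sketch, one can represent a polynomial $a_{0}+a_{1}X+\cdots +a_{n}X^{n}$ by its coefficient list `[a_0, ..., a_n]` (a representation chosen here for illustration; the helper names are ours). Differentiation then multiplies each coefficient by its index and drops the constant term, and a polynomial lies in the kernel exactly when the resulting list is all zeros, mirroring the coefficient comparison above.

```python
def trim(coeffs):
    """Remove trailing zero coefficients; the zero polynomial is []."""
    c = list(coeffs)
    while c and c[-1] == 0:
        c.pop()
    return c

def derivative(coeffs):
    """Map p = a_0 + a_1 X + ... + a_n X^n (given as [a_0, ..., a_n]) to p'."""
    # p' = a_1 + 2 a_2 X + ... + n a_n X^(n-1); the constant a_0 is dropped.
    return trim([i * a for i, a in enumerate(coeffs)][1:])

def in_kernel(coeffs):
    """p lies in ker f exactly when p' is the zero polynomial."""
    return derivative(coeffs) == []

p = [1, -1, 1]   # X^2 - X + 1
q = [42, -1, 1]  # X^2 - X + 42
print(derivative(p), derivative(q))          # [-1, 2] [-1, 2], i.e. 2X - 1
print(in_kernel([7]), in_kernel([7, 0, 3]))  # True False
```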


Kernel and injectivity

In the derivation above, we saw that a linear map preserves all differences of vectors (i.e., eliminates no nonzero difference) if and only if its kernel consists only of the zero vector. We also saw there that, by linearity, a linear map is injective if and only if it eliminates no difference of distinct vectors. So we have the following theorem:

Theorem (Relationship between kernel and injectivity)

Let $V$ and $W$ be two $K$ -vector spaces and let $f\colon V\to W$ be linear. Then $f$ is injective if and only if $\ker f=\lbrace 0_{V}\rbrace$ . In particular, $f$ is injective if and only if $\dim(\ker f)=0$ .

Summary of proof (Relationship between kernel and injectivity)

To establish the theorem, we have to show two directions:

If $f$ is injective, then $\ker f=\lbrace 0_{V}\rbrace$ .
From $\ker f=\lbrace 0_{V}\rbrace$ it follows that $f$ is injective.

The first direction can be shown directly. For the other direction, we assume $\ker f=\lbrace 0_{V}\rbrace$ and show that for any $v_{1},v_{2}\in V$ with $f(v_{1})=f(v_{2})$ we must have $v_{1}=v_{2}$ . Here, we can use that for two vectors $v_{1},v_{2}\in V$ with $f(v_{1})=f(v_{2})$ , we have $f(v_{1})-f(v_{2})=0$ . Further, $v_{1}=v_{2}$ is equivalent to $v_{1}-v_{2}=0$ .

Proof (Relationship between kernel and injectivity)

Proof step: If $f$ is injective, then $\ker f=\lbrace 0_{V}\rbrace$ .

Let us first assume that $f$ is injective. We already know that $f(0_{V})=0_{W}$ . Since $f$ is injective, at most one vector is mapped to any given value. So only $0_{V}$ is mapped to $0_{W}$ . Thus $\ker f=\lbrace 0_{V}\rbrace$ , because the kernel is defined as the set of all vectors that are mapped to the zero vector.

Proof step: From $\ker f=\lbrace 0_{V}\rbrace$ we get that $f$ is injective.

Let $\ker f=\lbrace 0_{V}\rbrace$ . In order to show that $f$ is injective, we consider two vectors $v_{1}$ and $v_{2}$ from $V$ with $f(v_{1})=f(v_{2})$ . Then

{\begin{aligned}f(v_{1}-v_{2})&\\[0.3em]&{\color {OliveGreen}\left\downarrow f{\text{ is linear}}\right.}\\[0.3em]&=\ f(v_{1})-f(v_{2})\\[0.3em]&{\color {OliveGreen}\left\downarrow f(v_{1})=f(v_{2})\right.}\\[0.3em]&=\ 0_{W}\end{aligned}}

So $v_{1}-v_{2}\in \ker f$ . Since we have assumed $\ker f=\lbrace 0_{V}\rbrace$ , it follows that $v_{1}-v_{2}=0_{V}$ and thus $v_{1}=v_{2}$ . Hence, we have the implication $f(v_{1})=f(v_{2})\implies v_{1}=v_{2}$ for all $v_{1},v_{2}\in V$ . But this is exactly the definition of $f$ being injective.

Proof step: $f$ is injective if and only if $\dim(\ker f)=0$ .

We have already shown that $f$ is injective if and only if $\ker f=\lbrace 0_{V}\rbrace$ . It remains to show that this is equivalent to $\dim(\ker f)=0$ . The kernel of $f$ is a subspace of $V$ , and a subspace of $V$ is equal to $\lbrace 0_{V}\rbrace$ if and only if its dimension is zero. So $f$ is indeed injective if and only if $\dim \ker f=0$ .

Alternative proof (Relationship between kernel and injectivity)

One can also show this theorem with only one chain of equivalent statements:

{\begin{aligned}f{\text{ is injective}}&\iff \forall v_{1},v_{2}\in V:\left(v_{1}\neq v_{2}\implies f(v_{1})\neq f(v_{2})\right)\\[0.3em]&\iff \forall v_{1},v_{2}\in V:\left(v_{1}-v_{2}\neq 0_{V}\implies f(v_{1})-f(v_{2})\neq 0_{W}\right)\\[0.3em]&{\color {OliveGreen}\left\downarrow \ f{\text{ is linear}}\right.}\\[0.3em]&\iff \forall v_{1},v_{2}\in V:\left(v_{1}-v_{2}\neq 0_{V}\implies f(v_{1}-v_{2})\neq 0_{W}\right)\\[0.3em]&{\color {OliveGreen}\left\downarrow \ {\text{set }}{\tilde {v}}=v_{1}-v_{2}\right.}\\[0.3em]&\iff \forall {\tilde {v}}\in V:\left({\tilde {v}}\neq 0_{V}\implies f({\tilde {v}})\neq 0_{W}\right)\\[0.3em]&{\color {OliveGreen}\left\downarrow \ f(0_{V})=0_{W}\right.}\\[0.3em]&\iff {\text{Only }}0_{V}{\text{ is mapped to }}0_{W}\\[0.3em]&\iff \ker f=\{0_{V}\}.\end{aligned}}

The larger the kernel is, the more differences between vectors are "eliminated" and the more the mapping "fails to be injective". The kernel is thus a measure of the "non-injectivity" of a linear map.
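The theorem can be tested exhaustively over a small finite field. This Python sketch assumes $K=\mathbb {Z} /3\mathbb {Z}$ (the theorem holds over any field, but a finite one allows brute force), enumerates all $m\times n$ matrices, and checks for each one that injectivity (no two vectors share an image) coincides with having a trivial kernel. All function names are hypothetical.

```python
from itertools import product

P = 3  # we work over K = Z/3Z; any prime P gives a field (assumption of this sketch)

def apply_matrix(A, x):
    """Matrix-vector product A x over Z/PZ; A is a tuple of rows."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) % P for row in A)

def vectors(n):
    """All P**n vectors of K^n, starting with the zero vector."""
    return list(product(range(P), repeat=n))

def is_injective(A, n):
    """Brute force: no two distinct vectors of K^n have the same image."""
    images = [apply_matrix(A, x) for x in vectors(n)]
    return len(set(images)) == len(images)

def has_trivial_kernel(A, n):
    """Brute force: the only x with A x = 0 is x = 0."""
    zero = tuple(0 for _ in A)
    kernel = [x for x in vectors(n) if apply_matrix(A, x) == zero]
    return kernel == [tuple([0] * n)]

def matrices(m, n):
    """All m x n matrices over K, each as a tuple of m rows."""
    for entries in product(range(P), repeat=m * n):
        yield tuple(entries[i * n:(i + 1) * n] for i in range(m))

def check_theorem(m, n):
    """Assert 'injective <=> trivial kernel' for every m x n matrix;
    return how many of them are injective."""
    count = 0
    for A in matrices(m, n):
        injective = is_injective(A, n)
        assert injective == has_trivial_kernel(A, n)
        count += injective
    return count

print(check_theorem(2, 2))  # 48
print(check_theorem(1, 2))  # 0
```

For $2\times 2$ matrices, the count $48=(3^{2}-1)(3^{2}-3)$ is the number of ordered bases of $K^{2}$ , and, as expected, no linear map $K^{2}\to K^{1}$ is injective.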

Injective maps and subspaces

In the introductory examples we conjectured that injective linear maps preserve "intrinsic" properties of vector spaces. By this, we mean properties that do not depend on the ambient vector space, such as the linear independence of vectors or vectors being distinct. The property of being a generator can be lost in injective linear maps, as we have seen in the example of the twisted embedding of $\mathbb {R} ^{2}$ into $\mathbb {R} ^{3}$ : The mapping is injective, but the standard basis of $\mathbb {R} ^{2}$ is not mapped to a generator of $\mathbb {R} ^{3}$ .

What exactly does it mean that a property of a family $N=(v_{i})_{i\in I}\subseteq V$ of vectors does not depend on the ambient space $V$ ? Often, properties of vectors from $V$ (for example, linear independence) depend on the vector space structure of $V$ , that is, addition and scalar multiplication. To keep this dependence as small as possible, we restrict our attention to the smallest subspace of $V$ containing $N$ , that is, we restrict to $\operatorname {span} (N)$ . Now, we call a property of $N$ intrinsic if it depends only on $\operatorname {span} (N)$ but not on $V$ .

Example (Intrinsic and non-intrinsic properties)

Let $V$ be a vector space and $N\subseteq V$ a subset of vectors.

Linear independence of vectors in $N$ is an intrinsic property, because the definition of linear independence can also be checked in $\operatorname {span} (N)$ and does not refer to the ambient vector space $V$ .
Differences of vectors in $N$ are also intrinsic: all that is needed to examine them are vectors $v,v'\in N$ and their difference $v-v'\in \operatorname {span} (N)$ .
Not intrinsic, on the other hand, is the property of $N$ of being a generator of $V$ : The set $N$ is always a generator of $\operatorname {span} (N)$ . But if the ambient space $V$ is larger than $\operatorname {span} (N)$ , then $N$ is not a generator of $V$ .

What do intrinsic properties of a family of vectors have to do with injectivity? Let $f\colon V\to W$ be a linear map. Suppose $f$ preserves intrinsic properties of vectors, that is, if a family $N=(v_{i})_{i\in I}\subseteq V$ has some intrinsic property, then its image $f(N)=(f(v_{i}))_{i\in I}$ under $f$ also has this property. Then $f$ also preserves the property of vectors being different, since this is an intrinsic property. That means, if $v,v'\in V$ are different, i.e., $v\neq v'$ , then their image under $f$ is also different, i.e., $f(v)\neq f(v')$ . So $f$ is injective.

Conversely, if $f$ is injective, then $V$ is isomorphic to the subspace $f(V)$ of $W$ : If we restrict the target space of $f$ to its image, we obtain an injective and surjective linear map $f\colon V\to f(V)$ , that is, an isomorphism. In particular, for any family $N$ in $V$ , it holds that the subspace $\operatorname {span} (N)$ of $V$ is isomorphic to $f(\operatorname {span} (N))$ . Thus, the latter has the same properties as $\operatorname {span} (N)$ and hence, $f$ preserves intrinsic properties of subsets of $V$ .

So we have seen that $f\colon V\to W$ is injective if and only if $f$ preserves intrinsic properties of subsets of $V$ .

Kernel and linear independence

In the previous section we have seen that injective linear maps $V\to W$ are exactly those linear maps which preserve intrinsic properties of subsets of $V$ . The linear independence of a family of vectors is such an intrinsic property, since it either holds for every choice of ambient space or for none.

So, injective linear maps should preserve linear independence of vectors, i.e., the image of linearly independent vectors is again linearly independent. Conversely, a linear map cannot be injective if it does not preserve the linear independence of vectors, since the intrinsic information of "being linearly independent" is lost.

Overall, we get the following theorem, which has already been proved in the article on monomorphisms:

Theorem (Injective linear maps preserve linear independence)

Let $V$ and $W$ be two $K$ -vector spaces and $f\colon V\to W$ a linear map. Then $\ker(f)=\{0\}$ holds if and only if the image of every linearly independent subset of $V$ is again linearly independent.

In particular, for any injective linear map $f\colon V\to W$ , the vector space $f(V)$ is a $\dim(V)$ -dimensional subspace of $W$ . Hence, in the finite-dimensional case, there cannot exist an injective linear map from $V$ to $W$ if $\dim(W)<\dim(V)$ . This has also already been shown in the article on monomorphisms.

Kernel and linear systems

The kernel of a linear map is an important concept in the study of systems of linear equations.

Let $K$ be a field and let $m,n\in \mathbb {N}$ . We consider a linear system of equations

{\begin{aligned}a_{11}x_{1}+a_{12}x_{2}+\cdots +a_{1n}x_{n}&=b_{1}\\a_{21}x_{1}+a_{22}x_{2}+\cdots +a_{2n}x_{n}&=b_{2}\\&\vdots \\a_{m1}x_{1}+a_{m2}x_{2}+\cdots +a_{mn}x_{n}&=b_{m}\end{aligned}}

with $n$ variables $x_{1},\ldots ,x_{n}$ and $m$ equations. We have $a_{ij},b_{i}\in K$ , where $i\in \{1,\ldots ,m\}$ and $j\in \{1,\ldots ,n\}$ . We can also write this system of equations using matrix multiplication:

\underbrace {\begin{pmatrix}a_{11}&\cdots &a_{1n}\\\vdots &&\vdots \\a_{m1}&\cdots &a_{mn}\end{pmatrix}} _{A}\underbrace {\begin{pmatrix}x_{1}\\\vdots \\x_{n}\end{pmatrix}} _{x}=\underbrace {\begin{pmatrix}b_{1}\\\vdots \\b_{m}\end{pmatrix}} _{b},

where $A\in K^{m\times n}$ , $x\in K^{n}$ and $b\in K^{m}$ . We denote the set of solutions by

L(A,b)=\{x\in K^{n}\mid Ax=b\}.

Determining a solution to the linear system of equations $Ax=b$ for a given right-hand side $b$ is the same as finding a preimage of $b$ under the linear map

f_{A}\colon K^{n}\to K^{m},\quad x\mapsto Ax


The system of equations $Ax=b$ has a solution if and only if the preimage $f_{A}^{-1}(b)$ is not empty. In this case, we may ask whether there are multiple solutions, that is, whether the solution fails to be unique. In other words, we are interested in how many preimages $b$ has under $f_{A}$ .

By definition of injectivity, every point $b\in K^{m}$ has at most one element in its preimage if and only if $f_{A}$ is injective. This means that the linear system of equations $Ax=b$ has at most one solution for each $b\in K^{m}$ , that is, $|L(A,b)|\leq 1$ . Because $f_{A}$ is linear, injectivity is equivalent to $\ker(f_{A})=\{0\}$ . So we can already state:

Theorem (Uniqueness of solutions)

Let $K$ be a field and let $m,n\in \mathbb {N}$ , $A\in K^{m\times n}$ and $b\in K^{m}$ . Then

|L(A,b)|\leq 1{\text{ for all }}b\in K^{m}\iff \ker(f_{A})=\{0\}.

Hint

The set of solutions of $Ax=b$ can be empty. This occurs, for example, when $A=0$ is the zero matrix and $b\neq 0$ . So the kernel says nothing about the existence of solutions, only about their uniqueness. To say something about existence, we need to consider the image of $f_{A}$ .

Even if $f_{A}$ is not injective, i.e., $\ker(f_{A})\neq \{0\}$ holds, we can still say more about the set of solutions by exploiting the kernel: the difference of two vectors $x$ and $x'$ that $f_{A}$ maps to the same vector lies in the kernel of $f_{A}$ . Therefore, if the preimage of some $b\in K^{m}$ under $f_{A}$ is not empty, it can be written as

f_{A}^{-1}(b)={\hat {x}}+\ker(f_{A})

where ${\hat {x}}$ is any element of $f_{A}^{-1}(b)$ . This is the content of the following theorem:

Theorem (Solution set of linear system and kernel)

Let $K$ be a field and let $m,n\in \mathbb {N}$ , $A\in K^{m\times n}$ and $b\in K^{m}$ . Further, let ${\hat {x}}\in K^{n}$ be a solution of the linear system of equations $Ax=b$ . Then

L(A,b)={\hat {x}}+\ker(f_{A})=\{{\hat {x}}+y\mid y\in \ker(f_{A})\}.

In particular, a solution ${\hat {x}}$ of the system of equations is unique if and only if the linear map $f_{A}$ induced by $A$ has a kernel that only consists of the zero vector.

Proof (Solution set of linear system and kernel)

We have to prove the equality $L(A,b)=\{{\hat {x}}+y\,|\,y\in \ker(f_{A})\}$ . For this we need to establish two subset relations.

Proof step: $L(A,b)\subseteq \{{\hat {x}}+y\,|\,y\in \ker(f_{A})\}$

Let $x'\in L(A,b)$ . Then $Ax'=b=A{\hat {x}}$ . The only possible candidate for $y$ to satisfy the equation $x'={\hat {x}}+y$ is $y=x'-{\hat {x}}$ . Since

Ay=A(x'-{\hat {x}})=Ax'-A{\hat {x}}=b-b=0

we have $y\in \ker(f_{A})$ .

Proof step: $L(A,b)\supseteq \{{\hat {x}}+y\,|\,y\in \ker(f_{A})\}$

We show that ${\hat {x}}+y\in L(A,b)$ holds for any $y\in \ker(f_{A})$ . Let $y\in \ker(f_{A})$ be arbitrary. Then $Ay=0$ holds. Since by assumption, ${\hat {x}}$ is a solution of $Ax=b$ , we have that

A({\hat {x}}+y)=A{\hat {x}}+Ay=b+0=b.

So ${\hat {x}}+y$ is also a solution of $Ax=b$ and thus lies in the set $L(A,b)$ .
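For the account example, the theorem says that every solution of $x+y=500$ has the form ${\hat {x}}+t\,(1,-1)^{T}$ . The sketch below (variable names are our own; the particular solution ${\hat {x}}=(500,0)^{T}$ is one convenient choice) checks both inclusions from the proof with exact fractions.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 1]]      # the total-balance map (x, y)^T -> x + y as a 1x2 matrix
b = [500]
x_hat = [500, 0]  # one particular solution of A x = b
y = [1, -1]       # spans ker(f_A), as computed for the sum map

# Every x_hat + t*y is a solution ...
for t in [Fraction(0), Fraction(300), Fraction(-7, 2)]:
    x = [xh + t * yi for xh, yi in zip(x_hat, y)]
    assert mat_vec(A, x) == b

# ... and for any solution x', the difference x' - x_hat lies in ker(f_A).
x_prime = [200, 300]
diff = [a - c for a, c in zip(x_prime, x_hat)]
print(mat_vec(A, x_prime), diff, mat_vec(A, diff))  # [500] [-300, 300] [0]
```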

We have thus refined the uniqueness theorem above. The larger the kernel of $f_{A}$ is, that is, the "less injective" the mapping $x\mapsto Ax$ is, the "less unique" the solutions of $Ax=b$ are, if any exist. The set of solutions of a linear system of equations $Ax=b$ is the kernel of the induced linear map $f_{A}$ shifted by a particular solution ${\hat {x}}$ . Furthermore,

\ker(f_{A})=\{x\in K^{n}\,\mid \,Ax=0\}=L(A,0).

The set of solutions of the homogeneous system of equations $Ax=0$ (that is, with right-hand side zero) is exactly the kernel of $f_{A}$ .

Hint

As with the previous theorem, no statement is made about whether solutions of $Ax=b$ exist at all for a given $b$ . The kernel only characterizes uniqueness.

Exercises

Exercise (Injectivity and dimension of $V$ and $W$ )

Let $V$ and $W$ be two finite-dimensional vector spaces. Show that there exists an injective linear map $f\colon V\to W$ if and only if $\dim(V)\leq \dim(W)$ .

How to get to the proof? (Injectivity and dimension of $V$ and $W$ )

To prove the equivalence, we need to show two implications. For the first one, we use that every monomorphism $f\colon V\to W$ preserves linear independence: If $\{b_{1},\ldots ,b_{n}\}\subseteq V$ is a basis of $V$ , then the $n$ vectors $f(b_{1}),\ldots ,f(b_{n})\in W$ are linearly independent. For the converse direction, we need to construct a monomorphism from $V$ to $W$ using the assumption $\dim V\leq \dim W$ . To do this, we choose bases in $V$ and $W$ and then use the principle of linear continuation to define a monomorphism by the images of the basis vectors.

Solution (Injectivity and dimension of $V$ and $W$ )

Proof step: There is a monomorphism $\implies \dim(V)\leq \dim(W)$

Let $f\colon V\to W$ be a monomorphism and $\{v_{1},\ldots ,v_{n}\}$ a basis of $V$ . Then $\{v_{1},\ldots ,v_{n}\}$ is in particular linearly independent, and therefore $\{f(v_{1}),\ldots ,f(v_{n})\}$ is linearly independent; since $f$ is injective, these $n$ vectors are pairwise distinct. A linearly independent subset of $W$ has at most $\dim(W)$ elements, so $\dim(W)\geq n=\dim(V)$ . Thus $\dim(W)\geq \dim(V)$ is a necessary criterion for the existence of a monomorphism from $V$ to $W$ .

Proof step: $\dim(V)\leq \dim(W)\implies$ there is a monomorphism

Conversely, in the case $\dim(V)\leq \dim(W)$ we can construct a monomorphism: Let $\{v_{1},\dots ,v_{n}\}$ be a basis of $V$ and $\{w_{1},\dots ,w_{m}\}$ be a basis of $W$ . Then $n=\dim(V)\leq \dim(W)=m$ . We define a linear map $f\colon V\to W$ by setting

f(v_{i})=w_{i}

for all $i=1,\ldots ,n$ . According to the principle of linear continuation, such a linear map exists and is uniquely determined. We now show that $f$ is injective by proving that $\ker(f)=\{0_{V}\}$ holds. Let $x\in \ker(f)$ . Because $\{v_{1},\dots ,v_{n}\}$ is a basis of $V$ , there exist some $\lambda _{1},\ldots ,\lambda _{n}\in K$ with

x=\sum _{i=1}^{n}\lambda _{i}v_{i}.

Thus, we get

{\begin{aligned}0_{W}=f(x)&=f\left(\sum _{i=1}^{n}\lambda _{i}v_{i}\right)\\[0.3em]&\ {\color {OliveGreen}\left\downarrow \ f{\text{ is linear}}\right.}\\[0.3em]&=\sum _{i=1}^{n}\lambda _{i}f(v_{i})\\[0.3em]&\ {\color {OliveGreen}\left\downarrow \ f(v_{i})=w_{i}\right.}\\[0.3em]&=\sum _{i=1}^{n}\lambda _{i}w_{i}\\[0.3em]&\ {\color {OliveGreen}\left\downarrow \ {\text{set }}\lambda _{i}:=0{\text{ for }}i>n\right.}\\[0.3em]&=\sum _{i=1}^{m}\lambda _{i}w_{i}\end{aligned}}

Since $w_{1},\dots ,w_{m}$ are linearly independent, $\lambda _{i}=0_{K}$ must hold for all $i=1,\ldots ,n$ . So it follows for $x$ that

x=\sum _{i=1}^{n}\lambda _{i}v_{i}=\sum _{i=1}^{n}0_{K}\cdot v_{i}=0_{V}.

We have shown that $\ker(f)=\{0_{V}\}$ holds and thus $f$ is a monomorphism.
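In coordinates with respect to the chosen bases, the monomorphism constructed above has the $m\times n$ matrix with the identity $I_{n}$ on top and zeros below. The following Python sketch (function names are hypothetical) builds that matrix; since the first $n$ entries of $Mx$ are the coordinates $\lambda _{1},\ldots ,\lambda _{n}$ themselves, $Mx=0$ forces $x=0$ , just as in the proof.

```python
def embedding_matrix(n, m):
    """Coordinate matrix (m x n) of the linear map with f(v_i) = w_i for
    i = 1, ..., n, with respect to the bases v_1..v_n and w_1..w_m."""
    if n > m:
        raise ValueError("this construction needs dim V <= dim W")
    return [[1 if i == j else 0 for j in range(n)] for i in range(m)]

def mat_vec(M, x):
    """Matrix-vector product M x, with M given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

M = embedding_matrix(2, 3)
print(M)                    # [[1, 0], [0, 1], [0, 0]]
print(mat_vec(M, [4, -2]))  # [4, -2, 0]: the coordinates reappear unchanged
```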

Exercise

We consider the linear map $f\colon \mathbb {R} ^{2}\to \mathbb {R} ^{2},\ (x,y)^{T}\mapsto (-3(x-y),x-y)^{T}$ . Determine the kernel of $f$ .

Solution

We are looking for vectors $(x,y)^{T}\in \mathbb {R} ^{2}$ such that $f\left({\begin{pmatrix}x\\y\end{pmatrix}}\right)={\begin{pmatrix}0\\0\end{pmatrix}}$ . Let $(x,y)^{T}$ be any vector in $\mathbb {R} ^{2}$ for which $f\left({\begin{pmatrix}x\\y\end{pmatrix}}\right)={\begin{pmatrix}0\\0\end{pmatrix}}$ is true. We now examine what properties this vector must have. It holds that

{\begin{pmatrix}0\\0\end{pmatrix}}=f{\begin{pmatrix}x\\y\end{pmatrix}}={\begin{pmatrix}-3(x-y)\\x-y\end{pmatrix}}

So $-3(x-y)=0$ and $x-y=0$ . From this we conclude $x=y$ . So any vector $(x,y)^{T}$ in the kernel of $f$ satisfies the condition $x=y$ . Now take a vector $(x,x)^{T}$ with $x\in \mathbb {R}$ . Then

f{\begin{pmatrix}x\\x\end{pmatrix}}={\begin{pmatrix}-3(x-x)\\x-x\end{pmatrix}}={\begin{pmatrix}0\\0\end{pmatrix}}

We see that $(x,x)^{T}\in \ker(f)$ . Altogether,

\ker(f)=\left\{{\begin{pmatrix}x\\x\end{pmatrix}}\mid x\in \mathbb {R} \right\}
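The same kernel can be found mechanically. The sketch below repeats a small Gauss-Jordan helper `nullspace` (our own, using exact fractions, not a library function) and applies it to the matrix of $f$ , whose columns are $f((1,0)^{T})=(-3,1)^{T}$ and $f((0,1)^{T})=(3,-1)^{T}$ .

```python
from fractions import Fraction

def nullspace(A):
    """Return a basis of {x : A x = 0} via Gauss-Jordan elimination with
    exact fractions. A is a non-empty list of rows of equal length."""
    rows = [[Fraction(a) for a in row] for row in A]
    m, n = len(rows), len(rows[0])
    pivots = []  # pivot column of each pivot row, in order
    r = 0        # index of the next pivot row
    for c in range(n):
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c: x_c is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [a / pivot for a in rows[r]]
        for i in range(m):  # clear column c in every other row
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)  # one free variable set to 1, the others to 0
        for row, pc in zip(rows, pivots):
            x[pc] = -row[free]  # solve each pivot row for its pivot variable
        basis.append(x)
    return basis

F = [[-3, 3], [1, -1]]  # matrix of f(x, y) = (-3(x - y), x - y)
print([[int(a) for a in v] for v in nullspace(F)])  # [[1, 1]]
```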

Check your understanding: Can you visualize $\ker(f)$ in the plane? What does the image of $f$ look like? How do the kernel and the image relate to each other?

Figure: the kernel of $f$ .

We have already seen that

\ker(f)=\left\{{\begin{pmatrix}x\\x\end{pmatrix}}\mid x\in \mathbb {R} \right\}=\operatorname {span} \left({\begin{pmatrix}1\\1\end{pmatrix}}\right)

Now we determine the image of $f$ by applying $f$ to the canonical basis.

{\begin{aligned}f{\begin{pmatrix}1\\0\end{pmatrix}}={\begin{pmatrix}-3\\1\end{pmatrix}}\\f{\begin{pmatrix}0\\1\end{pmatrix}}={\begin{pmatrix}3\\-1\end{pmatrix}}\end{aligned}}

So $\operatorname {im} (f)=\operatorname {span} (f((1,0)^{T}),f((0,1)^{T}))$ holds. The two vectors are linearly dependent, since $(3,-1)^{T}=-(-3,1)^{T}$ . Hence the image is generated by a single vector: $\operatorname {im} (f)=\operatorname {span} ((-3,1)^{T})$ .

Figures: the image of $f$ ; image and kernel of $f$ together.

In our example, the image and the kernel of the linear map $f$ are straight lines through the origin. The two lines intersect only at the origin and together span all of $\mathbb {R} ^{2}$ .
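The relationship between kernel and image can be confirmed with a determinant. In this Python sketch (names are ours), $k=(1,1)^{T}$ spans the kernel and $(-3,1)^{T}$ spans the image; since $\det {\begin{pmatrix}1&-3\\1&1\end{pmatrix}}=4\neq 0$ , the two vectors form a basis of $\mathbb {R} ^{2}$ , and Cramer's rule writes any vector as the sum of a kernel part and an image part.

```python
from fractions import Fraction

def f(v):
    """The exercise map (x, y)^T -> (-3(x - y), x - y)^T."""
    x, y = v
    return (-3 * (x - y), x - y)

k = (1, 1)                     # spans ker(f)
i1, i2 = f((1, 0)), f((0, 1))  # images of the canonical basis
print(i1, i2)                  # (-3, 1) (3, -1)
print(i2 == tuple(-c for c in i1))  # True: the images are linearly dependent

# det[k | i1] != 0, so k and i1 form a basis of R^2: the two lines meet only
# in the origin and together span R^2.
det = k[0] * i1[1] - k[1] * i1[0]
print(det)  # 4

def decompose(w):
    """Cramer's rule: return (s, t) with w = s*k + t*i1."""
    s = Fraction(w[0] * i1[1] - w[1] * i1[0], det)
    t = Fraction(k[0] * w[1] - k[1] * w[0], det)
    return s, t

s, t = decompose((5, 1))
print(s, t)  # 2 -1
```

So $(5,1)^{T}=2\,(1,1)^{T}+(-1)\,(-3,1)^{T}$ , a kernel vector plus an image vector.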

Exercise

Let $V$ be a vector space, $V\neq \{0\}$ , and $f\colon V\to V$ be a nilpotent linear map, i.e., there is some $n\in \mathbb {N}$ such that

f^{n}=\underbrace {f\circ \cdots \circ f} _{n{\text{ times}}}=0

is the zero mapping. Show that $\ker(f)\neq \{0\}$ holds.

Does the converse also hold, that is, is any linear map $f\colon V\to V$ with $\ker(f)\neq \{0\}$ nilpotent?

Solution

Proof step: $f$ nilpotent $\implies \ker(f)\neq \{0\}$

We prove the statement by contraposition. That is, we show: if $\ker(f)=\{0\}$ , then $f$ is not nilpotent.

Let $\ker(f)=\{0\}$ . Then $f$ is injective, and as a composition of injective functions, $f\circ f$ is also injective. By induction it follows that for all $n\in \mathbb {N}$ the function $f^{n}=\underbrace {f\circ \cdots \circ f} _{n{\text{ times}}}$ is injective. But then also $\ker(f^{n})=\{0\}$ for all $n\in \mathbb {N}$ . Since the kernel of the zero mapping is all of $V\neq \{0\}$ , the map $f^{n}$ cannot be the zero mapping for any $n\in \mathbb {N}$ . Consequently, $f$ is not nilpotent.

Proof step: The converse implication

The converse implication does not hold: there are mappings that are neither injective nor nilpotent. For example, we can define

f:\mathbb {R} ^{2}\to \mathbb {R} ^{2},\quad {\begin{pmatrix}x\\y\end{pmatrix}}\mapsto {\begin{pmatrix}x\\0\end{pmatrix}}

This mapping is not injective, because the nonzero vector $(0,1)^{T}$ lies in $\ker(f)$ . But it is also not nilpotent, because $f^{n}((1,0)^{T})=(1,0)^{T}\neq 0$ for all $n\in \mathbb {N}$ .
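Powers of the matrices make the counterexample tangible. The sketch below (helper names `mat_mul` and `power` are our own) shows that the projection $(x,y)^{T}\mapsto (x,0)^{T}$ satisfies $f^{n}=f\neq 0$ for every $n$ , while the matrix ${\begin{pmatrix}0&1\\0&0\end{pmatrix}}$ , a nilpotent example chosen here for contrast, squares to zero and, in line with the first part, has $(1,0)^{T}$ in its kernel.

```python
def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def power(A, n):
    """A^n for n >= 1, the matrix of the n-fold composition f o ... o f."""
    result = A
    for _ in range(n - 1):
        result = mat_mul(result, A)
    return result

proj = [[1, 0], [0, 0]]  # (x, y)^T -> (x, 0)^T; (0, 1)^T lies in its kernel
print(all(power(proj, n) == proj for n in range(1, 10)))  # True: f^n = f, never 0

nil = [[0, 1], [0, 0]]   # (x, y)^T -> (y, 0)^T; (1, 0)^T lies in its kernel
print(power(nil, 2))     # [[0, 0], [0, 0]]: nilpotent
```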


Feedback? Do you want to join?

If you have questions concerning the content, or didn't understand something, then feel free to contact us! We would love to answer your questions! We are also thankful for criticism and comments! If you share our vision of explaining university math in a comprehensible way, then contact us at:

E-Mail: en@serlo.org

This article is licensed under the free license CC-BY-SA 3.0. This means you can use, modify, and share it freely, as long as you name „Serlo“ as the source and put your changes under the same CC-BY-SA 3.0 or a compatible license. On the page „Kopier uns!“ we explain what you have to pay attention to when using our texts, pictures, or videos.