In this article, you learn how to multiply matrices. We will see that matrix multiplication is equivalent to the composition of linear maps. We will also prove some properties of matrix multiplication.
In the article on matrices of linear maps, we learned how we can use matrices to describe linear maps $f: V \to W$ between finite-dimensional vector spaces $V$ and $W$. This requires fixing a basis $B$ of $V$ and a basis $C$ of $W$, with respect to which we can define the mapping matrix $M_C^B(f)$. In coordinates, this matrix describes what the linear map $f$ does with a vector $v \in V$:

$$M_C^B(f) \cdot \kappa_B(v) = \kappa_C(f(v)),$$

where $\kappa_B$ is the coordinate mapping with respect to $B$, which maps a vector $v \in V$ to its coordinate vector $\kappa_B(v)$ with respect to $B$. Similarly, $\kappa_C$ is the coordinate mapping with respect to $C$.
We can concatenate linear maps $f: V \to W$ and $g: W \to U$ by executing them one after the other, which results in a linear map $g \circ f: V \to U$. Can we define a suitable "concatenation" of matrices? By suitable, we mean that the "concatenation" of the matrices corresponding to $f$ and $g$ should be the matrix of the map $g \circ f$. We will also call this "concatenation" of matrices the matrix product, since it will turn out to behave almost like a product of numbers.
For example, let's consider two matrices $A \in K^{m \times n}$ and $B \in K^{p \times q}$ with the corresponding linear maps

$$f_A: K^n \to K^m, \quad v \mapsto A \cdot v$$

and

$$f_B: K^q \to K^p, \quad v \mapsto B \cdot v,$$

given by matrix-vector multiplication. Then $A$ is the matrix of $f_A$ (with respect to the standard bases in $K^n$ and $K^m$), and $B$ is the matrix of $f_B$ (with respect to the standard bases in $K^q$ and $K^p$). The product of $A$ and $B$ should then be the matrix of $f_A \circ f_B$.

However, in order to be able to execute the maps $f_B$ and $f_A$ one after the other, the target space of $f_B$ must be equal to the space on which $f_A$ is defined. This means that $K^p = K^n$, i.e. $p = n$. Therefore, the number of columns of $A$ must be equal to the number of rows of $B$; otherwise we cannot define the product matrix $A \cdot B$.
What is the product of $A$ and $B$ corresponding to the map $f_A \circ f_B: K^q \to K^m$? To compute it, we need to calculate the images of the standard basis vectors $e_1, \dots, e_q$ under the map $f_A \circ f_B$. They will form the columns of the matrix of $f_A \circ f_B$, that is, the matrix $A \cdot B$.

We denote the entries of $A$ by $a_{ij}$ and those of $B$ by $b_{jk}$, i.e. $A = (a_{ij})$ and $B = (b_{jk})$. We also denote the desired matrix of $f_A \circ f_B$ by $C = (c_{ik}) \in K^{m \times q}$.

For $1 \le i \le m$ and $1 \le k \le q$, the entry $c_{ik}$ is, by the definition of the matrix representing $f_A \circ f_B$, given by the $i$-th entry of the vector $(f_A \circ f_B)(e_k)$. We can easily calculate it using the definitions of $f_A$ and $f_B$ and the definition of matrix-vector multiplication:

$$(f_A \circ f_B)(e_k) = A \cdot (B \cdot e_k) = A \cdot \begin{pmatrix} b_{1k} \\ \vdots \\ b_{nk} \end{pmatrix}, \quad \text{so} \quad c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}.$$

This defines all entries of the matrix $C$, and we conclude

$$C = \left( \sum_{j=1}^{n} a_{ij} b_{jk} \right)_{\substack{1 \le i \le m \\ 1 \le k \le q}}.$$

This is exactly the product of the two matrices $A$ and $B$.
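To see this derivation in action, here is a small numpy sketch (the matrices $A$ and $B$ below are hypothetical sample values, not taken from the text) checking that the $k$-th column of $A \cdot B$ is the image $A \cdot (B \cdot e_k)$ of the $k$-th standard basis vector:

```python
import numpy as np

# Hypothetical sample matrices with matching inner dimension:
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])      # (3 x 2): f_A maps K^2 -> K^3
B = np.array([[7, 8, 9],
              [0, 1, 2]])   # (2 x 3): f_B maps K^3 -> K^2

C = A @ B                   # the product, a (3 x 3) matrix

# Column k of A*B equals the image of the k-th standard basis vector
# under f_A composed with f_B, i.e. A @ (B @ e_k):
for k in range(3):
    e_k = np.zeros(3)
    e_k[k] = 1
    assert np.allclose(C[:, k], A @ (B @ e_k))
```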
Mathematically, we can also understand matrix multiplication as a binary operation (just like the multiplication of real numbers).
Definition (Matrix multiplication)
Matrix multiplication is an operation

$$\cdot \, : K^{m \times n} \times K^{n \times q} \to K^{m \times q}.$$

It sends two matrices $A = (a_{ij}) \in K^{m \times n}$ and $B = (b_{jk}) \in K^{n \times q}$ to the matrix $A \cdot B = (c_{ik}) \in K^{m \times q}$, given by

$$c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}$$

for $1 \le i \le m$ and $1 \le k \le q$.
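As a sketch of how this definition translates into code, here is a minimal Python implementation of the formula (the function name `matmul` and the nested-list representation are our own choices):

```python
def matmul(A, B):
    """Product of an (m x n) matrix A and an (n x q) matrix B, both given
    as nested lists, following c_ik = sum over j of a_ij * b_jk."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    if n != p:
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(q)]
            for i in range(m)]
```

For example, `matmul([[1, 2], [3, 4]], [[5], [6]])` returns `[[17], [39]]`.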
However, there is an important difference from the multiplication of real numbers: with matrices, we have to make sure that the dimensions of the matrices we want to multiply match.
Hint
The two matrices $A$ and $B$ do not have to be of the same size, but the number of columns of the matrix $A$ must be equal to the number of rows of the matrix $B$. The result $A \cdot B$ then has the number of rows of the left-hand matrix $A$ and the number of columns of the right-hand matrix $B$. This means that two matrices $A \in K^{m \times n}$ and $B \in K^{p \times q}$ can only be multiplied if $n = p$.
Warning
Two matrices $A \in K^{m \times n}$ and $B \in K^{p \times q}$ with $n \neq p$ can never be multiplied.
Rule of thumb: row times column
According to the definition, each entry $c_{ik}$ in the product $A \cdot B$ is the sum of the component-wise products of the elements of the $i$-th row of $A$ with those of the $k$-th column of $B$. This procedure can be remembered as "row times column".
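As a sketch, the "row times column" rule for a single entry can be written in Python like this (same nested-list convention as in `matmul` above; indices here are 0-based, unlike the 1-based mathematical notation):

```python
def entry(A, B, i, k):
    """Entry c_ik of A*B: the i-th row of A "times" the k-th column of B."""
    row_i = A[i]                               # i-th row of A
    col_k = [B[j][k] for j in range(len(B))]   # k-th column of B
    return sum(a * b for a, b in zip(row_i, col_k))
```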
We are looking for the matrix product $C = A \cdot B$. This matrix has the form $C = (c_{ik})$.

We have to calculate the individual entries $c_{ik}$. We will do this here in detail for the entry $c_{23}$. The calculation of the other entries works analogously.

According to the formula,

$$c_{23} = \sum_{j=1}^{n} a_{2j} b_{j3}.$$

This calculation can also be seen as the "multiplication" of the 2nd row of $A$ with the 3rd column of $B$. To illustrate this, we mark the entries from the sum in the matrices. We have the sum

$$c_{23} = a_{21} b_{13} + a_{22} b_{23} + \dots + a_{2n} b_{n3}.$$

These are exactly the entries $a_{21}, \dots, a_{2n}$ of the 2nd row of $A$ and the entries $b_{13}, \dots, b_{n3}$ of the 3rd column of $B$.

In this way, we can also determine the other entries of $C = A \cdot B$.
In this case, we can calculate both $A \cdot B$ and $B \cdot A$: here $A \in K^{1 \times n}$ is a single row and $B \in K^{n \times 1}$ is a single column. Let $C = A \cdot B$. Then $C$ is a $(1 \times 1)$-matrix. We calculate its only entry:

$$c_{11} = \sum_{j=1}^{n} a_{1j} b_{j1} = a_{11} b_{11} + \dots + a_{1n} b_{n1}.$$

Thus, $A \cdot B$ is the $(1 \times 1)$-matrix with this single entry.

Let $D = B \cdot A$. Then $D$ is an $(n \times n)$-matrix. We can calculate the entries of $D$ by the scheme "row times column". For example, the first entry of $D$ is the first row of $B$ times the first column of $A$, i.e. $d_{11} = b_{11} a_{11}$. If we do this with each entry, we get $d_{jk} = b_{j1} a_{1k}$ for all $1 \le j, k \le n$.
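The following numpy sketch (with hypothetical sample entries) shows this size effect: a row times a column gives a $(1 \times 1)$-matrix, while the reversed product gives an $(n \times n)$-matrix:

```python
import numpy as np

A = np.array([[1, 2, 3]])   # a (1 x 3) row
B = np.array([[4],
              [5],
              [6]])         # a (3 x 1) column

print((A @ B).shape)        # (1, 1): the single entry 1*4 + 2*5 + 3*6 = 32
print((B @ A).shape)        # (3, 3): entry (j, k) is b_j1 * a_1k
```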
In this example, we want to illustrate that matrix multiplication really corresponds to "concatenating two matrices". That means, if we have two matrices $A$ and $B$ that we apply to a vector $v$, then we always have $(A \cdot B) \cdot v = A \cdot (B \cdot v)$: applying the product matrix $A \cdot B$ to $v$ gives the same result as first applying $B$ to $v$ and then applying $A$ to the resulting vector. To check this for concrete matrices $A$ and $B$ with entries in $\mathbb{R}$ and a concrete vector $v$, we first calculate the matrix product $A \cdot B$, apply it to $v$, and compare the result with $A \cdot (B \cdot v)$.
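Since the concrete numbers do not matter for the idea, the same check can be run in Python with any sample matrices (the values below are hypothetical):

```python
import numpy as np

A = np.array([[2, 0],
              [1, 3]])
B = np.array([[1, 4],
              [5, 2]])
v = np.array([1, 2])

# Applying the product matrix to v gives the same result as
# applying B first and then A:
assert np.array_equal((A @ B) @ v, A @ (B @ v))
```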
The following theorem shows that matrix multiplication actually reflects the composition of linear mappings.
Theorem (Shortening rule for matrices representing linear maps)
Let $f: V \to W$ and $g: W \to U$ be linear maps between finite-dimensional vector spaces. Furthermore, let $B$ be a basis of $V$, let $C$ be a basis of $W$ and $D$ a basis of $U$. Then we can "shorten the $C$":

$$M_D^B(g \circ f) = M_D^C(g) \cdot M_C^B(f).$$
Proof (Shortening rule for matrices representing linear maps)
We set $B = (v_1, \dots, v_n)$, $C = (w_1, \dots, w_m)$ and $D = (u_1, \dots, u_p)$. Further, the matrices of $f$ and $g$ are given by $M_C^B(f) = (a_{ij})$ and $M_D^C(g) = (b_{ki})$.

By definition of the matrix of a linear map, we know that the $a_{ij}$ are the unique scalars with

$$f(v_j) = \sum_{i=1}^{m} a_{ij} w_i$$

for all $1 \le j \le n$, and similarly the $b_{ki}$ are the unique scalars with $g(w_i) = \sum_{k=1}^{p} b_{ki} u_k$ for all $1 \le i \le m$. In order to prove $M_D^B(g \circ f) = M_D^C(g) \cdot M_C^B(f)$, we need to verify that

$$(g \circ f)(v_j) = \sum_{k=1}^{p} \left( \sum_{i=1}^{m} b_{ki} a_{ij} \right) u_k.$$

And indeed,

$$(g \circ f)(v_j) = g(f(v_j)) = g\left( \sum_{i=1}^{m} a_{ij} w_i \right) = \sum_{i=1}^{m} a_{ij} \, g(w_i) = \sum_{i=1}^{m} a_{ij} \sum_{k=1}^{p} b_{ki} u_k = \sum_{k=1}^{p} \left( \sum_{i=1}^{m} b_{ki} a_{ij} \right) u_k.$$

By uniqueness of the coordinates in the linear combination of $(g \circ f)(v_j)$ with respect to $D$, we conclude $M_D^B(g \circ f) = M_D^C(g) \cdot M_C^B(f)$.
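For maps between coordinate spaces with their standard bases, the shortening rule specializes to $M(g \circ f) = M(g) \cdot M(f)$. As a sketch, this can be sanity-checked numerically by recovering the matrix of $g \circ f$ column by column from basis-vector images (random sample matrices below):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.integers(-3, 4, size=(4, 2))   # matrix of f: K^2 -> K^4
G = rng.integers(-3, 4, size=(3, 4))   # matrix of g: K^4 -> K^3

# Matrix of g composed with f, built column by column from the images
# of the standard basis vectors e_1, e_2:
M = np.column_stack([G @ (F @ e) for e in np.eye(2)])
assert np.array_equal(M, G @ F)
```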
Warning
For the shortening rule, it is important that the same ordered basis $C$ of $W$ is chosen in both cases for the matrices representing $f$ and $g$. If $M_D^{C'}(g)$ is taken with respect to a different basis $C'$ of $W$, then the shortening rule is no longer true: the following is in general a false statement:

$$M_D^B(g \circ f) = M_D^{C'}(g) \cdot M_C^B(f) \quad \text{for } C' \neq C.$$

As matrices representing linear maps depend on the order of the basis vectors, the shortening rule also becomes false if $C'$ is a rearrangement of $C$.
First, we check that the sizes of the matrices that we want to multiply in the associative law $(A \cdot B) \cdot C = A \cdot (B \cdot C)$ are compatible, where $A \in K^{m \times n}$, $B \in K^{n \times p}$ and $C \in K^{p \times q}$. This is directly visible for the products $A \cdot B$ and $B \cdot C$. Now $A \cdot B \in K^{m \times p}$ and $B \cdot C \in K^{n \times q}$, so the products on both sides of the equation are well-defined: they are both in $K^{m \times q}$.

Now we look at the individual components of the matrices to verify the equality. Let $1 \le i \le m$ and $1 \le l \le q$. Then

$$((A \cdot B) \cdot C)_{il} = \sum_{k=1}^{p} \left( \sum_{j=1}^{n} a_{ij} b_{jk} \right) c_{kl} = \sum_{j=1}^{n} a_{ij} \left( \sum_{k=1}^{p} b_{jk} c_{kl} \right) = (A \cdot (B \cdot C))_{il}.$$
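As a sketch, the associative law can also be sanity-checked numerically (random integer matrices, so the comparison is exact):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-5, 6, size=(2, 3))
B = rng.integers(-5, 6, size=(3, 4))
C = rng.integers(-5, 6, size=(4, 5))

# Integer matrices, so both sides agree exactly:
assert np.array_equal((A @ B) @ C, A @ (B @ C))
```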
For matrices, we can see that commutativity fails with the following example: On the one hand,

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$$

and on the other hand,

$$\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
So the order of the matrix multiplication matters!
Warning
In general, $A \cdot B \neq B \cdot A$, so the matrix product is not commutative.
The commutative law only applies in a few special cases (e.g. products of diagonal matrices).
As the number of rows and columns of the matrices must match, it is even possible that one of the two products is not defined at all! For example, for $A \in K^{m \times n}$ and $B \in K^{n \times q}$ with $q \neq m$, the product $A \cdot B$ is defined, but the product $B \cdot A$ is not defined.
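In numpy, this asymmetry shows up as a shape error (a sketch; the exact error message depends on the numpy version):

```python
import numpy as np

A = np.ones((2, 3))
B = np.ones((3, 3))

print((A @ B).shape)   # (2, 3): A*B is defined
try:
    B @ A              # B has 3 columns, but A has only 2 rows
except ValueError as err:
    print("B @ A is not defined:", err)
```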
If we multiply two $(n \times n)$-matrices, the result is again an $(n \times n)$-matrix. We now know two inner operations on the set $K^{n \times n}$: the addition of matrices

$$+ : K^{n \times n} \times K^{n \times n} \to K^{n \times n}, \quad (A, B) \mapsto A + B,$$

and the matrix multiplication

$$\cdot : K^{n \times n} \times K^{n \times n} \to K^{n \times n}, \quad (A, B) \mapsto A \cdot B.$$

From the article on the vector space structure on matrices, we already know that $(K^{n \times n}, +)$ is an Abelian group. It follows from the properties of matrix multiplication that $(K^{n \times n}, +, \cdot)$ is even a unital ring (i.e., a ring that has a unit element): the multiplication is associative, there is a neutral element (the identity matrix $E_n$) and the distributive laws apply.
However, the ring of matrices is generally not commutative, as we have seen above. Also note that we only have such a ring structure for square matrices, as otherwise the multiplication of two elements is not defined.
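As a sketch, the ring axioms for $K^{n \times n}$ can be illustrated numerically, with `np.eye(n)` playing the role of the unit element $E_n$ (random sample matrices below):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.integers(-5, 6, size=(n, n))
B = rng.integers(-5, 6, size=(n, n))
C = rng.integers(-5, 6, size=(n, n))
E = np.eye(n, dtype=int)   # the unit element E_n of the ring

assert np.array_equal(A @ E, A) and np.array_equal(E @ A, A)
assert np.array_equal(A @ (B + C), A @ B + A @ C)   # left distributivity
assert np.array_equal((A + B) @ C, A @ C + B @ C)   # right distributivity
```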