
The logic behind matrix diagonalization with eigenvectors

Diagonalization has a really close relationship with eigenvectors - the standard procedure for diagonalizing a matrix is to find its eigenvalues together with an eigenbasis. This seemed like magic to me before I understood why it works. In this post, I’ll try to explain the rationale behind it using the concept of linear transformations, with two approaches: a similarity diagram and plain matrix algebra. The former is a derivation oriented around the whole basis-and-linear-transformation landscape, while the latter is merely a verification of correctness based on pure matrix multiplication mechanics.

Definitions

First off, we’ll make some definitions to make sure we are on the same page:

Diagonal matrix

A diagonal matrix is a square matrix in which all entries are zero, except possibly the entries that lie on the main diagonal ($a_{ij}$ where $i=j$)

\[\text{Diagonal matrix } D = \begin{pmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn} \\ \end{pmatrix}\]

Diagonalization is the process of expressing a square matrix $A$ as a product of matrices, such that for some diagonal matrix $D$ and non-singular matrix $P$,

\[A = PDP^{-1}.\]

Basis

A basis is a set of linearly independent vectors that spans a vector space. We now define the $\text{Rep}$ notation. For a basis $B$ that consists of three vectors $\lang \vec{\beta_1}, \vec{\beta_2}, \vec{\beta_3} \rang$, the notation

\[\text{Rep}_B({\vec{v}})= \begin{pmatrix}c_1\\c_2\\c_3\end{pmatrix}\]

means that

\[\vec{v}=c_1\vec{\beta}_1+c_2\vec{\beta}_2+c_3\vec{\beta}_3.\]

That is, we are expressing $\vec{v}$, originally written in the standard basis $\varepsilon_3$, in terms of basis $B$. If that’s too abstract, you can check out 3blue1brown’s explanation with visualizations here.
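As a quick concrete example (my own, not from the book), take \(B=\lang \begin{pmatrix}1\\1\end{pmatrix}, \begin{pmatrix}1\\-1\end{pmatrix} \rang\) in $\R^2$ and \(\vec{v}=\begin{pmatrix}3\\1\end{pmatrix}\). Then

\[\text{Rep}_B(\vec{v})=\begin{pmatrix}2\\1\end{pmatrix} \quad\text{since}\quad 2\begin{pmatrix}1\\1\end{pmatrix}+1\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}3\\1\end{pmatrix}.\]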

Standard basis

The standard basis $\varepsilon_n$ is the basis \(\lang \begin{pmatrix}1 \\0\\ \vdots \\ 0\end{pmatrix}, \begin{pmatrix}0 \\1\\ \vdots \\ 0\end{pmatrix},~ \dots ~,\begin{pmatrix}0 \\ 0 \\ \vdots \\ 1\end{pmatrix}\rang\). For example, $\varepsilon_3$ is the familiar $xyz$ coordinate system.

Change of basis matrix

A change of basis matrix that converts a vector from basis $B$ to basis $D$ is denoted by $\text{Rep}_{B,D}(id)$. Note that a linear transformation represented by a matrix is applied by multiplying on the left. Consider the following operation:

\[\text{Rep}_{B,D}(id)\times\text{Rep}_B(\vec{v})=\text{Rep}_D(\vec{v})\]

Originally, we have $\vec{v}$ expressed in basis $B$. By multiplying the change of basis matrix on the left, it is now expressed in basis $D$.
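In particular, when the target basis is the standard basis, the change of basis matrix is just the matrix whose columns are the vectors of $B$ written in standard coordinates. Reusing the small example above,

\[\text{Rep}_{B,\varepsilon_2}(id)=\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix},\qquad \begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}\begin{pmatrix}2\\1\end{pmatrix}=\begin{pmatrix}3\\1\end{pmatrix}=\vec{v}.\]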

Eigenvectors & eigenvalues

An eigenvector is a non-zero vector that a linear transformation maps to a scalar multiple of itself. Formally, for a linear transformation $t$ and an eigenvector $\vec{v}\neq\vec{0}$,

\[t(\vec{v})=\lambda \vec{v}\]

for some constant $\lambda$. If we represent the linear transformation $t$ with matrix $A$, then

\[A\vec{v}=\lambda \vec{v}.\]

The corresponding $\lambda$ is termed the eigenvalue for vector $\vec{v}$.
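For a quick illustration, the scaling matrix below stretches the $x$-axis by $2$ and the $y$-axis by $3$, so the standard basis vectors are eigenvectors with eigenvalues $2$ and $3$:

\[\begin{pmatrix}2&0\\0&3\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}=2\begin{pmatrix}1\\0\end{pmatrix},\qquad \begin{pmatrix}2&0\\0&3\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix}=3\begin{pmatrix}0\\1\end{pmatrix}.\]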

Diagonalizing matrix with eigenvectors

The standard procedure to diagonalize an $n\times n$ matrix $A$ is to find $n$ eigenvalues $\lambda_i$ (not required to be distinct) and form the diagonal matrix $D$ with them; $P$ is formed from the corresponding eigenvectors. That is,

\[P= \begin{pmatrix} \vec{v_1} & \vec{v_2} & \dots & \vec{v_n} \end{pmatrix},\quad D= \begin{pmatrix} \lambda_{1} & & & \\ & \lambda_{2} & & \\ & & \ddots & \\ & & & \lambda_{n} \\ \end{pmatrix} \tag{1}\]

where

\[A\vec{v_i}=\lambda_i\vec{v}_i \quad\forall~1\le i \le n\tag{2}\]

Note that the vectors $\vec{v}_i$ must be linearly independent. As $P$ is constructed from the $\vec{v}_i$, if they were linearly dependent, $P$ would be singular and $P^{-1}$ would not exist.
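If you just want to see the procedure run, here is a minimal NumPy sketch (the helper name `diagonalize` is my own; `np.linalg.eig` returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors):

```python
import numpy as np

def diagonalize(A):
    """Return (P, D) with A = P @ D @ inv(P), assuming A is diagonalizable."""
    eigenvalues, P = np.linalg.eig(A)   # columns of P are eigenvectors
    D = np.diag(eigenvalues)            # diagonal matrix of eigenvalues, as in (1)
    return P, D

A = np.array([[5.0, 3.0],
              [1.0, 7.0]])              # the matrix used in the example below
P, D = diagonalize(A)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```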

An example

We’ll look at an example of diagonalizing a $2\times 2$ matrix. Consider the matrix

\[A = \begin{pmatrix} 5 & 3 \\ 1 & 7 \end{pmatrix}\]

We find the eigenvalues from the characteristic equation

\[\begin{gathered} \begin{vmatrix} A-\lambda I \end{vmatrix}=0 \\ (5-\lambda)(7-\lambda)-3=0 \\ \lambda^2-12\lambda+32=0 \\ \lambda = 4 \text{ or } \lambda = 8 \end{gathered}\]

For $\lambda = 4$, the system \((A-\lambda I ~\vert~ 0)\) becomes

\[\begin{pmatrix} \begin{array}{c c|c} 1 & 3 & 0 \\ 1 & 3 & 0 \\ \end{array} \end{pmatrix}\]

Thus, \(\vec{v_1}=\begin{pmatrix}-3 \\ 1\end{pmatrix}\) is an eigenvector with eigenvalue $4$. One can easily verify that \(A\begin{pmatrix}-3 \\ 1\end{pmatrix} = 4\begin{pmatrix}-3 \\ 1\end{pmatrix}\).

For $\lambda = 8$, the system is

\[\begin{pmatrix} \begin{array}{c c|c} -3 & 3 & 0 \\ 1 & -1 & 0 \\ \end{array} \end{pmatrix}\]

Thus, \(\vec{v_2}=\begin{pmatrix}1 \\ 1\end{pmatrix}\) is an eigenvector with eigenvalue $8$.

Therefore, the diagonalization is \(P=\begin{pmatrix}\vec{v}_1 & \vec{v}_2\end{pmatrix}=\begin{pmatrix} -3 & 1\\ 1 & 1\end{pmatrix}, \quad D=\begin{pmatrix}4 & 0\\ 0 & 8\end{pmatrix}\). We can verify that

\[\begin{pmatrix} 5 & 3 \\ 1 & 7 \end{pmatrix} = \begin{pmatrix} -3 & 1\\ 1 & 1\end{pmatrix}\begin{pmatrix}4 & 0\\ 0 & 8\end{pmatrix}\begin{pmatrix} -3 & 1\\ 1 & 1\end{pmatrix}^{-1}.\]

Deriving from the similarity diagram

This approach gives you a better intuition for why the procedure works. We’ll introduce the similarity diagram, adapted from Jim Hefferon’s book:

\[\Large\begin{CD} \varepsilon_n @>{A}>> \varepsilon_n \\ @A{\text{Rep}_{B,\varepsilon_n}(id)~}AA @AA{~\text{Rep}_{B,\varepsilon_n}(id)}A\\ B @>>{D}> B \end{CD}\]

The four corners signify the bases used to represent the space $\R^n$. There are two matrices $A$ and $D$, representing the same linear transformation $t$ as $t:\varepsilon_n\to\varepsilon_n$ and $t: B\to B$ respectively, where $B=\lang \vec{\beta}_1, \dots, \vec{\beta}_n \rang$ is an eigenbasis. The vertical arrows signify the path that converts a vector from basis $B$ to $\varepsilon_n$; this is done with the change of basis matrix \(\text{Rep}_{B,\varepsilon_n}(id)\).

There are actually two ways to go from the upper-left corner to the upper-right one:

\[\Large\begin{CD} \varepsilon_n @>{A}>> \varepsilon_n \\ @. @.\\ B @. B \end{CD}\]

or

\[\Large\begin{CD} \varepsilon_n @. \varepsilon_n \\ @V{\textcolor{red}{\text{Rep}_{B,\varepsilon_n}(id)^{-1}~}}VV @AA{~\text{Rep}_{B,\varepsilon_n}(id)}A \\ B @>>{D}> B \end{CD}\]

Note that we reversed the first (left) arrow, replacing \(\text{Rep}_{B,\varepsilon_n}(id)\) with its inverse; this works because of the identity \(\text{Rep}_{B,\varepsilon_n}(id)\times \text{Rep}_{\varepsilon_n,B}(id)=I\). Intuitively, if we change the basis from $\varepsilon_n$ to $B$ with \(\text{Rep}_{\varepsilon_n,B}(id)\), and then from $B$ back to $\varepsilon_n$ with \(\text{Rep}_{B,\varepsilon_n}(id)\), the overall operation does nothing. That is, in matrix terms, it is $I$, so \(\text{Rep}_{\varepsilon_n,B}(id)=\text{Rep}_{B,\varepsilon_n}(id)^{-1}\).

In the second route, since we know $B$ is an eigenbasis, $D$ must be diagonal. Why? We construct $D=\text{Rep}_{B, B}(t)$ from the standard procedure: the $i$-th column of the matrix is $t(\vec{\beta}_i)$ represented in basis $B$.

\[\begin{split} D&=\text{Rep}_{B, B}(t)\\ &=\begin{pmatrix} \text{Rep}_{B}(t(\vec{\beta_1})) & \dots & \text{Rep}_{B}(t(\vec{\beta_n})) \end{pmatrix}\\ &=\begin{pmatrix} \text{Rep}_{B}(\lambda_1\vec{\beta_1}) & \dots & \text{Rep}_{B}(\lambda_n\vec{\beta_n}) \end{pmatrix} \\ &=\begin{pmatrix} \lambda_1\text{Rep}_{B}(\vec{\beta_1}) & \dots & \lambda_n\text{Rep}_{B}(\vec{\beta_n}) \end{pmatrix} \quad\quad (3)\\ &=\begin{pmatrix} \lambda_{1} & & & \\ & \lambda_{2} & & \\ & & \ddots & \\ & & & \lambda_{n} \\ \end{pmatrix} \end{split}\]

Step $(3)$ follows from scalar commutativity: \(\text{Rep}_B(\lambda_i\vec{\beta}_i)=\text{Rep}_{\varepsilon_n,B}(id)\times(\lambda_i\vec{\beta}_i)=\lambda_i\text{Rep}_{\varepsilon_n,B}(id)\times\vec{\beta}_i=\lambda_i\text{Rep}_{B}(\vec{\beta_i})\). The last step comes from the fact that \(\text{Rep}_{B}(\vec{\beta_i})=\begin{pmatrix} \vdots\\ 1\\ \vdots \end{pmatrix}\), the column vector with a $1$ in the $i$-th entry and zeros elsewhere, because of how we defined basis $B$. A routine check confirms this:

\[0\vec{\beta}_1+\dots+1\vec{\beta}_i+\dots+0\vec{\beta}_n=\vec{\beta}_i.\]


The overall operation of this route is thus \(\text{Rep}_{B,\varepsilon_n}(id)\cdot D\cdot\text{Rep}_{B,\varepsilon_n}(id)^{-1}\) (remember that each successive operation is applied on the left, so the product reads right to left along the route). Since the two routes yield the same result, we can write

\[A=\text{Rep}_{B,\varepsilon_n}(id)\cdot D\cdot\text{Rep}_{B,\varepsilon_n}(id)^{-1}\]

which is in the form \(A=PDP^{-1}\).

$\square$

By now, you should see the true form of $A=PDP^{-1}$:

\[A=PDP^{-1}\iff\text{Rep}_{\varepsilon_n,\varepsilon_n}(t)=\text{Rep}_{B,\varepsilon_n}(id)\cdot \text{Rep}_{B,B}(t)\cdot\text{Rep}_{B,\varepsilon_n}(id)^{-1}\]

What we are actually doing is changing the basis from $\varepsilon_n$ to an eigenbasis $B$ with $P^{-1}$ (that is, \(\text{Rep}_{\varepsilon_n,B}(id)\)), so that the linear transformation in that basis can be represented by a diagonal matrix $D$. Finally, we convert the result back to basis $\varepsilon_n$ with $P$. This should be clear from the similarity diagram.
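To tie this back to the earlier $2\times 2$ example: there the eigenbasis is \(B=\lang\vec{v}_1,\vec{v}_2\rang\), so the change of basis matrix is exactly the $P$ we constructed, and the matrix of $t$ in basis $B$ is exactly $D$:

\[\text{Rep}_{B,\varepsilon_2}(id)=\begin{pmatrix}-3 & 1\\ 1 & 1\end{pmatrix}=P,\qquad \text{Rep}_{B,B}(t)=\begin{pmatrix}4 & 0\\ 0 & 8\end{pmatrix}=D.\]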

Verifying with matrix operation

We’ll now use a more algebraic way to prove the statement. We follow the construction from the section Diagonalizing matrix with eigenvectors.

\[\begin{split} PDP^{-1}&= \begin{pmatrix} \vec{v_1} & \dots & \vec{v_n} \end{pmatrix} \begin{pmatrix} \lambda_{1} & & \\ & \ddots & \\ & & \lambda_{n} \\ \end{pmatrix} \begin{pmatrix} \vec{v_1} & \dots & \vec{v_n} \end{pmatrix}^{-1}\\ &= \begin{pmatrix} \lambda_1 \vec{v_1} & \dots & \lambda_n \vec{v_n} \end{pmatrix} \begin{pmatrix} \vec{v_1} & \dots & \vec{v_n} \end{pmatrix}^{-1}\\ &= \begin{pmatrix} A \vec{v_1} & \dots & A \vec{v_n} \end{pmatrix} \begin{pmatrix} \vec{v_1} & \dots & \vec{v_n} \end{pmatrix}^{-1}\\ &= A\begin{pmatrix} \vec{v_1} & \dots & \vec{v_n} \end{pmatrix} \begin{pmatrix} \vec{v_1} & \dots & \vec{v_n} \end{pmatrix}^{-1} \\ &= A \end{split}\]
$\square$

Conclusion

The similarity diagram and all the basis machinery might feel too theoretical, but it is definitely worth understanding the whole logic behind the method. With all the hard work above, we finally arrive at the diagonalization method, which is a powerful tool for computing matrix powers quickly. Since \(A^k=PD^kP^{-1}\), and $D^k$ is simply the diagonal matrix with entries $\lambda_i^k$, it is much easier to compute $D^k$ and a single $P^{-1}$ than to multiply $A$ by itself $k$ times.
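Here is a rough NumPy sketch of that speed-up (again using the example matrix; `np.linalg.matrix_power` is only used as a reference to check the result):

```python
import numpy as np

A = np.array([[5.0, 3.0],
              [1.0, 7.0]])
eigenvalues, P = np.linalg.eig(A)

k = 10
# D^k is diagonal with entries lambda_i^k, so no repeated matrix multiplication is needed.
Dk = np.diag(eigenvalues ** k)
Ak = P @ Dk @ np.linalg.inv(P)

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```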

As I mentioned in the intro, eigenvectors and diagonalization are so closely related that I still don’t know which one was defined first (in Jim’s book, the concept of eigenvectors first appears in the diagonalization section without being named). Maybe I’ll take a look at the development of linear algebra for fun.

Notations and insights for the similarity diagrams come from Linear Algebra by Jim Hefferon.
