Proof of the dot product formula

My experiences with the dot product

I struggled quite a bit when I first encountered the dot/scalar product in school. The dot product between two vectors $\mathbf{a}$ and $\mathbf{b}$ was defined to be $\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos \theta$ where $\theta$ is the angle between the two vectors. This definition posed a bit of a stumbling block for me. Up until this point, most definitions had been pretty intuitive (and if not at first, they usually started feeling reasonably natural within a week or two of working with them). This formula, by contrast, seemed to be simply plucked out of thin air.

Next came the mechanics of performing the dot product: we were given that $\begin{pmatrix} a \\ b \\ c \end{pmatrix} \cdot \begin{pmatrix} d \\ e \\ f \end{pmatrix} = ad + be + cf$. The natural operation would be to multiply the elements componentwise to obtain the vector $\begin{pmatrix} ad \\ be \\ cf \end{pmatrix}$, so why isn’t that the case? Moreover, if I were to just accept the result, what does the quantity $ad+be+cf$ “mean”?

It took quite a bit of “just accept the formulas and use them to do questions” before I got a better hang of it in school. Only a few years later at university did I start to see the significance of the operation and its generalizations in other fields. In this post I lay out how the present me would introduce the dot product to the former 17-year-old me.

We are given two vectors, $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ and $\begin{pmatrix} d \\ e \\ f \end{pmatrix}$, and hope to “combine/multiply” them in some fashion. We can define the natural operation $\begin{pmatrix} a \\ b \\ c \end{pmatrix} \otimes \begin{pmatrix} d \\ e \\ f \end{pmatrix} = \begin{pmatrix} ad \\ be \\ cf \end{pmatrix}$. It turns out that this is perfectly valid and is actually something we can analyze in the topic of abstract algebra and rings. At the moment, however, we are interested in vectors as a way to study geometry (and are interested in quantities such as lengths, areas and angles). As far as I know, the operation defined “naturally” is unfortunately not of much use for this.

We thus define the dot product to be $\begin{pmatrix} a \\ b \\ c \end{pmatrix} \cdot \begin{pmatrix} d \\ e \\ f \end{pmatrix} = ad + be + cf$. Why do we do this? Because the quantity $ad+be+cf$, while not easily visualized, is especially useful for geometrical calculations. In particular, defining the dot product this way leads to the formula $\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos \theta$, where $\theta$ is the angle between the two vectors. This formula is really useful. Personally I find the process of finding angles using the usual geometrical methods (use of triangle, circle or parallel line properties; use of trigonometry rules, etc.) pretty challenging because it involves a degree of analysis and knowing where to look (which triangle should we focus on?) and what to use (which formula? The sine rule? Pythagoras' theorem?). The dot product formula, however, reduces the process to arithmetic. As long as we are able to express what we want in vectors, performing the dot product and calculating magnitudes only involve the arithmetic operations of multiplication, addition and square roots. Plug the quantities into the formula and rearrange and we will obtain $\cos \theta = \dfrac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$. Pretty nifty!
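To make the "it is only arithmetic" point concrete, here is a minimal Python sketch of finding an angle this way. The function names `dot`, `magnitude` and `angle_between` and the two example vectors are my own choices for illustration, and the sketch assumes both vectors are nonzero.

```python
import math

def dot(u, v):
    """Dot product: multiply componentwise, then add everything up."""
    return sum(x * y for x, y in zip(u, v))

def magnitude(u):
    """Length of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in u))

def angle_between(u, v):
    """Angle in radians between two nonzero vectors, via the dot product formula."""
    cos_theta = dot(u, v) / (magnitude(u) * magnitude(v))
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of its domain
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

a = (1, 0, 0)  # illustrative vector along the x-axis
b = (1, 1, 0)  # illustrative vector along the diagonal of the xy-plane
print(math.degrees(angle_between(a, b)))  # approximately 45
```

No triangles to hunt for and no decision about which rule to apply: the whole computation is multiplication, addition, a square root and one inverse cosine.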

Proof of the dot product formula

Definitions and formulas we start off with

For the rest of the post we will see how the dot product formula can be derived from the definition and the cosine rule from trigonometry. Let us first collect what we are starting out with. First, we define the dot product to be $\begin{pmatrix} a \\ b \\ c \end{pmatrix} \cdot \begin{pmatrix} d \\ e \\ f \end{pmatrix} = ad + be + cf$ and the magnitude of a vector to be $\left | \begin{pmatrix} a \\ b \\ c \end{pmatrix} \right | = \sqrt{a^2+b^2+c^2}$. We will use the magnitude of the vector $\overrightarrow{AB}$ as the length of the line segment $AB$ and assume knowledge of the cosine rule $c^2 = a^2+b^2 - 2ab \cos C$ (here $a$, $b$ and $c$ are the side lengths of a triangle and $C$ is the angle opposite side $c$, not the vector components above). We will also assume the operations of vector addition and scalar multiplication and the use of the position vector formula $\overrightarrow{AB}=\overrightarrow{OB}-\overrightarrow{OA}$.

Some properties of the dot product derived from definition

Before deriving the final formula, we will need some properties of the dot product. First we have
A: $k (\mathbf{a} \cdot \mathbf{b}) = (k\mathbf{a}) \cdot \mathbf{b}$.
We can prove this by checking both sides directly. Let $\mathbf{a}=\begin{pmatrix}a \\ b \\ c \end{pmatrix}$ and $\mathbf{b} = \begin{pmatrix}d \\ e \\ f \end{pmatrix}$. On the left, we perform the dot product first to get $k (ad+be+cf)$, and expanding the brackets as usual for real numbers gives us $kad + kbe + kcf$. On the right, we work out $(k\mathbf{a}) \cdot \mathbf{b}$ independently by first performing the scalar multiplication $k\mathbf{a}$ to get $\begin{pmatrix}ka \\ kb \\ kc \end{pmatrix}$ and then taking the dot product with $\mathbf{b}$, which gives the same answer $kad+kbe+kcf$. The proof is pretty dry but at least we are now certain this formula works!

Our next properties are as follows:
B: the order does not matter for the dot product (i.e. the dot product is commutative): $\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$.
C: we can “expand”: $\mathbf{a} \cdot (\mathbf{b} + \mathbf{c}) = \mathbf{a}\cdot \mathbf{b}+\mathbf{a}\cdot\mathbf{c}$.
D: dot product of a vector $\mathbf{a}$ with itself: $\mathbf{a} \cdot \mathbf{a} = |\mathbf{a}|^2$.

Each of these properties can be verified with a pretty dry check/proof similar to the one we did for property A.
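If you would like to see properties A to D in action before working through the algebra, a quick numerical spot check may help. The sketch below is in Python; the helper names (`dot`, `scale`, `add`, `magnitude_squared`) and the particular vectors are my own choices. Keep in mind that one example passing is only a sanity check, not a proof.

```python
def dot(u, v):
    """Dot product: sum of componentwise products."""
    return sum(x * y for x, y in zip(u, v))

def scale(k, u):
    """Scalar multiplication k * u."""
    return tuple(k * x for x in u)

def add(u, v):
    """Vector addition u + v."""
    return tuple(x + y for x, y in zip(u, v))

def magnitude_squared(u):
    """|u|^2 taken from the magnitude definition, skipping the square root."""
    return sum(x * x for x in u)

# Arbitrary integer vectors and scalar, so every comparison below is exact
a, b, c, k = (1, -2, 3), (4, 0, -5), (2, 7, 1), 3

prop_a = k * dot(a, b) == dot(scale(k, a), b)         # A: k(a.b) = (ka).b
prop_b = dot(a, b) == dot(b, a)                       # B: a.b = b.a
prop_c = dot(a, add(b, c)) == dot(a, b) + dot(a, c)   # C: a.(b+c) = a.b + a.c
prop_d = dot(a, a) == magnitude_squared(a)            # D: a.a = |a|^2

print(prop_a, prop_b, prop_c, prop_d)  # True True True True
```

Notice that the check for property D compares two sums that are literally the same expression, which is exactly why its proof is a one-liner.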

The dot product formula

With these properties in place we are ready to prove $\boxed{\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos \theta}$ using the cosine rule. Consider the triangle $OAB$, where $O$ is the origin and the points $A$ and $B$ have position vectors $\mathbf{a} = \overrightarrow{OA}$ and $\mathbf{b} = \overrightarrow{OB}$. The cosine rule gives us $AB^2 = OA^2 + OB^2 - 2 \, OA \; OB \cos \theta$ where $\theta$ is the angle between $OA$ and $OB$.

Let us look at the left side of the equation. $AB^2$ can be written as $|\overrightarrow{AB}|^2$ in vector notation. Application of property D gives us $|\overrightarrow{AB}|^2 = \overrightarrow{AB} \cdot \overrightarrow{AB}$. We then use our position vector formula $\overrightarrow{AB} = \overrightarrow{OB}-\overrightarrow{OA}$. Substituting $\mathbf{a}$ for $\overrightarrow{OA}$ and $\mathbf{b}$ for $\overrightarrow{OB}$, this gives us $(\mathbf{b}-\mathbf{a}) \cdot (\mathbf{b}-\mathbf{a})$.

Using a combination of properties A (with $k=-1$, to handle the minus signs), B and C gives us $\mathbf{b}\cdot \mathbf{b} - \mathbf{a}\cdot \mathbf{b} - \mathbf{b}\cdot\mathbf{a} + \mathbf{a}\cdot \mathbf{a} = \mathbf{a}\cdot\mathbf{a} + \mathbf{b}\cdot\mathbf{b}-2\mathbf{a}\cdot\mathbf{b}$. Finally we apply property D again to get $|\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\mathbf{a}\cdot\mathbf{b}$.

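Now let us look at the right-hand side of the equation. Since we use the magnitude of a vector as the length of the corresponding line segment, $OA = |\mathbf{a}|$ and $OB = |\mathbf{b}|$, so the right-hand side becomes $|\mathbf{a}|^2+ |\mathbf{b}|^2 - 2 |\mathbf{a}| |\mathbf{b}| \cos \theta$.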

Plugging the simplified versions of each side into the cosine rule gives us $|\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}|^2+ |\mathbf{b}|^2 - 2 |\mathbf{a}| |\mathbf{b}| \cos \theta$. Cancelling $|\mathbf{a}|^2 + |\mathbf{b}|^2$ from both sides and dividing by $-2$ finally leads us to the dot product formula: $\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta$. $\blacksquare$
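As a sanity check on the whole argument, we can pick a concrete triangle $OAB$ and confirm numerically that the cosine rule holds when $\cos\theta$ is taken from the dot product formula. This is a small Python sketch; the position vectors are arbitrary values I chose so that the magnitudes come out as whole numbers, and the variable names are illustrative.

```python
import math

def dot(u, v):
    """Dot product: sum of componentwise products."""
    return sum(x * y for x, y in zip(u, v))

def magnitude(u):
    """Length of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in u))

a = (3, 0, 4)  # position vector of A, |a| = 5
b = (1, 2, 2)  # position vector of B, |b| = 3

# Left side of the cosine rule: AB^2, using AB = OB - OA
ab = tuple(y - x for x, y in zip(a, b))
lhs = magnitude(ab) ** 2

# Right side: OA^2 + OB^2 - 2 OA OB cos(theta), with cos(theta) from the dot product formula
cos_theta = dot(a, b) / (magnitude(a) * magnitude(b))
rhs = magnitude(a) ** 2 + magnitude(b) ** 2 - 2 * magnitude(a) * magnitude(b) * cos_theta

print(math.isclose(lhs, rhs))  # True
```

Here $\mathbf{a}\cdot\mathbf{b} = 11$, so $\cos\theta = 11/15$, and both sides of the cosine rule work out to $12$ (up to floating-point rounding, hence `math.isclose` rather than `==`).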
