Can’t see the forest for the symmetries

Symmetry is a useful tool in art. It can serve to unify a piece, and make it more pleasant to the eyes. On the other hand, too much symmetry can make a piece boring; put differently, deviation from symmetry or balance can serve to make a work of art more interesting, or leave a deeper impression on the viewer.

Symmetry in mathematics is ultimately no different than what you already think it is: a symmetry of some object is a “transformation” you can do on the object that keeps the object “the same.” The simplest examples come from symmetries of shapes (like you might have seen in your youth): for instance, consider the symmetries of a regular octagon or “stop sign”:

symmetries of a stop sign (snagged from Wikipedia)

The above symmetries are obtained by either reflecting (i.e., “mirroring”) the shape about some line, or rotating the shape by some angle, and we can see that (if we ignore the text “STOP” in the middle of the sign) there are a total of 16 symmetries: 8 rotations (counting the “do nothing” rotation) and 8 reflections.

Although this toy example illustrates what a symmetry is, it doesn’t really indicate why one would be interested in symmetries from a mathematical perspective. It’s hard to believe that there would be any deep implications of the fact that a regular octagon has 16 symmetries… This is one gripe I have with “maths outreach media” when they talk about group theory, the mathematical field specifically concerned with the study of symmetries. They provide the above toy example, but the example alone just reinforces an impression I imagine many people have about mathematicians: mathematicians just waste university funding and fumble with shapes and stuff instead of trying to address “real-world” problems.

I don’t want to sound like a broken record, as I’ve complained about this before, but the problem with this toy example is that it doesn’t demonstrate the ubiquity of symmetry in mathematics! Not only are symmetries genuinely useful in mathematics, but they’re everywhere. I’ve been putting this topic off for quite some time, but I’m going to give an attempt at demonstrating what makes symmetries useful in mathematics. So as to not contradict myself, I’m not trying to explain real-world applications of symmetry and convince you that it’s “so useful that everyone needs to know” (so I’m not going to talk about how symmetries in physics give you laws of conservation by Noether’s Theorem and that sort of thing); instead, I just want to try and illustrate how symmetry can actually be used in mathematics.

Over lunch

One might say that symmetry in general is beautiful because it is rare, yet symmetries are everywhere if you know where to look for them. Perhaps phrased differently, symmetries are fragile: even the slightest perturbation can destroy a symmetry.

While that might sound very poetic, there’s another interpretation of this fragility: symmetries are restrictive. Similar to how restricted you might feel when you enter a sparkling clean room (so as to not ruin the cleanliness), you are also limited in what you can do if you want to maintain certain symmetries. This restrictiveness provided by symmetry can be used in two ways:

You can use the restrictions of symmetry to reduce a problem to something much simpler.
You can show that something is impossible by observing that if it were possible, then it would violate symmetries present in the system.

For an instance of the first usage, consider the following problem:

Problem 1. A race is about to begin, and two cars are at the starting line. When the race begins, car A drives forward at 22 metres per second, while car B drives forward at 25 metres per second. What is the distance between car A and car B after 15 seconds?

Let’s start with the more straightforward approach to this problem:

Calculate how far car A travelled in 15 seconds.
Calculate how far car B travelled in 15 seconds.
Calculate the difference.

So, car A travelled $d_A = (22\mathrm m/\mathrm s)\times(15\mathrm s) = 330\mathrm m$ , and car B travelled $d_B = (25\mathrm m/\mathrm s)\times(15\mathrm s) = 375\mathrm m$ . The difference is $d_B - d_A = 375\mathrm m - 330\mathrm m = 45\mathrm m$ , so the final answer is 45 metres.

However, we could have avoided dealing with such big numbers if we used symmetry! The symmetry in this case is a change of reference frame. Ask: what does the driver of car A see? From driver A’s perspective, car A is not moving (driver A knows that the car is moving, but it doesn’t look like it is); instead, the road is moving backwards at 22 metres per second. From driver A’s perspective, how fast is car B moving?

If car A moves at 22 metres per second, and car B moves at 25 metres per second, then from driver A’s perspective, car B looks like it is moving forward at $25\mathrm m/\mathrm s - 22\mathrm m/\mathrm s = 3\mathrm m/\mathrm s$ . From this frame of reference, we have that car A is not moving, and car B is moving forward at 3 metres per second. After fifteen seconds, we then get immediately that the distance travelled by car B in this frame of reference is $(3\mathrm m/\mathrm s)\times(15\mathrm s) = 45\mathrm m$ , just as before! With this approach the calculation was much simpler thanks to the symmetry!

The symmetry at play here is a lot less obvious than rotational / reflection symmetries of a stop sign, because the symmetry acts on something more abstract (namely, the “physics” of the situation). Basically, the symmetry of this system is the observation that “distances don’t change if my reference camera starts moving,” and so we chose to pick a camera that moves at the same speed as car A.
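To make the comparison concrete, here is a minimal sketch of both computations in Python. The function names are my own invention for illustration; the only point is that the two frames of reference give the same answer.

```python
def gap_ground_frame(v_a, v_b, t):
    """Distance between the cars, computed from the point of view of the road."""
    d_a = v_a * t  # distance travelled by car A
    d_b = v_b * t  # distance travelled by car B
    return d_b - d_a


def gap_car_a_frame(v_a, v_b, t):
    """The same distance, computed in the frame moving along with car A."""
    relative_speed = v_b - v_a  # car A is at rest in this frame
    return relative_speed * t


print(gap_ground_frame(22, 25, 15))
print(gap_car_a_frame(22, 25, 15))
```

Both calls print 45, and the second one never touches a number bigger than 45.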

On the bus

As mentioned above, group theory is the mathematical field focused on studying the behaviour of symmetries. Here, I will say a few words about the birthplace of group theory, pioneered by Lagrange, Ruffini, Abel, and—perhaps most notably—Galois.

The starting place is the quadratic formula: the solutions (for $x$ ) in any equation of the form

$ax^2 + bx + c = 0$

where $a\neq0$ can be given by the formula

$\displaystyle x = \frac{-b\pm\sqrt{b^2-4ac}}{2a}$

A natural follow-up question is if there exists a cubic formula; that is, a formula for $x$ solving

$ax^3 + bx^2 + cx + d = 0$

The answer is yes (with the minor catch that you may need complex numbers to work with the formula, even in situations where all three solutions to this equation are real). If you care, here’s the “cubic formula:”

$\displaystyle x = -\frac1{3a}\left(b + C + \frac{\Delta_0}C\right)\qquad\text{where }\begin{cases} \displaystyle C = \sqrt[3]{\frac{\Delta_1\pm\sqrt{\Delta_1^2-4\Delta_0^3}}2} \\ \Delta_0 = b^2 - 3ac \\ \Delta_1 = 2b^3 - 9abc + 27a^2d \end{cases}$

Here, the sign $\pm$ in front of the square root must be chosen so that $C\neq0$ , after which $C$ can be any of the three cube roots (giving three solutions).
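If you would rather see the formula in action than stare at it, here is a rough sketch in Python using complex floating-point arithmetic. The function name `cubic_roots` and the rule for picking the sign (take whichever choice is larger in absolute value) are my own choices, and the results are only approximate because of rounding.

```python
import cmath


def cubic_roots(a, b, c, d):
    """Approximate roots of a*x^3 + b*x^2 + c*x + d = 0 (with a != 0)."""
    delta0 = b * b - 3 * a * c
    delta1 = 2 * b**3 - 9 * a * b * c + 27 * a * a * d
    root = cmath.sqrt(delta1 * delta1 - 4 * delta0**3)
    # Pick the sign of the square root that keeps C away from zero.
    plus, minus = (delta1 + root) / 2, (delta1 - root) / 2
    inner = plus if abs(plus) >= abs(minus) else minus
    if inner == 0:
        # Both choices vanish only when there is a triple root at -b / (3a).
        return [complex(-b / (3 * a))] * 3
    C = inner ** (1 / 3)  # one complex cube root
    omega = complex(-0.5, 3**0.5 / 2)  # a non-real cube root of 1
    roots = []
    for k in range(3):
        Ck = C * omega**k  # run through all three cube roots
        roots.append(-(b + Ck + delta0 / Ck) / (3 * a))
    return roots


# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
for r in cubic_roots(1, -6, 11, -6):
    print(r)
```

This example also shows the catch mentioned above: all three solutions are real, yet the intermediate square root is taken of a negative number, so the computation passes through complex numbers anyway.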

Is there a quartic formula? Also yes, but I’ll let you look into it yourself. Things very quickly get out of hand, but the important thing is that it’s possible.

How about a quintic formula? Is there, however horrendous, a formula for the general solution to an equation

$ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0$

where $a\neq0$ ? It turns out that the answer is no! It is impossible to write the solutions to this equation down using the coefficients $a, \dots, f$ , integers, addition, subtraction, multiplication, division, and radicals ( $\sqrt[n]{\phantom{\square}}$ )!

If this isn’t surprising enough, this is even impossible for some special cases: for example, none of the solutions to the equation $x^5 - x - 1 = 0$ can be written down using integers, addition, subtraction, multiplication, division, and radicals (not even its unique real solution $x\approx 1.1673\dots$ ).
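Note that “cannot be written down with radicals” does not mean “cannot be computed.” Approximating the real solution numerically is easy; here is a minimal sketch using bisection (the helper names `quintic` and `bisect` are my own, not from any library).

```python
def quintic(x):
    return x**5 - x - 1


def bisect(f, lo, hi, tol=1e-12):
    """Shrink [lo, hi] around a sign change of f until it is narrower than tol."""
    assert f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


root = bisect(quintic, 1.0, 2.0)
print(round(root, 4))  # 1.1673
```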

Now, for the big question: how do we know this? How is it possible to show that a formula cannot exist?? It’s one thing to have a hard time finding a formula yourself, but to actually prove that it is impossible is an impressive feat! As you might guess from the blog title, the answer lies in using symmetries somehow.

But what symmetries?

The symmetries lie in some ambiguity in radicals $\sqrt[n]{\phantom{\square}}$ . To illustrate what I mean, consider the simple example $\sqrt2$ . By definition, $\sqrt2$ is a real number solution to the equation $x^2 - 2 = 0$ , but really, there are two solutions: the other solution is $x=-\sqrt2$ . By convention, we usually define the square root to be the positive solution. However, this is just a convention, and doesn’t matter too much.

In fact, algebraically speaking, it doesn’t matter at all! Here’s what I mean: there is no way to use an algebraic (i.e., polynomial) equation with rational coefficients to distinguish $\sqrt2$ from $-\sqrt2$ . This may sound absurd, since clearly $\sqrt2>0$ and $-\sqrt2<0$ sets the two roots apart, but I am speaking about equations specifically. What I mean is, it is impossible to find a polynomial $f(x)$ with rational coefficients such that $f(\sqrt2)=0$ but $f(-\sqrt2)\neq0$ . This means that if I were to switch $\sqrt2$ with $-\sqrt2$ behind an algebraist’s back, they wouldn’t know the difference!

In other words, swapping $\sqrt2$ with $-\sqrt2$ yields a symmetry of some sort. To be more specific about what this symmetry acts on, I need to introduce some notation. Let $\mathbb{Q}$ denote the field of rational numbers. We can extend this number field to include $\sqrt2$ by inserting this new number and allowing ourselves to add, subtract, multiply, and divide rational numbers with $\sqrt2$ . Denote this field extension by $\mathbb{Q}(\sqrt2)$ .

In more detail. This means that $\mathbb{Q}(\sqrt2)$ is a set of numbers that contains the rational numbers, and also contains $\sqrt2$ , and is closed under arithmetic operations. This means that not only does $\mathbb{Q}(\sqrt2)$ contain $\sqrt2$ , but it also contains $1 + \sqrt2$ , and it also contains $(3 - 4\sqrt2)^3$ , and it contains $\frac{\sqrt2+4}{3-5\sqrt2}$ .

One can check that every number in $\mathbb{Q}(\sqrt2)$ can be written uniquely in the form $a + b\sqrt2$ , where $a, b\in\mathbb{Q}$ are rational numbers. For example,

$\displaystyle \frac{\sqrt2+4}{3-5\sqrt2} = \frac{(\sqrt2+4)(3+5\sqrt2)}{(3-5\sqrt2)(3+5\sqrt2)} = \frac{22+23\sqrt2}{-41} = \frac{-22}{41} - \frac{23}{41}\sqrt2$

The fact that we cannot distinguish $\sqrt2$ from $-\sqrt2$ reflects a symmetry $\sigma$ on $\mathbb{Q}(\sqrt2)$ that acts by

$\sigma(a + b\sqrt2) = a - b\sqrt2$

(In other words, $\sigma$ sends a number in $\mathbb{Q}(\sqrt2)$ to its conjugate.) Besides the “do nothing” symmetry, this is the only other symmetry of $\mathbb{Q}(\sqrt2)$ .
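For those who like to experiment, here is a small sketch of $\mathbb{Q}(\sqrt2)$ in Python, storing a number $a + b\sqrt2$ as a pair of exact fractions. The class name `QSqrt2` and its methods are made up for this post. It lets us redo the division example exactly and check, at least on a couple of sample numbers, that $\sigma$ is compatible with addition and multiplication (which is what “an algebraist wouldn’t notice” amounts to).

```python
from fractions import Fraction


class QSqrt2:
    """The number a + b*sqrt(2) with rational a and b."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b*r)(c + d*r) = (ac + 2bd) + (ad + bc)*r, where r*r = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def conjugate(self):
        """The symmetry sigma: a + b*sqrt(2) -> a - b*sqrt(2)."""
        return QSqrt2(self.a, -self.b)

    def __truediv__(self, other):
        # Multiply top and bottom by the conjugate of the denominator.
        norm = other.a**2 - 2 * other.b**2  # nonzero unless other is 0
        top = self * other.conjugate()
        return QSqrt2(top.a / norm, top.b / norm)

    def __repr__(self):
        return f"({self.a}) + ({self.b})*sqrt(2)"


x = QSqrt2(4, 1) / QSqrt2(3, -5)  # (sqrt(2) + 4) / (3 - 5*sqrt(2))
print(x)

y = QSqrt2(3, -4)
# sigma respects addition and multiplication on these samples
print((x + y).conjugate() == x.conjugate() + y.conjugate())
print((x * y).conjugate() == x.conjugate() * y.conjugate())
```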

The story is very similar for radicals of higher degree. For example, consider the equation $x^3 - 1 = 0$ . Sure, $x=1$ is a solution, but if you allow for complex numbers, then there are two more! Indeed, these are the third roots of unity, and they sit at equal distance from each other on the unit circle:

Cube roots of unity, snagged from Wikipedia

Explicitly, the cube roots of unity are

$\displaystyle 1, \qquad \frac{-1+\sqrt{-3}}2, \qquad \frac{-1-\sqrt{-3}}2$

If you let $\omega = \frac{-1+\sqrt{-3}}2$ be the second solution, then it turns out that $\omega^2 = \left(\frac{-1+\sqrt{-3}}2\right)^2 = \frac{-1 - \sqrt{-3}}2$ is the third solution! This is to say that $\omega$ is a primitive cube root of unity: its powers give all three cube roots of unity.
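These claims are easy to check numerically. The following sketch uses Python’s built-in complex numbers; the helper `close` is my own, and the comparisons are only up to floating-point rounding.

```python
import cmath

omega = (-1 + cmath.sqrt(-3)) / 2


def close(z, w, tol=1e-12):
    """Equality up to rounding error."""
    return abs(z - w) < tol


print(close(omega**2, (-1 - cmath.sqrt(-3)) / 2))  # omega^2 is the third root
print(close(omega**3, 1))                          # omega is a cube root of 1
print(close(1 + omega + omega**2, 0))              # the three roots sum to zero
# The roots sit on the unit circle, at equal distance from each other.
print(abs(omega), abs(omega - 1), abs(omega**2 - omega), abs(1 - omega**2))
```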

Note that there is an algebraic way of telling one of the solutions to $x^3 - 1 = 0$ apart from the rest: we can tell $1$ apart from $\omega$ and $\omega^2$ because $x=1$ is a solution to the algebraic equation $x - 1 = 0$ , but $x=\omega$ or $x=\omega^2$ is not. On the other hand, there is no way of telling $\omega$ and $\omega^2$ apart from each other: we can once again swap them and an algebraist would never notice.

If we look at the field extension $\mathbb{Q}(\omega)$ (obtained in a similar way as for $\mathbb{Q}(\sqrt2)$ : it is the set of numbers generated by the rational numbers and $\omega$ , and it is closed under basic arithmetic), we can see that all of the solutions to the equation $x^3 - 1 = 0$ live in $\mathbb{Q}(\omega)$ . We can therefore call $\mathbb{Q}(\omega)$ the splitting field of $x^3 - 1$ (likewise, we could say that $\mathbb{Q}(\sqrt2)$ is the splitting field of $x^2 - 2$ ).

Every element of the field extension $\mathbb{Q}(\omega)$ can be written as $a + b\omega + c\omega^2$ , where $a, b, c\in\mathbb{Q}$ (though not uniquely: since $1 + \omega + \omega^2 = 0$ , the form $a + b\omega$ already suffices, and that one is unique), and the ambiguity between $\omega$ and $\omega^2$ gives us a symmetry $\sigma$ on $\mathbb{Q}(\omega)$ given by

$\sigma(a + b\omega + c\omega^2) = a + c\omega + b\omega^2$

(in other words, $\sigma$ swaps $\omega$ with $\omega^2$ ). Besides the “do nothing” symmetry, $\sigma$ turns out to be the only other symmetry of $\mathbb{Q}(\omega)$ .

Final example. There are three solutions to the equation $x^3-2=0$ . One of them is well-known, namely $x=\sqrt[3]2$ , but there are two others. If we let $\omega = \frac{-1+\sqrt{-3}}2$ be one of the complex solutions to $x^3 - 1 = 0$ , then we get two more solutions to $x^3 - 2 = 0$ given by $x = \omega\sqrt[3]2$ and $x=\omega^2\sqrt[3]2$ . To see why, note for example that when $x = \omega\sqrt[3]2$ , then

$\displaystyle x^3 = \left(\omega\sqrt[3]2\right)^3 = \omega^3\left(\sqrt[3]2\right)^3 = 1\cdot2 = 2$

because $\omega^3 = 1$ .

In this example, we actually have no way of algebraically telling any of these three solutions apart! (This might seem strange because one of the solutions—namely $x=\sqrt[3]2$ —is real, and the other two are complex, but “being real” cannot be determined with an algebraic equation.) In fact, it turns out that all of these three solutions may be interchanged under an algebraist’s nose, and they would never notice. This leads to six symmetries in total!

The splitting field for $x^3-2$ is a bit harder to describe. It’s not just $\mathbb{Q}(\sqrt[3]2)$ (because this field extension only contains real numbers, and can’t produce the complex number $\omega\sqrt[3]2$ ). That being said, we can build it in two steps: first, we can construct the field extension $\mathbb{Q}(\omega)$ as in a previous example, and then we can further adjoin $\sqrt[3]2$ , giving us a field extension called $\mathbb{Q}(\omega)(\sqrt[3]2)$ . In other words, $\mathbb{Q}(\omega)(\sqrt[3]2)$ is a field containing all of the elements of $\mathbb{Q}(\omega)$ as well as the number $\sqrt[3]2$ , and is closed under basic arithmetic.

It turns out that every element of $\mathbb{Q}(\omega)(\sqrt[3]2)$ can be written uniquely in the form $a + b\sqrt[3]2 + c(\sqrt[3]2)^2$ where $a, b, c\in\mathbb{Q}(\omega)$ . Since we said that we can actually permute all three of the solutions to $x^3 - 2 = 0$ without an algebraist noticing, every possible permutation yields a symmetry of the splitting field $\mathbb{Q}(\omega)(\sqrt[3]2)$ ! If you work this out, this means there are $3! = 3\times2\times1 = 6$ symmetries!
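Here is one way to “work this out,” sketched in Python. I am taking on faith the standard fact that a symmetry of this splitting field is determined by where it sends $\sqrt[3]2$ (to one of the three solutions) and where it sends $\omega$ (to $\omega$ or $\omega^2$ ), and that each of these $3\times2$ choices really does give a symmetry. The labelling of the solutions by $j = 0, 1, 2$ and the function name are my own.

```python
from itertools import permutations

# Label the solutions of x^3 - 2 = 0 by j = 0, 1, 2: solution j is omega^j * cbrt(2).
# Assumption: a symmetry is pinned down by two choices,
#   cbrt(2) -> omega^k * cbrt(2)   for k in {0, 1, 2}
#   omega   -> omega^e             for e in {1, 2}
# so solution j = omega^j * cbrt(2) is sent to omega^(e*j + k) * cbrt(2).


def induced_permutation(k, e):
    """Where each of the three solutions goes, as a tuple of labels."""
    return tuple((e * j + k) % 3 for j in range(3))


symmetries = {induced_permutation(k, e) for k in range(3) for e in (1, 2)}
print(sorted(symmetries))
print(symmetries == set(permutations(range(3))))
```

The six choices give six different permutations of the solutions, which is every permutation of three things.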

I hope I haven’t lost you. The takeaway of the above paragraphs is the following: for any polynomial equation with rational coefficients, we can construct a splitting field by adjoining to $\mathbb{Q}$ all of the (complex) solutions to the polynomial equation, and this splitting field will have some symmetries. The group of symmetries of the splitting field is called the Galois group of the polynomial equation.

So, finally, how do we use these symmetries to deduce that a quintic formula is impossible? I’ll explain the main ideas:

Given some field $\mathbb{K}$ (e.g., $\mathbb{K} = \mathbb{Q}(\alpha_1)(\alpha_2)\dots$ ), call a radical extension any field extension of the form $\mathbb{K}(\sqrt[n]a)$ , where $a\in\mathbb{K}$ .
Any radical extension $\mathbb{K}(\sqrt[n]a)$ of $\mathbb{K}$ extends the symmetries of $\mathbb{K}$ in a very particular way (namely, the extension is always by an abelian group—whatever this means).
Any number that can be written in terms of additions, subtractions, multiplications, divisions, and radicals of rational numbers lives in a field extension given by a sequence of radical extensions starting at $\mathbb{Q}$ . Given the above observation, this puts particular restrictions on the kinds of symmetries that such a field extension has (namely, the Galois group is solvable).
Now consider any polynomial equation $f(x) = 0$ . If the solutions to this equation can be written using additions, subtractions, multiplications, divisions, and radicals of rational numbers, then this means that the splitting field of $f(x)$ is obtained by successive radical extensions. In particular, the symmetries of the splitting field of $f(x)$ must be solvable.
Finally, consider an equation such as $x^5-x-1=0$ . To prove that its solutions cannot be written down with radicals, we need only prove that the group of symmetries of its splitting field is not solvable. In more detail, it turns out that all possible permutations of the five complex solutions to $x^5 - x - 1 = 0$ define symmetries of its splitting field. This means that the Galois group of this equation is the group of permutations of 5 elements (which has $5!=5\times4\times3\times2\times1=120$ elements!), and this group is far too complex to be solvable.
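I have been vague about what “solvable” means, so here is one standard way to test it, sketched in Python by brute force. Take the subgroup generated by all commutators $ghg^{-1}h^{-1}$ , then do the same to that subgroup, and so on; a finite group is solvable exactly when this chain (the derived series) eventually shrinks down to the trivial group. All of the function names below are my own, and permutations are stored as tuples of images.

```python
from itertools import permutations


def compose(p, q):
    """The permutation 'first q, then p'."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generated_subgroup(gens, n):
    """Everything reachable by multiplying generators (enough in a finite group)."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def derived_subgroup(group, n):
    """Subgroup generated by all commutators g h g^-1 h^-1."""
    commutators = {
        compose(compose(g, h), compose(inverse(g), inverse(h)))
        for g in group for h in group
    }
    return generated_subgroup(commutators, n)


def derived_series_orders(n):
    """Sizes of the groups in the derived series of all permutations of n things."""
    group = set(permutations(range(n)))
    orders = [len(group)]
    while True:
        smaller = derived_subgroup(group, n)
        if len(smaller) == len(group):
            return orders  # the chain has stopped shrinking
        group = smaller
        orders.append(len(group))


print(derived_series_orders(4))  # [24, 12, 4, 1]
print(derived_series_orders(5))  # [120, 60]
```

For permutations of 4 things the chain reaches a group with a single element, which matches the existence of a quartic formula. For permutations of 5 things it gets stuck at a group with 60 elements and never reaches the trivial group, so that group is not solvable.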

In the lounge

Since I’m back to talking about group theory again, I feel inclined to revisit Algebra: Chapter 0 — Joke 1.1:

“Definition.” A group is a groupoid with one object.

For the record, my opinion on this actually being a very telling definition of a group (i.e., my being “fun at parties”) has not changed since that post (which is almost two years old now, wow). For the sake of discussion, recall the relationship between groups (as defined in undergrad) and groupoids was that every group $G$ defines a groupoid $\mathbf BG$ with one object, where the endomorphisms of the unique object are given by the elements of $G$ , and composition is defined using the group multiplication. As I mentioned before, this is arguably the “true” way of formalising the fact that a group is an abstract collection of symmetries (automorphisms) of some object.

What I want to delve into a bit more is the notation for $\mathbf BG$ (which is called the delooping of $G$ ), because it is very similar to the notation for the classifying space $BG$ of $G$ , or the quotient stack $\mathbf BG = [*/G]$ .

Recall that, given a space $X$ , a principal $G$ -bundle on $X$ is a space $P/X$ equipped with a continuous $G$ -action over $X$ (i.e., the action preserves fibres) such that $X$ admits a covering $X=\bigcup_iU_i$ such that $P_i = P\times_XU_i$ is isomorphic over $U_i$ to $U_i\times G$ with the $G$ -action given by multiplication on the second factor (which is called “trivial”). We then have a functor $\mathbf{Top}^{\mathrm{op}}\to\mathbf{Set}$ that sends $X$ to the set of isomorphism classes of principal $G$ -bundles on $X$ (and sends continuous functions to pullbacks). This functor factors through the homotopy category of $\mathbf{Top}$ , and the classifying space for $G$ is a representing object for this functor. In particular, $BG$ comes equipped with a tautological (universal) principal $G$ -bundle $EG/BG$ , and any principal $G$ -bundle $P/X$ can be obtained via pullback of this universal bundle along some $X\to BG$ .

Very similarly, given an $S$ -scheme $X$ recall that a $G$ -torsor (also called a principal $G$ -bundle) is a scheme $P/X$ equipped with a $G$ -action over $X$ such that the action on $T$ -points is simply transitive (or empty) for every $T/S$ , and there is an (fppf or fpqc) cover of $X$ over which the action is trivial (as in the above sense). The $T$ -points of the stack $\mathbf BG = [*/G]$ form the groupoid of $G$ -torsors over $T$ . From this, we can see the similarity between the quotient stack and the classifying space.

The delooping $\mathbf BG$ has a much easier definition, so what’s the connection? The direct connection may not be clear: the above moduli spaces are determined by maps into them (principal $G$ -bundles over $X$ are maps $X\to BG$ ), whereas the nice immediate property about the delooping is that functors $\mathbf BG^{\mathrm{op}}\to\mathcal C$ correspond to objects $X\in\mathcal C$ equipped with a $G$ -action by automorphisms in $\mathcal C$ .

Well, we can instead just look at what functors $\mathcal C\to\mathbf BG$ classify. There is another (perhaps less intelligent) way of creating a groupoid from $G$ : namely, take the codiscrete category on the set $G$ whose objects are the elements of $G$ , where there is a unique morphism between any pair of objects. Denote this groupoid by $\mathbf EG$ . We then have a canonical functor $\mathbf EG\to\mathbf BG$ given by sending the unique morphism $g\to h$ in $\mathbf EG$ to the morphism $hg^{-1}$ in $\mathbf BG$ . Note that $\mathbf EG$ is equivalent to the singleton category, so you might see where I am going with this. (Indeed, $\mathbf EG\simeq*$ comes equipped with a trivial $G$ -action, and $\mathbf BG$ is equivalent to the 2-categorical quotient in $\mathbf{Cat}$ of $\mathbf EG$ (or the point) by this $G$ -action.)
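As a quick sanity check that the assignment $\mathbf EG\to\mathbf BG$ really is a functor (assuming composition in $\mathbf BG$ is written as group multiplication with the later morphism on the left, and writing $e$ for the identity element of $G$ ), composing $g\to h$ with $h\to k$ gives

$\displaystyle (kh^{-1})(hg^{-1}) = k(h^{-1}h)g^{-1} = kg^{-1}$

which is exactly the image of the unique morphism $g\to k$ . Likewise, the identity morphism $g\to g$ is sent to $gg^{-1} = e$ , the identity in $\mathbf BG$ .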

Given a functor $\mathcal C\to\mathbf BG$ , we can consider the 2-pullback $\mathcal P := \mathcal C\times_{\mathbf BG}\mathbf EG$ . This category consists of pairs $(x, g)$ where $x\in\mathcal C$ and $g\in G$ , and the morphisms $(x, g)\to(y, h)$ are the morphisms $x\to y$ in $\mathcal C$ that lie over $hg^{-1}$ in $\mathbf BG$ . In particular, if $\mathcal C$ is a groupoid, then this construction leads to a reasonable notion of a “principal $G$ -bundle” over $\mathcal C$ .

This analogy can be made much more concrete by appealing to the Homotopy Hypothesis. Indeed, we may view any groupoid as a simplicial set via its nerve. In particular, we obtain simplicial sets $EG := N(\mathbf EG)$ and $BG := N(\mathbf BG)$ , and we can follow the above logic again to observe that pulling back $EG\to BG$ along any simplicial map $X\to BG$ induces a reasonable notion of a principal $G$ -bundle over $X$ . In fact, by taking geometric realisation, we recover the topological classifying space for $G$ as $BG = |N(\mathbf BG)|$ !