# Where do axioms come from?

In the last post, we talked about what math is. To me, math is a quest to understand what must be.

The basis for this quest is the axioms and definitions of mathematics. Definitions describe what we are talking about, while axioms describe what we assume those objects can do.

Where do those axioms and definitions come from?

Since math is taught so authoritatively, it can seem that the definitions and axioms of mathematics are part of what must be. That may be true to some extent, but that is not how math is done.

As we try to increase our mathematical understanding, our needs change. We realize that certain ideas or definitions we used before weren’t quite precise or rigorous enough to deal with the questions we want to ask now. Sometimes, we find out that our previous understanding was simply lacking.

In this post, I’d like to give some of the motivating reasons for the axioms and definitions we commonly use in mathematics. The reasons are general and overlap, and are probably not exhaustive.

The first of these reasons is trying to codify intuitive ideas.

For an example, we can go back to calculus.

The idea of a “continuous function” is fairly simple; a function is continuous if, when you graph it, you don’t have to lift up your pencil.

For many purposes, that intuition is sufficient. But, if you needed to, how could you make this definition precise?

Though calculus was invented in the mid-1600s, it wasn’t until 1817 that Bernard Bolzano gave the modern definition of continuity. His “epsilon-delta” definition also demonstrates something common to new, precise mathematical definitions: even though the intuition behind them is fairly clear, the technical details can be very confusing.

The more intuitive way to say his definition is “A function $f(x)$ is continuous at a point $x_0$ if inputs near the original input $x_0$ give outputs near the original output $f(x_0)$.” This makes more precise what we mean when we say a continuous function has no jumps.

But the precise way of stating this is “A function $f(x)$ is continuous at a point $x_0$ if, for any $\epsilon>0$, there exists a $\delta>0$ such that $|x-x_0|<\delta$ implies $|f(x)-f(x_0)|<\epsilon$.”¹

One of the first jobs of any math major beginning his or her “proofs” classes is to really internalize this definition. It, and its variations, come up all over the place.
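To see the quantifiers of the epsilon-delta definition in action, here is a minimal Python sketch. The function $f(x)=x^2$, the sample points, and the helper names `delta_for` and `spot_check` are our own choices for illustration. For each $\epsilon$, it proposes a $\delta$ (the usual textbook choice for $x^2$) and then spot-checks the implication on finitely many points, so it illustrates the definition rather than proving anything.

```python
def f(x: float) -> float:
    """Example function (our choice, not from the text): f(x) = x^2."""
    return x * x


def delta_for(epsilon: float, x0: float) -> float:
    """A delta that works for f(x) = x^2 at x0.

    If |x - x0| < delta <= 1, then |x + x0| < 2|x0| + 1, so
    |x^2 - x0^2| = |x - x0| * |x + x0| < delta * (2|x0| + 1) <= epsilon.
    """
    return min(1.0, epsilon / (2 * abs(x0) + 1))


def spot_check(x0: float, epsilon: float, samples: int = 1000) -> bool:
    """Test "|x - x0| < delta implies |f(x) - f(x0)| < epsilon" on sample points.

    Only finitely many points are tried, so this illustrates the
    definition; it does not prove continuity.
    """
    delta = delta_for(epsilon, x0)
    for k in range(1, samples):
        # Evenly spaced points strictly inside (x0 - delta, x0 + delta).
        x = x0 - delta + 2 * delta * k / samples
        if abs(f(x) - f(x0)) >= epsilon:
            return False
    return True


for eps in [1.0, 0.1, 0.001]:
    print(eps, delta_for(eps, x0=1.0), spot_check(1.0, eps))
```

Notice that $\delta$ depends on both $\epsilon$ and $x_0$: the farther $x_0$ is from 0, the steeper $x^2$ is there, so the smaller $\delta$ has to be.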

Axioms and definitions are sometimes invented trying to answer the question, “what makes this proof work?”

It almost feels like cheating: you know the outcome you want, so just assume the things that make it work!

If you’ve taken calculus, probably the most important theorem you learned was the Fundamental Theorem of Calculus. One way to state this theorem is “If $F(x)$ has a (continuous) derivative, then $F(b)-F(a) = \displaystyle\int_a^b F'(x)\,dx$.” In other words, integrating the derivative gives back the change in the original function.

But the assumption that $F(x)$ has a continuous derivative is stronger than really necessary. For instance, the theorem still works for $F(x) = |x|$, even though $|x|$ does not have a derivative (i.e., a well-defined slope) at $x=0$.
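As a quick numerical check of this claim, here is a Python sketch that approximates $\int_a^b F'(x)\,dx$ for $F(x)=|x|$ with a midpoint Riemann sum and compares it with $F(b)-F(a)$. The midpoint rule, the sample intervals, and the helper names are our own choices; agreement of the numbers is evidence, not a proof.

```python
from collections.abc import Callable


def F(x: float) -> float:
    """The example from the text: F(x) = |x|."""
    return abs(x)


def F_prime(x: float) -> float:
    """Slope of |x|: -1 left of 0, +1 right of 0.

    |x| has no derivative at 0; we return 0 there, but the value at a
    single point does not change the integral.
    """
    return float((x > 0) - (x < 0))


def midpoint_integral(g: Callable[[float], float], a: float, b: float,
                      n: int = 100_000) -> float:
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


for a, b in [(-1.0, 2.0), (-3.0, 1.0), (1.0, 4.0)]:
    # The two numbers should agree up to a small discretization error.
    print(F(b) - F(a), midpoint_integral(F_prime, a, b))
```

The first two intervals contain the corner at $x=0$, where the derivative does not exist, yet the integral still recovers $F(b)-F(a)$.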

So what property does a function need to make the fundamental theorem work? At some point in the proof, you need to use that $F(x)$ has a continuous derivative. But if you look closely, you find you don’t quite need that condition. Instead, you need something a bit weaker. That precise condition is just given a name: a function satisfying it is called absolutely continuous. (You can see the definition on Wikipedia.)

Absolute continuity is not a basic, obvious definition or idea. It’s not very elegant in anyone’s view. It’s simply the condition that makes the proof work.

“What makes it work” might not be very elegant, but it is how math is done.

If we don’t know how to prove the theorem we want, we’ll often ask, “What extra condition could we assume that would make it possible to prove this theorem?” And then we assume that condition holds, and often give it a name like “tame” or “well-behaved.” The conditions aren’t special or elegant, but they work.²

Another way that definitions are invented is when mathematicians want to extend an idea to a more general situation. Put differently, mathematicians are trying to identify what is intrinsic to an idea.

This is a major theme of modern mathematics. “What does it mean to be a shape?” “What does it mean to multiply things?” These two questions led to complete reformulations of entire branches of mathematics.

Until Bernhard Riemann, a shape was always visualized in the plane or in space (or perhaps a higher-dimensional $\mathbb{R}^n$). But what makes a shape a shape? Riemann asked this question, and decided that the property that makes a shape a shape is that, at any point of the shape, you can travel in a certain number of directions. (In 3 dimensions, this would be up/down, left/right, and forward/backward.)

The usual visualization of shapes in space was a crutch that distracted us from the intrinsic properties of that shape. These “many-fold quantities,” as Riemann called them, or manifolds, as we call them now, have become the basis for geometry. (We’ve talked extensively about manifolds on this blog.)

Multiplying numbers has been done for as long as math has been done. More recently, multiplication of matrices has become useful. But what makes multiplication multiplication?

Answering that question leads to the idea of a group, the basis of the field of abstract algebra. A group is a bunch of things that you can multiply. You don’t really care what these things are (matrices, functions, numbers, shapes, symmetries, etc.), as long as you know how to “multiply” them. The general rules for what multiplication must do are the axioms of a group.³
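To make the group axioms concrete, here is a brute-force Python sketch that checks the four axioms listed in footnote 3 on a finite set. The examples are our own: the rotations of a square by multiples of 90° (a small, finite cousin of the rotation group $SO(3)$), and, for contrast, the numbers 0 to 3 under multiplication mod 4, which is not a group because 0 has no inverse.

```python
from itertools import product

# Rotations of a square by 0, 90, 180, 270 degrees, stored as a number of
# quarter-turns. Doing rotation a and then rotation b is a + b quarter-turns.
ELEMENTS = [0, 1, 2, 3]


def compose(a: int, b: int) -> int:
    """'Multiply' two rotations by doing one after the other."""
    return (a + b) % 4


def is_group(elements, mul) -> bool:
    """Brute-force check of the four group axioms on a finite set."""
    # 1. Closure: products stay in the set.
    if any(mul(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # 2. Associativity: (a*b)*c == a*(b*c).
    if any(mul(mul(a, b), c) != mul(a, mul(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # 3. Identity: some "1" with 1*a == a*1 == a for every a.
    identities = [e for e in elements
                  if all(mul(e, a) == a == mul(a, e) for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # 4. Inverses: every a has some b with a*b == b*a == "1".
    return all(any(mul(a, b) == e == mul(b, a) for b in elements)
               for a in elements)


print(is_group(ELEMENTS, compose))                       # True
print(is_group([0, 1, 2, 3], lambda a, b: (a * b) % 4))  # False: 0 has no inverse
```

The checker never looks at what the elements *are*; it only uses the multiplication, which is exactly the point of the abstraction.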

This kind of generalization often seems weird and/or useless when you’re first introduced to it. Even worse, it always feels like you’re adding a layer of complexity to something that is already complicated enough.

But stripping away the extra details and focusing on the core ideas turns out to be very valuable. First, it sometimes makes it easier to prove results you care about. Second, it unifies very disparate ideas (such as matrix multiplication, rotations, and ordinary multiplication), so if you can prove a theorem about groups in general, then it applies to all of these very different situations.
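Here is a small Python illustration of “prove it once, use it everywhere.” From the axioms alone, $(ab)(b^{-1}a^{-1}) = a(bb^{-1})a^{-1} = aa^{-1} = 1$, so $(ab)^{-1} = b^{-1}a^{-1}$ in every group. The sketch spot-checks that rule on two groups we picked for illustration: rotations of a square, and the six invertible $2\times 2$ matrices with entries mod 2. It also checks the tempting but wrong rule $(ab)^{-1} = a^{-1}b^{-1}$, which happens to hold for rotations (where order doesn’t matter) but fails for the matrices. The helper names are hypothetical.

```python
from itertools import product


def mat_mul_mod2(A, B):
    """Multiply two 2x2 matrices (tuples of rows), reducing entries mod 2."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) % 2 for j in range(2))
        for i in range(2)
    )


# The invertible 2x2 matrices mod 2 (odd determinant): six elements, and
# the order of multiplication matters.
MATRICES = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)
            if (a * d - b * c) % 2 == 1]
IDENTITY_MATRIX = ((1, 0), (0, 1))

# Rotations of a square, counted in quarter-turns; order doesn't matter.
ROTATIONS = [0, 1, 2, 3]


def rotate(a, b):
    return (a + b) % 4


def inverse(x, elements, mul, identity):
    """Find the inverse of x by searching the (small, finite) group."""
    return next(y for y in elements if mul(x, y) == identity)


def rule_holds(elements, mul, identity, swap_order):
    """Check (a*b)^-1 == b^-1 * a^-1 (swap_order=True) or a^-1 * b^-1 (False)."""
    def inv(x):
        return inverse(x, elements, mul, identity)

    for a, b in product(elements, repeat=2):
        rhs = mul(inv(b), inv(a)) if swap_order else mul(inv(a), inv(b))
        if inv(mul(a, b)) != rhs:
            return False
    return True


for name, elements, mul, identity in [
    ("rotations", ROTATIONS, rotate, 0),
    ("matrices mod 2", MATRICES, mat_mul_mod2, IDENTITY_MATRIX),
]:
    print(name,
          rule_holds(elements, mul, identity, swap_order=True),   # the theorem
          rule_holds(elements, mul, identity, swap_order=False))  # the wrong guess
```

The same `rule_holds` function runs unchanged on both groups, and the theorem holds in both; only the wrong guess depends on which group you are in.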

Finally, sometimes we have to come up with new axioms because our old ones were just plain wrong.

Because math is an investigation into what must be, we really don’t like when there are contradictions. In fact, we feel like there shouldn’t ever be contradictions. After all, we proved everything, right?

Usually, a contradiction is just an indication that you made a mistake somewhere in your reasoning. (I’m intimately familiar with that one…)

And often, mathematical theorems or examples can seem paradoxical, but really the only problem is with your intuition.

But occasionally, real problems are found.

One of the most prominent examples of this is Russell’s paradox.

Intuitively, a set is any collection of objects you can define. For instance, the integers between 1 and 5 form a set, $\{1, 2, 3, 4, 5\}$. The natural numbers form a set. You can have more complicated sets, like the set of all sets of numbers.

Georg Cantor, among others, had enumerated what things you could do with sets.⁴ But a naive interpretation of sets, which works well enough for most purposes, leads to contradictions.

Russell’s paradox is this: Consider the set $R$, which is the set of all sets which are not in themselves. In symbols, $R = \{S : S \notin S\}$.

Yeah, that’s weird. Maybe an easier one to get your head around is “the set of all sets.” Since it’s a set of all sets, and it is a set, the set of all sets has to contain itself.

The set $R$, the set of all sets which are not in themselves, is even weirder. But (naively) $R$ is a set because we can define it.

Is $R$ in itself?

If $R$ is in $R$, then $R$ is a set which contains itself. But that means (since $R$ is the set of all sets which are not in themselves) that $R$ can’t be in $R$.

Okay, so maybe $R$ is not in $R$. If it isn’t, though, the definition of $R$ (again, the set of all sets which are not in themselves) means that $R$ must be in $R$!

In other words, $R$ can’t be in $R$, but that means it must be in $R$, but that means it can’t be in $R$, but that means it must be in $R$, but that means…

This is a lot like the infamous statement, “This statement is false.”⁵
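The back-and-forth above can be boiled down to a tiny Python sketch. This is a toy model of the logic only, not of set theory: we treat “is $X$ in $X$?” as a yes/no question and ask which answers agree with $X$’s definition. The helper `consistent_answers` and the third example are our own inventions for illustration.

```python
from collections.abc import Callable


def consistent_answers(rule: Callable[[bool], bool]) -> list[bool]:
    """Which answers to "is X in X?" agree with X's definition?

    `rule` says whether X contains a set Y, given whether Y is in itself.
    Applying the definition to Y = X gives: (X in X) == rule(X in X).
    """
    return [answer for answer in (True, False) if answer == rule(answer)]


# The set of all sets: contains every set, no matter what.
print(consistent_answers(lambda y_in_y: True))        # [True]
# Russell's R: contains exactly the sets that are not in themselves.
print(consistent_answers(lambda y_in_y: not y_in_y))  # []
# (Our extra example) the set of all sets that ARE in themselves.
print(consistent_answers(lambda y_in_y: y_in_y))      # [True, False]
```

The set of all sets gets exactly one consistent answer, and the last example gets two (its definition simply doesn’t settle the question). Russell’s $R$ gets none, which is the paradox.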

In order to clear up Russell’s paradox, along with a family of other paradoxes that come along with a more naive approach to set theory, new axioms were needed.

Over the next few decades, the now-standard Zermelo-Fraenkel axioms of set theory were developed. These axioms are designed to allow you to do most things you think you should be able to do with sets, like combine them and compare them and such, but they avoid paradoxes that can creep in if you try to allow everything.

To conclude, axioms and definitions are invented for many reasons, ranging from an attempt to make precise an intuitive idea to an attempt to remove paradoxes.

But math works, as long as we pick reasonable axioms, and we can use it to learn everything that must be.

Right?

Actually, it’s not quite so simple. Either there are fundamental limits on what we can use mathematics to understand, or math is self-contradictory.

That is the content of Gödel’s incompleteness theorems. And that’s what we’ll talk about next time.

Sorry for the delay on this post. We’ve been writing posts weekly for six months. Unfortunately, that turns out to be an unsustainable pace with all the other things we have to get done. We’ll continue to post, but it will be less often than before. Feel free to subscribe to get an email when we post!

1. And this is not the most confusing definition. Another definition (often, but not always, equivalent) is “The inverse image of an open set is open.” I don’t want to define those terms here, but this definition, though very useful, is even less intuitive than Bolzano’s.
2. My impression is that this is how “Hilbert spaces” got their name. Hilbert spaces are (usually infinite-dimensional) vector spaces, with a way to measure lengths of vectors and angles between them. That is all very natural. But Hilbert spaces have the additional property that they are “complete,” essentially meaning that there are no “vectors missing” from the space. This condition is very important in being able to prove anything about infinite-dimensional vector spaces. Hilbert had a number of papers about these complete vector spaces, and others found them useful, and so started calling them Hilbert spaces.
3. There are four axioms of a group: (1) Multiplying things in the group has to give something in the group. (2) Multiplication is associative, i.e., $(a\cdot b)\cdot c = a\cdot(b\cdot c)$. (3) There is a “1,” meaning anything you multiply by “1” stays the same. (4) Everything has an inverse, so that if you multiply them together, you get “1.” There are lots of examples of groups, such as the positive real numbers under multiplication, or invertible matrices. But there are less obvious examples, like the set of all rotations in space, $SO(3)$. These are a group since, if you do one rotation, then another, that is the same as doing one big rotation. (Doing one rotation, then another, i.e., composition, is the “multiplication.”) The “1” rotation is the rotation of zero degrees, i.e., doing nothing. And the inverse is undoing the rotation you just did.
4. I don’t say “wrote down axioms” because he never actually wrote down precise axioms for his set theory.
5. Perhaps even more accurately, the set $R$ is like “the smallest number that can’t be described in fewer than 13 words.” It seems to make sense, but, looking closer, there’s obviously some sort of problem with this number.