But what about fractals?
The boundary of the Koch snowflake (partly shown above) should probably not be two dimensional, since it’s just a line that’s been crinkled infinitely many times. But it probably shouldn’t be considered one dimensional either: it’s been crinkled so many times that it has infinite length and almost takes up area.
So, how many dimensions does it have?
Close, but you’ll spoil it!
First, let’s talk about dimension.
There are many ways to define dimension, from topological dimension to the Hausdorff dimension. Different definitions are useful for different fields of mathematics, but they all agree for simple shapes.
The simplest definition of dimension is the one we used when we were talking about manifolds back in Asteroids on a Donut. Basically, a line is one dimensional because you can only go in one (pair of) directions — left and right. A square is two dimensional because you can go left and right and up and down. And so on.
Unfortunately, that stops working so well when you start dealing with fractals. After all, which directions can you go on the Cantor set?
The Cantor set is just a bunch of disconnected points. There isn’t a direction to go when you’re on them. But there are a whole bunch of points (uncountably^{1} many, in fact), so maybe it doesn’t make sense to say it’s zero dimensional.
To make sense of things like the Cantor set and the Koch snowflake, we need a definition of dimension that’s a bit more robust.
To figure out how to define dimension, let’s look how size grows when we double lengths.
For a line, if we double the lengths, the line doubles in length.
In other words, doubling the lengths makes the line $2 = 2^1$ times as big.
For a square, if we double the lengths, the square quadruples in area. That’s because there are two directions for doubling to affect, so doubling the lengths makes the square $4 = 2^2$ times as big.
For a cube, if we double the lengths, the cube octuples in volume. For a cube, there are three directions for doubling to affect, so doubling the lengths makes the cube $8 = 2^3$ times as big.
See where I’m going?
One way to figure out a shape’s dimension is to make the shape bigger by multiplying each length by two. Just like multiplying the lengths of a cube by 2 increased its size by a factor of $2^3 = 8$, if we multiply the lengths of a $d$ dimensional shape by 2, then the size of the shape will have to expand by a factor of $2^d$.
Multiplying by 2 isn’t special, of course. If we multiplied each length by a factor of 3, the size of a shape of dimension $d$ would have to expand by a factor of $3^d$.
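This scaling rule can be sanity-checked in a few lines of code (a sketch; the function name is my own invention): if multiplying every length by a factor $s$ multiplies the total size by a factor $m$, then the dimension $d$ satisfies $s^d = m$, i.e. $d = \log m / \log s$.

```python
from math import log

def scaling_dimension(length_factor, size_factor):
    """Dimension d satisfying length_factor**d == size_factor."""
    return log(size_factor) / log(length_factor)

# The familiar shapes come out with the dimensions we expect:
print(scaling_dimension(2, 2))  # line: doubling lengths doubles size, dimension 1
print(scaling_dimension(2, 4))  # square: doubling lengths quadruples area, dimension 2
print(scaling_dimension(2, 8))  # cube: doubling lengths octuples volume, dimension 3
```

Nothing about the formula requires the answer to come out a whole number, which is exactly the loophole fractals will exploit.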
Now, let’s take this new measuring stick and try to measure some fractals!
Even Benoit Mandelbrot, often considered the father of fractals, had trouble nailing down an exact definition for what a fractal is. But a good intuitive idea is that a fractal is a shape that looks roughly the same, no matter how much you zoom in.
For example, if you zoom in on a circle, it quickly stops looking round, and begins to look straight.
But, for a fractal, no matter how far you zoom in, it keeps on repeating itself.
While “most” fractals aren’t exactly self similar (for instance, the Mandelbrot set), many of the simplest examples are. To make one, start from a simple shape, then repeatedly change it in the same way, on smaller and smaller scales, infinitely many times.
For instance, the Koch snowflake.
To make the snowflake, start with a triangle, then, add a spike to each side. Then, for each of the new, smaller sides, add a new spike. The Koch snowflake is not any of the intermediate steps, but the limit of doing infinitely many steps.
So, what’s the dimension of the snowflake?
Remember, what we’re looking for is a scaling law: if we multiply each length by a factor of 3, the size of a shape of dimension $d$ would have to expand by a factor of $3^d$.
How does the snowflake scale?
Consider one side at a time. Each side becomes four sides, of one third the length.
If you take one of these little mini-sides, and triple the lengths in each direction, then each one of these mini-sides will become as big as the entire original side. For instance, the little spike on the left flat part expands to be the same size as the big spike in the middle of the original.
Since the original side was four copies of the mini-side, that means that if we triple the length in each direction, then we quadruple the total size of the shape.
Using the pattern that a $d$ dimensional shape should get $3^d$ times bigger, we have that $3^d = 4$. Using a logarithm, we get that the dimension of the Koch snowflake is $d = \log 4 / \log 3 \approx 1.26$. More than a length, a bit less than an area.
Let’s do another example of a fractal, this one somewhere between a point and a line — the Cantor set.
To make the Cantor set, you start with a line. Then you cut out the middle third. Then, from each remaining piece, you cut out its middle third, and so on, until you’re left with a fine dust.
To figure out the dimension of the Cantor set, look at the left segment after the first excision. Since we are just cutting out middle thirds, this little segment becomes an exact copy of the entire Cantor set, just at a smaller scale.
If we triple the lengths, then that small segment becomes the original Cantor set. But the original Cantor set is just two copies of that smaller piece. Thus, tripling lengths doubles size, so $3^d = 2$, and $d = \log 2 / \log 3 \approx 0.63$. Not really a line, not really a point.
With any of these self similar fractals, you can do a similar trick without too much trouble. For instance, the Sierpinski carpet, which is made by taking a square and repeatedly cutting out the middle ninth, increases in size by a factor of 8 whenever the lengths are multiplied by 3, and so has a dimension satisfying $3^d = 8$, so $d = \log 8 / \log 3 \approx 1.89$.
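The same trick works for any exactly self-similar fractal, so it fits in one helper function (a sketch; the name is mine): a shape made of $n$ copies of itself, each scaled down by a factor $s$, has dimension $\log n / \log s$.

```python
from math import log

def similarity_dimension(copies, scale):
    """A shape built from `copies` shrunken copies of itself, each
    `scale` times smaller, satisfies scale**d == copies."""
    return log(copies) / log(scale)

print(similarity_dimension(4, 3))  # Koch curve: 4 copies at 1/3 scale, ~1.26
print(similarity_dimension(2, 3))  # Cantor set: 2 copies at 1/3 scale, ~0.63
print(similarity_dimension(8, 3))  # Sierpinski carpet: 8 copies at 1/3 scale, ~1.89
```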
Many of the most awesome fractals aren’t exactly self similar, like the Mandelbrot set.
The Mandelbrot set is defined by looking at each complex number $c$ individually, then repeatedly calculating $z \mapsto z^2 + c$, starting from $z = 0$. (For example, if $c = 1$, then we make the sequence $0$, $1$, $2$, $5$, $26$, etc.) If this sequence becomes infinitely big, then the original $c$ is not in the Mandelbrot set. If, like for $c = -1$, this sequence stays close to zero, then that $c$ is in the Mandelbrot set.
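That membership test can be sketched as a tiny escape-time check (the function name and the iteration cap are my choices; real renderers are fancier). It uses the standard fact that once $|z|$ exceeds 2, the sequence is guaranteed to run off to infinity.

```python
def in_mandelbrot(c, max_iter=100):
    """Iterate z -> z*z + c starting from z = 0.  If |z| ever exceeds 2,
    the sequence is guaranteed to blow up, so c is outside the set.
    Surviving all max_iter steps means c is (probably) inside."""
    z = 0
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True

print(in_mandelbrot(-1))  # sequence 0, -1, 0, -1, ... stays bounded: True
print(in_mandelbrot(1))   # sequence 0, 1, 2, 5, 26, ... escapes: False
```

Counting how many steps it takes a point to escape is what produces the familiar colored pictures around the set.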
The Mandelbrot set is famous for its beauty, and for the nearly repeating, but infinitely varied, patterns you find when you zoom in.^{2}
Actually, from that definition, you might realize that only the black parts of the pictures are the Mandelbrot set. The pretty colors come from counting how long it takes for the sequence to get large enough to be sure it won’t stay small.
Well, the Mandelbrot set itself is, of course, 2-dimensional, since it has area. (It can’t be three dimensional since it’s already confined to a plane.) But what about its boundary?
It’s a really zig-zagging line, like the Koch snowflake, but it’s not exactly self-repeating, so we can’t do the same tricks we did before. Just from the previous examples, you probably expect the dimension to be somewhere between 1 and 2, which seems reasonable.
Mandelbrot himself conjectured that the boundary was so zig-zaggy, so fractal, that it would somehow skip past crazy fractional dimensions like $1.5$ or $1.9$, and go all the way to two dimensions.
This turns out to be true. Shishikura managed to prove in 1998 that the boundary of the Mandelbrot set is two-dimensional. The proof is a biiiit complicated, so we won’t go into it here, but it does work.
Normally, the boundary is one dimension lower than the main part of a shape. For instance, a square is two dimensional, but its boundary is one dimensional. Fractals can be a bit different. For instance, we had the Koch snowflake. The inside is two dimensional, but the boundary, as we said earlier, is $\log 4 / \log 3 \approx 1.26$ dimensional. Still less than two.
The Mandelbrot set itself is also two-dimensional. But, somehow, the boundary is so jagged that it manages to have the same dimension as the set itself.
That’s… bizarre.
But that’s how fractals roll.
Sorry for taking so long on this post. I wrote an entire other post about waves, which took forever… but then I decided it wasn’t very good. Hopefully I’ll get a better angle on that topic eventually.
The accountants and engineers may be a bit angry about magically doubling a sphere…
but the proof that you can double a sphere does almost nothing questionable.
In fact, the most questionable thing we have to do is… to choose.
Yup! It turns out that making choices is more controversial than it seems it should be.^{2} In fact, the Axiom of Choice is perhaps the most discussed and most controversial axiom in all of mathematics.^{3}
To convince you that choosing is hard, let’s look at a simple example: picking a number between 0 and 1. Go ahead, pick one!
Like the girl in the red dress, you probably picked a rational number, i.e. a fraction. There’s nothing wrong with that, but remember that there really aren’t that many rational numbers.^{4} So, let’s try to pick a random irrational number between 0 and 1.
There are lots of possible choices, like $\sqrt{2}/2$ or $\pi/4$ or $1/e$, but those aren’t really random irrational numbers. They’re all very special ones that we can write down using a fancy formula, rather than a completely random choice.
So, how could we choose a random number?
Recall that an irrational number can be thought of as an infinite decimal that neither repeats nor ends. So, to pick an irrational number at random, we could just pick digits randomly, one at a time.
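Digit-at-a-time picking is easy to sketch in code (the function name is mine). The catch, which is the whole point of this post, is that a program only ever produces finitely many digits, so what it actually picks is always a rational approximation, never the full irrational number.

```python
import random

def random_digits(n):
    """The first n digits of a 'random number between 0 and 1',
    chosen one digit at a time."""
    return "0." + "".join(str(random.randrange(10)) for _ in range(n))

print(random_digits(20))  # a different 20-digit prefix every run
```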
Great! Now, you’ve picked a truly random irrational number!
But tell me, what number did you pick?
See the problem?
Choosing one digit, or even a million, is (in theory) not very hard. There are 10 digits; you pick one. No problem.
But if you have to make an infinite number of choices… Well, it’s easy to say that you should make infinitely many choices, but can you really do it? If you can’t tell me the number you picked, did you really pick a number?
That is the controversy about the axiom of choice.
So, what does the axiom of choice actually say?
The axiom of choice says that, for any collection of (nonempty) sets, you can choose one thing out of each set.
For instance, if we were picking an infinite decimal, like before, our collection of sets would be a bunch of copies of the set of digits 0 to 9, one set of them for each of the infinitely many digits we need to pick. The axiom of choice says that we can pick one digit from each set of digits in order to pick an infinite decimal number. It doesn’t say how to pick those digits, or what digits you pick, just that you can pick them, somehow.
(To be clear, the axiom of choice doesn’t talk about making random choices, just a choice at all. So, in the exact case of picking digits that we just used, the axiom of choice simply says that there is some infinite decimal we can pick, not that it’s a random one. It’s perfectly valid for the axiom of choice to choose, say, all zeros, and end up with the number 0.)
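It’s worth noting that when you do have an explicit rule, no axiom is needed at all; the choice function can simply be written down. The sketch below (the variable names are mine) picks from finitely many digit sets using the rule “always take the smallest,” which is exactly the kind of all-zeros choice described above. The axiom of choice is only needed when there is no rule you can write.

```python
# One copy of the digits 0-9 for each decimal place (only 5 places
# here; the post's version needs infinitely many).
digit_sets = [set(range(10)) for _ in range(5)]

# An explicit rule -- always take the smallest digit -- is itself a
# choice function, no axiom required.
chosen = [min(s) for s in digit_sets]
print(chosen)  # [0, 0, 0, 0, 0]: the "all zeros" choice
```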
So why is this axiom so controversial?
There are two main objections. The first is that you can’t actually get your hands on the object(s) the axiom of choice chooses.
Axioms usually represent a basic definition, or a base truth, or something that is “obviously” true. For instance, one of the other basic axioms (of set theory) is that no matter which (counting) number you pick, there’s always a bigger one.^{5} That seems pretty obvious.
But, with the axiom of choice… Well, just like you couldn’t tell me which number you picked by picking each digit randomly, the axiom of choice simply says you can make a choice, not which one to make, or what the choice is.
If you can’t tell me what number you picked, did you pick it?
How is it “obvious” that you can make such a choice?
This is the argument of the constructivists. In their view, everything needs to be explicit. A choice only makes sense if you can tell me what you picked, or, at least, a way to make a unique choice. The axiom of choice fails this standard, and so should be avoided.
The other objection is that the axiom of choice leads to a number of “obviously false” results.
The most famous of these we’ve already talked about: the Banach-Tarski paradox. In short, it says that you can take a sphere, cut it into a few pieces, move them around, and rearrange them into two spheres of the exact same size as the original! A bit of black magic, indeed.
The problem is that the axiom of choice is also instrumental in proving key, foundational, “obvious” results as well!
For instance, nothing is more obvious than if you have two bags of rice, one has more grains of rice than the other, or maybe the same amount of rice.
But without the axiom of choice, you can’t say the same thing about sets!
For finite sets, of course, this is not a problem. A set with 42 things in it is bigger than one with 27 things. But for infinite sets, it’s not always clear how to compare them.
Like we talked about way back in The size of infinity, the way to compare sets is to line up the things inside them with each other. If we had two sets, say A and B, and each thing in A had a corresponding thing in B, then clearly B is at least as big as A.
The problem is that you can come up with complicated sets A and B where it’s not obvious how to line up things in A with things in B. In fact, without the axiom of choice, you can show it’s sometimes impossible to compare the size of the two sets. And it’s not even that you just don’t know which is bigger. It’s worse than that. The sets both have sizes, but you can’t even compare their sizes.
It turns out that the axiom of choice is equivalent to saying that you can always compare sizes of sets. In other words, either you accept the axiom of choice, or else you can’t always compare sizes. You can’t get one without the other.
There are a lot of other theorems that are equivalent to the axiom of choice. There’s a whole section of the Wikipedia page listing some of the equivalent results, some more intuitive, some less.
To quote Jerry Bona, “The Axiom of Choice is obviously true; the Well Ordering Principle is obviously false; and who can tell about Zorn’s Lemma?” The joke is that all of them are actually equivalent.
So, all of this leads to two very important questions.
First, you can’t “disprove” an axiom, since axioms are base assumptions. But can you prove that the axiom of choice is not consistent with the other axioms?
A consistent set of axioms is a set of assumptions that can’t prove contradictions. For instance, if you could use your axioms to prove that 0=1, that would mean your axioms were not consistent.
If you could show the axiom of choice caused inconsistencies, all the accountants in the world would feel more relieved, since then we could throw out the axiom of choice, along with its impossible consequences, like the Banach-Tarski paradox.
However, Gödel again comes to the rescue. In 1940, Gödel showed that the axiom of choice does not itself cause any inconsistencies.^{6}
Okay, so we can’t throw out the axiom of choice because of inconsistencies, no matter how much the Banach-Tarski paradox assaults our sensibilities.
But maybe we can do the opposite. The second question about the axiom of choice is whether we can prove it true using only the other axioms. In other words, do we need to assume the axiom of choice at all, or do we get it for free?
Here, again, we get an interesting answer. In 1963, Cohen showed that it’s impossible to prove the axiom of choice from the other standard axioms.
So, where does that leave us, intrepid explorers of mathematics?
As a (non-obvious) consequence, Cohen’s proof means we are free to either assume that the axiom of choice is true, or, in fact, that the axiom of choice is false! Either way is fine for math.
How do mathematicians deal with the controversy?
Originally, mathematicians were resistant to the axiom of choice. One well known story is about Tarski (of Banach-Tarski fame). He used the axiom of choice to prove a result about the sizes of infinite sets.^{7} He submitted the paper to a journal. In response, two editors rejected his paper.
Their argument? Well, Fréchet wrote that using one well-known truth to prove another well-known truth is not a new result. Meanwhile, Lebesgue wrote that using one false statement to prove another is of no interest.^{8}
Nowadays, however, most mathematicians accept the axiom of choice without too many reservations. It’s simply too useful in proving too many foundational results in many fields. It’s consistent, so there doesn’t seem to be any reason to not use it, despite the occasional paradox it causes.^{9}
Thanks for sticking with me! Those of you who came from the recent video by 3Blue1Brown may not have realized it, but I haven’t posted recently. I’d planned on being a professor since high school, but a few months ago, I decided I was going to change careers. Learning as much computer science as I could, searching for a job, moving, and so on took a lot of time and mental effort, which led to not many (any?) blog posts.
However, we’ve now settled down in Albuquerque, NM, where I just started a job as a software developer for a small company making scientific software. New posts should now continue to come out about once a month. Yay! More awesome math!
(Also, if you haven’t checked out 3Blue1Brown before, you totally should. He’s pretty awesome too.)
I’ll explain why it’s been so long at the very end of the post, but in the meantime, we’ve got some math to explore!
The Banach-Tarski paradox says that you can take a ball, cut it up into a handful of pieces, then rearrange them in order to get two balls identical to the original.
Impossible, right?
Wrong!^{2}
It certainly seems impossible, though. After all, if all you do is cut up the ball, and move the pieces around (no stretching required!) then the volume of stuff from the ball shouldn’t change.
But you can duplicate a ball! The trick is that rotations can create points, seemingly out of nowhere.
Here’s a simple example: Take a circle, with a single point missing.
How can we fill back in that hole? The obvious thing to do is just infinitesimally stretch the circle to fill in that single-point gap.
But let’s rule out stretching — we know that stretching things mathematically can create points. Can we do it with just a rotation?
If we pick just the right set of points to rotate, we can!
Let’s start by taking the point one radian^{3} clockwise of the original point. If we rotate it counterclockwise around the circle, it’ll fill the gap we had.
So, of course, that point should be in the set we will rotate. Of course, moving that point leaves another gap, so we need to also rotate the point one radian clockwise of that. And then we need another point to fill in that gap…
…and so on, and so forth.
The trick is that we picked a special angle. Recall that a circle has 360 degrees, or, equivalently, $2\pi$ radians. If we keep picking points, each one radian clockwise of the last, we go around the circle once, then twice, then more, but we will never end up back where we started. That is because $2\pi$ is an irrational number! (Details in this footnote:^{4})
The original hole is filled by the point that was 1 radian away. The hole left at 1 radian away is filled by the point two radians away. That hole at 2 radians is filled by the point at 3 radians… and so on, so a billion radians later, the hole left by the point a billion radians away is filled in by the point a billion and one radians away.
Sure, by this point, we’ve wrapped around the circle, oooohh, say, 160 million times, but we have never repeated a point! All thanks to $2\pi$ being irrational.
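We can’t check infinitely many points, but a quick numerical experiment (a sketch, not a proof) backs this up: reducing the angles $0, 1, 2, \ldots$ radians mod $2\pi$ never produces a repeat.

```python
from math import pi

# The points we rotate sit at angles 0, 1, 2, ... radians around the
# circle.  Reduce each angle mod 2*pi and look for repeats: because
# 2*pi is irrational, no two of them ever coincide.
angles = [n % (2 * pi) for n in range(10_000)]
assert len(set(angles)) == len(angles)
print("the first 10,000 rotated points are all distinct")
```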
So, why isn’t there a hole at the end of all this? Well, there is no end. We’re kind of pulling a point out of infinity to fill the gap.^{5}
Of course, creating a single point is not so impressive. So, let’s get back to the Banach-Tarski paradox.
As we talked about in the last post, the key trick is not really about geometry at all. I’m going to review some of what we discussed last time, but if you haven’t read or don’t remember Double for Nothing: the Banach-Tarski Paradox, you probably should before finishing this post.
If we take a ball, we can rotate it in different directions, forward (F), backward (B), right (R), and left (L). And, we could do multiple rotations in a row, for example, FRB would be backwards rotation, right rotation, then forward rotation.^{6}
We can put all of these “words” representing rotations into a graph, where, for each letter in the word, F means you go up, L left, etc.. The center point, which represents no rotations, we can label N.
Thus, a series of rotations is represented by a word which is represented by a point on this branched graph. Of course, we can do any length of words (i.e., any number of rotations), so we get an infinite graph.
The “words” starting with L represent the rotations ending with a left rotation, and are the ones on the left side of the graph. (Again, words are series of rotations are points on this graph.)
The key observation from last time was that if we take the “words” starting with L, then undo the last rotation by rotating right, we end up with all the words except the ones on the right!
Where do all these extra points come from? Well, like the circle example from earlier, in some sense we’re “pulling them from infinity,” i.e., pulling them out of those infinitely small branches down in the graph.
The key of the Banach-Tarski paradox is figuring out how to get this “creation” of points on the graph to work on a ball instead.
To do that, we need to associate points in the ball with this graph somehow. Fortunately, the basic idea is not too hard. The points on the graph are supposed to represent words which represent series of rotations of a ball. Thus, we’ll try to associate each point on the graph, i.e., a word, with points on the sphere that we find via those rotations.
Grab the ball. The word that represented no rotations, N, we’ll associate with the “north pole” of the ball. The north pole, though, is just a point on the surface, and we want to duplicate the entire ball. So, let N actually represent all of the points below the north pole through the inside of the ball, all the way down to the core. (Though not including the center point at the core itself.) Thus, N represents a little line segment.
For every other series of rotations, after you rotate, the word (i.e., point on the graph) representing those rotations will represent the line from the new north pole to the center of the ball. For example, for the rotation L, you would rotate left, and that new north pole and the points under it are now “L.”
To make this all work, it’s very important that two different words, i.e. series of rotations, don’t represent the same set of points. To guarantee that, we need to pick the angle we rotate carefully. Fortunately, like in the circle rotation example earlier, it’s not too hard. One traditional angle is $\arccos(1/3)$, but there are infinitely many angles that would work.^{7} If we pick that angle, each word, or series of rotations, will rotate a new point to the north pole.
Great! We’ve taken the ball and identified its points with the words, which are points on the branched graph. And those points are spread out all over the ball — it turns out you can get points spread out evenly all over the ball with an arbitrary number of rotations.
Except…
Well, we’ve actually missed almost all the points of the ball!
Even though our words represent points that are spread out evenly all over the ball, so that it would look like we’ve covered everything, we’re actually missing “most” of the points in the ball!^{8}
Fortunately, there’s an easy way to fix this.
Simply pick one of the points we missed on our first go, and start again with that point as the new north pole. We can associate the word N (for no rotations) with both this north pole and the original one we picked. Then, we can do all the rotations, like before, and associate their words with the new points as well. After doing this, we’ll have two line segments of points from the surface to the middle of the ball for each “word” like N or FR or BBBR, but we’ll lump them together into one set of points.
Unfortunately, we’re still missing most of the points in the ball. So, we pick yet another new north pole from the leftover points, and do it again. And again, and again. In fact, we have to do it infinitely many times.^{9}
But, in any case, we’ve split up the ball into a bunch of pieces. For instance, N is associated with all the infinitely many “north poles” we picked, along with the line segment underneath them, and a similar set of line segments for every other word, or set of rotations.
Now we can use the trick from the last post.
Let’s call $S_L$ the set of all the points and line segments in the ball represented by words that start with L, i.e., where the last rotation was to the left. We can similarly define $S_R$, $S_F$, and $S_B$. The set $S_N$ will just be all the points associated with the north poles.
The set $S_L$ is all the points found by rotating the sphere, where the last rotation was to the left. So, if we take all those points, and rotate them to the right to undo that last rotation, as with the graph, we get all the points in $S_N$, $S_L$, $S_F$, and $S_B$, exactly like we did for the branched graph! As before, we call the left-last points, rotated right, $R\,S_L$.
We can do a similar thing with $F\,S_B$: the points found by rotating backwards last, but then having that last rotation undone, are all the points in $S_N$, $S_B$, $S_L$, and $S_R$ put together!
That, right there, is the heart of the Banach-Tarski paradox.
It’s easy to get the two balls from what we’ve done. To make the first ball, take all the points in $S_L$ and $S_R$: rotate $S_L$ to the right to get $R\,S_L$, then put $R\,S_L$ and $S_R$ together, and you have a ball! The second ball is similar: take all the points in $S_B$ and $S_F$: rotate $S_B$ forward to get $F\,S_B$, then put those two sets together, and you have the second ball!
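The word bookkeeping behind this can be checked on a finite piece of the (infinite) tree of rotations. In the sketch below, all the names (`reduced_words`, the `S_` sets) are my own; “rotating right” is just deleting a leading L, and the truncated check confirms that each pair of pieces, after one rotation, really does cover every word.

```python
INVERSE = {"L": "R", "R": "L", "F": "B", "B": "F"}

def reduced_words(max_len):
    """All words over L, R, F, B in which no rotation is followed by
    its inverse, including the empty word, up to max_len letters."""
    words, layer = {""}, {""}
    for _ in range(max_len):
        layer = {c + w for w in layer for c in "LRFB"
                 if not w or INVERSE[c] != w[0]}
        words |= layer
    return words

N = 8
everything = reduced_words(N - 1)
S_L = {w for w in reduced_words(N) if w.startswith("L")}
S_R = {w for w in reduced_words(N) if w.startswith("R")}
S_F = {w for w in reduced_words(N) if w.startswith("F")}
S_B = {w for w in reduced_words(N) if w.startswith("B")}

# "Rotate right" = prepend R, which cancels a leading L:
rotated_S_L = {w[1:] for w in S_L}
# Ball one: rotated S_L plus S_R covers every word...
assert everything <= rotated_S_L | S_R

# ...and ball two: S_B rotated forward, plus S_F, covers every word too.
rotated_S_B = {w[1:] for w in S_B}
assert everything <= rotated_S_B | S_F

print("both reassembled balls contain all", len(everything), "words")
```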
So, there you have it. By cutting up the sphere into a few pieces, and then just rotating and moving them around, you can turn one ball into two!
Admittedly, we’ve glossed over a few important details if you want this to work out perfectly, but I think they can be hidden in a footnote.^{4}
This is quite the paradox! You shouldn’t be able to cut a ball into pieces and put them back together as two balls!
From a physical point of view, this process is, of course, impossible. Not only are the sets we’re cutting the ball into hopelessly complicated and delicate, but they assume that matter is infinitely divisible, which is false. (Subatomic particles are, after all, a particular size, and it’s hard to cut a quark into pieces…)
But even from a mathematical view point, this seems like it shouldn’t work. And, so, if we think that way, we can look back at our assumptions, and see which of the axioms we used seems the most questionable, and try to get rid of that assumption.
What’s fascinating about this proof is that the key problematic axiom is so innocuous that, if you didn’t know what you were looking for, you would probably never find it. The step that is the most questionable is the one where we choose points as new north poles.
The thing is, when we make that choice, there’s no reason to pick one point over another. They’re all just as good as any other. Plus, we have to make infinitely many of these choices, which is also a bit… uncomfortable.
Doing this requires the Axiom of Choice, perhaps the most infamous axiom in mathematics.
And all it says is that you can choose things.
In the next post, we’ll take a look at the axiom of choice and why it’s so important… and infamous. (Assuming the Missus can find the time to draw…)
An excellent video on this paradox, and its proof, can be found on Vsauce’s Youtube channel. In the description, he also lists many resources which I found useful in preparing this post.
Oh, life plans. How fleeting they are.
Since the last post, we’ve had Thanksgiving, a funeral with associated trip, finals, the flu, bad colds, Christmas, the start of a new term, 10 interviews, and so forth. It’s been… busy. That would explain some of it. But another big thing is the complete upheaval of my life plans!
See, to become a professor, after you get a PhD, you usually spend 2 or 3 years at one or two universities as a “postdoc,” which is what I am now. These positions are temporary, and are not expected to lead to permanent positions at said universities.
So, I applied to permanent jobs this school year. Lots of them. I had a bunch of interviews, but I didn’t end up getting hired as a professor.
I could probably scramble and get another postdoc at another university and then do another cycle or two of applications for professor jobs, but… well, academia is stressful. Awesome, to be sure, but stressful too. (When I talked to my mentor about his career path, his frequent use of the phrase “panic mode” didn’t exactly encourage me.)
So, after a lot of thought, I’ve decided to leave academia, and become a computer programmer instead. Of course, since I have limited experience, that means I have till my contract at the university ends to learn enough programming to get hired somewhere. It turns out that covering years of computer science education on my own is time-consuming.
For obvious reasons, then, this blog, though it will probably continue, is going to be updated less frequently in the coming year.
But this is mathematics we’re talking about. These results may assault the intuition, but it’s not because they’re wrong or the result of faulty logic, like so many other paradoxes. We come with proofs and irrefutable logic. Sure, you might be able to argue against the assumptions (axioms) we start with, but given those perfectly reasonable and seemingly innocent rules, we can create truly bizarre things.
No matter how much those accountants dislike it.
And today, we get to talk about a doozy of a paradox. We’re going to take a ball, cut it up into 5 or so pieces, move them apart, rotate them around a bit, and end up with two balls, exactly identical to the original.
This is the infamous Banach-Tarski paradox.
Of course, creating something from nothing is an old trick. For instance, if you take all the numbers between 0 and 1, which has length 1, and you multiply each of those numbers by 2, you end up with all the numbers between 0 and 2, which has length 2! Pretty much everyone’s okay with this one.
But the Banach-Tarski paradox is much weirder. We turn one ball into two, but there is no stretching involved. The only things we do are cut the ball into pieces, move them apart, rotate them a bit, then move them back together. And just moving parts of a ball shouldn’t create a new ball out of thin air.
Right?
But if we do it just right, we can.
So, how do we start this black magic?
While this paradox seems to be about geometry and measuring things, the key step, the key observation, has nothing to do with either. It’s all about a dictionary with every possible word in it.
For simplicity (and to line up nicely with what we’ll do later), let’s pretend our alphabet only has 4 letters — B, F, L, and R. So, the first word in our dictionary would be B, followed by BB, then BBB, then BBBB, then BBBBB, then…
And, finally, after all those exciting words, we’d finally get to BF, then BFB, then BFBB, then…
Of course, after you finish all the B words, you get to start the F words! F, FB, FBB, … FFB, FFBB, … FL, FLB, FLBB, …
I think you get where this is going.
Now, we have all of these infinitely many words, every word that could ever be thought of or written. How are we going to print this dictionary?
Well, the first volume of our dictionary could contain all the “B” words. Of course, if everyone knows it’s the “B” volume, we could save space by not printing the first letter of each word. That means the first word would be ___, then B, then BB, then BBB, then …, then F, then FB, then FBB, then …
But wait a second. Those words F, FB, FBB, etc. which represent BF, BFB, BFBB, etc. are actually all the words that begin with F!
In fact, since every word could be extended to a “B” word by just adding a B as the first letter, by taking off that first B, our “B” volume is really a dictionary of all the possible words!^{1} One volume of our dictionary can take the place of four!
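We can sanity-check the dictionary trick with a quick sketch (Python here, purely illustrative; the post itself has no code): build every word up to some length, take the “B” volume, strip the leading B from each entry, and confirm we recover every word.

```python
from itertools import product

ALPHABET = "BFLR"

def all_words(max_len):
    """Every word over B, F, L, R up to max_len letters, plus the blank word."""
    words = {""}
    for n in range(1, max_len + 1):
        words |= {"".join(letters) for letters in product(ALPHABET, repeat=n)}
    return words

N = 4
# The "B" volume: every word that starts with B. We go one letter longer,
# so that stripping the B leaves words of length up to N.
b_volume = {w for w in all_words(N + 1) if w.startswith("B")}

# Drop the first letter of each entry, as the dictionary trick suggests...
stripped = {w[1:] for w in b_volume}

# ...and the single volume reproduces the entire dictionary.
assert stripped == all_words(N)
```

The one volume really does contain every word, blank “word” included.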
See what we did there?
No? Well, let’s look at it in a more graphical way, and even closer to what we’ll actually do for the ball.
Grab a ball.
Instead of letters in a word, we can think of B, F, L, and R as being directions to rotate the ball a few degrees. “B” means to rotate the ball a bit backwards, towards yourself, “F” means to rotate the ball a bit forwards, away from yourself, “L” a bit left, and “R” a bit right. Then, all the one letter “words” represent rotating the ball exactly once, and we could graphically represent them like this:
The center intersection, which we’ve labeled “N” is not rotating at all. In our dictionary it was the ____ “word.”
Two letter “words” would represent rotating twice. For words, though, we could have something like “BF” or “FB,” but when thinking about rotations, we don’t want to undo the rotation we just did, so we’re not going to allow rotations that undo the one we just had. In fact, we can think of them cancelling out, so that “BF” is really the same as “N.”
But the remaining ones, like “FL” and “RF” are fine. We can add those to our graph like this:
We make each additional step half the size so that we can keep all these words straight, and when we say “FL,” we should think of that word representing doing a left rotation, then a forward rotation. Thus, all the series of rotations that end with a left rotation are on the left hand side of our graph.
After the two letter words, we get the 3 letter words (3 rotations), then the 4 letter words (4 rotations), etc. This graph contains all possible orders of rotations we could have made! Each intersection on the graph represents a different order of rotations.
Cool!
So, how can we create something out of nothing?
Look closely at, say, the left part of the graph.
All of the points in this part of the graph are represented by words starting with L, meaning the last rotation was to the left. But, like with the dictionary, by removing the first letter (the last rotation), we can have other words appear!
For words, it made sense to just remove the first letter. But since now we’re doing series of rotations, we can remove the first L instead by taking the ball and rotating right to undo that last left rotation!
If we do that, for instance, L becomes RL, which cancel out and give us the center point, N. The rotation LF becomes RLF, but the R and L cancel out and this becomes just F. The rotation LBR becomes RLBR, which is really just BR.
If you keep track of each set of rotations, after undoing that last L, the set of rotations on the left (which was about 1/4 of the graph) becomes the entire top, left, and bottom parts of the graph (about 3/4 of the graph!)
The only points we don’t get are the ones on the right, but that’s because those words start with R, and we couldn’t have had “LR…” words in the left part of the graph, since we didn’t allow rotations that would cancel each other out like that.
So, by undoing a rotation, we seemingly create points out of thin air! This is the key trick that will let us duplicate a ball.
But for right now, let’s use it just on this graph. I’m going to cut the graph up into 5 pieces, then, using this “undoing a rotation” trick, make two complete copies of the graph, with one piece left over!
To do this, we’ll call $S(L)$ the set of all rotations where the last rotation was to the left. This is the set we were playing with earlier. We’ll define $S(B)$, $S(F)$, and $S(R)$ similarly. The fifth and final set is the odd one out, just the center point N.
To undo the final left rotation of $S(L)$, we can rotate to the right, which we could write as $R\,S(L)$. As we’ve already said, $R\,S(L)$ is already the entire graph, except for the right part, $S(R)$. But that was one of the pieces we have lying around. So, the first copy of the graph is $R\,S(L)$ plus $S(R)$.
To make the second copy, we can take all the rotations that end with a back rotation, $S(B)$, and then undo it with a forward rotation. Then, like before, $F\,S(B)$ plus $S(F)$ make a second copy of the graph.
Two graphs for the price of one!
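Here’s the same bookkeeping as a sketch in code (Python, illustrative only; names like `reduced_words` and `undo_with` are mine): build every series of rotations with no immediate undo, take the left part of the graph, undo each word’s final left rotation with an R, and check that, together with the right part, it rebuilds the whole graph.

```python
INV = {"B": "F", "F": "B", "L": "R", "R": "L"}

def reduced_words(max_len):
    """All series of rotations up to max_len long, never immediately
    undoing the previous rotation (no 'LR', 'RL', 'BF', or 'FB' pairs)."""
    words, frontier = {""}, [""]
    for _ in range(max_len):
        frontier = [w + c for w in frontier for c in "BFLR"
                    if not w or INV[w[-1]] != c]
        words |= set(frontier)
    return words

def undo_with(letter, w):
    """Apply one more rotation `letter` to the series w, cancelling it
    against w's last rotation (the word's first letter) if they undo."""
    return w[1:] if w and w[0] == INV[letter] else letter + w

N = 6
words = reduced_words(N)
left_part = {w for w in words if w.startswith("L")}    # last rotation was L
right_part = {w for w in words if w.startswith("R")}   # last rotation was R

# Undo the final left rotation of every word in the left part:
undone = {undo_with("R", w) for w in left_part}

# Up to the truncation length, the undone left part plus the right part
# rebuild the entire graph of rotation-series:
limit = N - 1
rebuilt = {w for w in undone if len(w) <= limit} | \
          {w for w in right_part if len(w) <= limit}
assert rebuilt == reduced_words(limit)
```

About a quarter of the graph, after one undo, has grown into three quarters, and the remaining quarter fills in the rest.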
This trick is the true heart of the Banach-Tarski paradox. By using rotations to split up the ball in a very careful way, we can create points out of nothing by undoing the last rotation. And then we can very carefully put them back together, creating two balls out of one!
But let’s leave those details till next time!
<– Previous Post: Non-measurable Sets That Go Bump in the Night
First post in this series: How Long is Infinity?
–> Next Post: Double for Nothing, part 2
That’s right, we’re going to talk about sets of numbers so weird that even the very idea of length breaks when looking at them.
In the last few posts, we’ve been talking about how to measure the lengths of sets, even ones that are weird.
In How Long is Infinity?, we introduced how we measure things — by covering up a set with little intervals, and then calling the length of our set the smallest total length of intervals that cover our set.
Using this length, it turns out that any countable set, like the counting numbers $\mathbb{N}$, or even the set of all rational numbers, has zero length.
In The Cantor Set, we showed that, though uncountable sets like [0,1] (all the numbers between 0 and 1) have positive length, there are uncountable sets with zero length. The Cantor set is the main example.
In this post, we want to look at non-measurable sets: sets which are so weird that they break our “ruler” and make it impossible to make any sense at all of their length.
As a technical caveat, the “ruler” we’ve been discussing so far is technically the “outer Lebesgue measure,” which is not really the same as the “Lebesgue measure” that mathematicians use. However, the difference is buried in technical details that would distract from the story, so we’ll bury those important details for this post.
So what does a non-measurable set look like?
It’s gotta be weird. The definition of length, or measure, that we have is pretty robust. It can handle some pretty weird sets, like all the decimals with a 7 in them, and spit out a length.^{1}
So, in order to come up with a non-measureable set, we’re going to have to work hard.
What we’re going to do is come up with a weird set, and we’ll prove that if we add up infinitely many copies of it, somehow that total length will end up between 1 and 3. But that can’t be right, since adding up infinitely many of the same number always gives either 0 or infinity!^{2}
To start, we’re going to split all the real numbers into groups.
The first group is all the rational numbers, i.e., any number that can be written as a fraction of integers, like 2/3 or -712/2341. We’ll call this set $\mathbb{Q}$, for “quotient.”
The other groups are all copies of the rational numbers, but shifted left or right by a different real number. For instance, we could have the group $\sqrt{2} + \mathbb{Q}$, which is the set of all numbers which are $\sqrt{2}$ plus any rational number you want. Or you could have $\pi + \mathbb{Q}$, which is the set of all numbers which are $\pi$ plus any rational number you want.
There are a whole lot of these groups, and every real number is in one of them. On the other hand, there is more than one way to name which set you’re talking about.
When we gave the two examples of these groups, we used $\sqrt{2}$ and $\pi$ to define them. In other words, we picked a particular number, $\sqrt{2}$ or $\pi$, that happened to be in the group, and used it as a representative of that set.
But there’s more than one number in $\sqrt{2} + \mathbb{Q}$, and we could have used any of them to represent the set, not just $\sqrt{2}$. For instance, $(\sqrt{2} + 1) + \mathbb{Q}$ is the exact same set, with the exact same numbers in it. So is $(\sqrt{2} - \frac{1}{2}) + \mathbb{Q}$ or $(\sqrt{2} + \frac{2}{3}) + \mathbb{Q}$…
…though $\pi + \mathbb{Q}$ would not be the same, since $\sqrt{2}$ and $\pi$ do not differ by a rational number.
As long as our representatives differ from each other by a rational number, the sets are exactly the same.
Using these groups of numbers, we can now construct the unmeasureable set.
Let $V$ be the unmeasurable set. To construct it, look at each of the groups of numbers we came up with earlier. From each one, pick a single representative that happens to be between 0 and 1. For instance, from the set $\sqrt{2} + \mathbb{Q}$, we could pick the representative $\sqrt{2} - 1$, which is between 0 and 1. From the set $\mathbb{Q}$, we could pick 0 or 1 or 1/2 or any rational number between 0 and 1.
Now that we’ve chosen one representative from each group, it turns out that $V$ is unmeasurable!
Here’s how we’ll prove it.
Similar to how we took $\mathbb{Q}$ and moved all the numbers by $\sqrt{2}$ to make $\sqrt{2} + \mathbb{Q}$, we can take our set $V$ and move all the numbers by a rational number $q$, and make a new set $q + V$.
To make things clearer, this means that if a number $x$ is in $V$, then the number $x + q$ is in $q + V$. And, in the opposite direction, if a number $y$ is in $q + V$, that means that $y - q$ must have been in $V$.
But there’s something funny about $q + V$. No matter how small $q$ is, $V$ and $q + V$ never overlap!
Remember, each of the groups we came up with earlier had infinitely many different representatives we could have picked. But the representatives had to differ from each other by a rational number if they were supposed to represent the same group.
If $V$ and $q + V$ overlap, that means there would be a number $x$ in both $V$ and $q + V$. That means that $x - q$ would also have to be a number in $V$. Thus there are two numbers ($x$ and $x - q$) that are both in $V$, but differ by a rational number.
But remember that the numbers in $V$ are representatives of our groups, and so if they differ by a rational number, they represented the same group.
But we only picked one representative from each group to put in $V$.
And so $V$ and $q + V$ can’t overlap!
Next step: Put the rational numbers between -1 and 1 into some order. There are infinitely many of them, but they’re still a countable set, so we can do it. There are more details in the earlier post The size of infinity, but here’s the kind of ordering we could use to make sure we get them all.
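Here’s one concrete way to produce such an ordering (a Python sketch, not from the post): walk through denominators 1, 2, 3, …, and emit each fraction between -1 and 1 the first time it shows up.

```python
from fractions import Fraction

def rationals_between_minus_one_and_one():
    """Yield every rational in [-1, 1] exactly once, denominator by denominator."""
    seen = set()
    d = 1
    while True:
        for p in range(-d, d + 1):
            q = Fraction(p, d)
            if q not in seen:   # skip fractions already seen in lower terms
                seen.add(q)
                yield q
        d += 1

gen = rationals_between_minus_one_and_one()
first_eight = [next(gen) for _ in range(8)]
# The ordering begins: -1, 0, 1, -1/2, 1/2, -2/3, -1/3, 1/3
```

Every rational in the range shows up exactly once, so this really is a countable ordering of them all.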
Since we have an ordering for the rational numbers between -1 and 1, we’ll call $q_1$ the “1st” rational number, $q_2$ the “2nd,” etc., and $q_k$ the “k-th” rational number. Then, we can come up with a whole bunch of copies of $V$ moved around. We’ll call the set $V_k = q_k + V$, i.e., the set $V$ moved up or down by $q_k$.
Just as before, none of the $V_k$ overlap. Also, since $V$ only had numbers between 0 and 1, and $q_k$ is between -1 and 1, then all the numbers in $V_k$ are between -1 and 2.
The more difficult part is to recognize that if you put all of the $V_k$ together (“take their union”), then together, they contain every number between 0 and 1.
To see this, pick any number $x$ between 0 and 1. No matter which number we pick, it’s in one of the groups we made earlier, perhaps $\sqrt{2} + \mathbb{Q}$. But, when we made $V$, we picked one representative (between 0 and 1) for each of these groups. Since the representative $v$ and the number $x$ are in the same group, they have to differ by a rational number, and since they are both between 0 and 1, that rational number they differ by has to be between -1 and 1. That means that $x$ is in the set $V_k$ that happens to be $V$ moved by $q_k = x - v$, which is a rational number!
Yeah, it’s kind of hard to keep all these sets straight, but we’re almost done.
To finally see that the set $V$ can’t be measured (i.e., is non-measurable), let’s pretend that we can measure it, and show that something impossible happens.
If we can measure $V$, the sets $V_k$ all have the same length, since they’re really the same set, just moved up or down on the number line.
Since, put together, the $V_k$ contain all the numbers between 0 and 1, their total length has to be at least 1. And, since the $V_k$ only have numbers between -1 and 2, clearly their total length has to be no bigger than 3. If we wrote $m(V_k)$ to represent the length of $V_k$, we could write that like this:

$$1 \leq m(V_1) + m(V_2) + m(V_3) + \cdots \leq 3$$

But, again, the set $V$ has the same length as each of the $V_k$, and so, really, we’re saying:

$$1 \leq m(V) + m(V) + m(V) + \cdots \leq 3$$
But we’re adding up infinitely many of the same number! If the length were 0, adding up infinitely many zeros gives zero length. If the length were any number bigger than zero, adding up infinitely many of them would give infinite length!^{3}
And so, since 0 and $\infty$ are not between 1 and 3^{4}, we have shown something impossible. Thus $V$ cannot be measurable. We have broken our ruler.
So, yeah, non-measurable sets are weird. And we had to do a lot of work to come up with one.
But, in the end, it might seem like a waste of effort. I mean, it’s just a weird set that no one in their right mind would care about anyway.^{5}
But there are some weird things you can do with non-measurable sets.
The most famous is the Banach-Tarski paradox. There is a way you can take a sphere, cut it up into a few pieces, and rearrange them, and end up with two spheres, exactly identical to the original.
But that’s for next time!
Happy Halloween!
<– Previous Post: The Cantor Set
First post in this series: How Long is Infinity?
–> Next Post: Double for Nothing: the Banach-Tarski Paradox
Using this definition, we proved the surprising result that any countable set has length zero.^{1}
For the counting numbers this isn’t so weird. After all, they’re nice and spread out, even if there are infinitely many of them.
But I want to emphasize again how weird it is that countable sets have zero length. For example, consider all the rational numbers (numbers that are fractions, like 2/3) between 0 and 1. They’re packed in there like sardines.
Now, if we want to estimate their length, we could count them, putting an interval of length 1/4 on the first rational number, an interval of length 1/8 on the second, then 1/16, then 1/32, etc..
In this way we could cover all of those rational numbers by intervals with total length $\frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \cdots = \frac{1}{2}$. Somehow, despite the rational numbers being everywhere in between 0 and 1, and us covering every single one with its own interval, we only managed to cover half the numbers between 0 and 1!
Yeah, it’s pretty bizarre.
So, countably infinite sets don’t have any length, while many uncountably infinite sets, like the interval [0,1], all the numbers between 0 and 1, usually have positive length.
Does every uncountably infinite set have length?
Now seems a good point to mention that adding up uncountably many things behaves… badly.
When we had a countable set, you could argue that the total length had to be zero: each individual point has length zero, and $0 + 0 + 0 + \cdots$ must clearly be zero, so the total length had to be zero.
But for an uncountable set, adding up lengths just doesn’t work.^{2}
For instance, no matter how we try, the argument we used for countable sets to show that their measure is zero simply cannot work. To remind you, we put smaller and smaller intervals around each point of the countable set. As we made the intervals smaller and smaller, we found we could get as small a total length as we wanted.
Now, suppose we try to put smaller and smaller intervals around the points of an uncountable set, like we did with a countable set. Thus, each point has an associated interval length. If we add up all those interval lengths, we are adding up uncountably many numbers that are each greater than zero.
Since we can add up sequences and sometimes get finite numbers, like $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1$, this might not seem like a problem. But when you add up uncountably many positive numbers, you always get infinity!
The argument goes like this: split up the (uncountably many) intervals we are using to cover our uncountable set based on their length. Put all the intervals with lengths between, say, 1 and 1/2 into one group, and all those with lengths between 1/2 and 1/4 into another, and all those between 1/4 and 1/8 into another, etc..
We’ve now split our uncountably many intervals into countably many groups. (There are countably many groups since we can count them — here’s the first group, the second group, the third group, etc..) But uncountably infinite is larger than countably infinite, so we can’t fit the intervals into the groups nicely.
At least one of the groups has to have infinitely many intervals!
Each of these groups, though, has a minimum length. Even if the group with infinitely many intervals had lengths between $\frac{1}{2^k}$ and $\frac{1}{2^{k-1}}$, infinitely many of them still add up to infinite length.
So, we need to come up with some other way of showing an uncountable set has zero length.
Fortunately, our good friend Georg Cantor came up with a set that will help us out.
Cantor’s set^{3} is perhaps the simplest example of a fractal, by which we’ll, informally, mean a shape or set that is more or less self-similar as you zoom in closer.
To start, take the interval $[0, 1]$, all the numbers between 0 and 1.
Then, remove the middle third of the set.
Now, we have two smaller intervals, each with length 1/3. From each of those, remove the middle third again.
Now, we have four intervals. We continue this process infinitely many times. What’s left over is the Cantor set!
If we calculate how much length was left at each step, we started with length 1. After the first step, we had 2 intervals of length 1/3, for a total of $\frac{2}{3}$. After the second step, we had $4$ intervals of length $\frac{1}{9}$, for a total of $\frac{4}{9}$. Then $\frac{8}{27}$, and so on. After $n$ steps, the total remaining length would be $\left(\frac{2}{3}\right)^n$. As we do this process infinitely many times, this remaining length goes to zero.
Thus, the Cantor set has zero length!
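You can watch that length drain away numerically (a small, illustrative Python sketch):

```python
def remaining_length(steps):
    """Length left after `steps` rounds of removing middle thirds:
    each round keeps 2 copies at 1/3 scale, i.e. multiplies length by 2/3."""
    length = 1.0
    for _ in range(steps):
        length *= 2 / 3
    return length

# 1, 2/3, 4/9, 8/27, ... heading to zero:
lengths = [remaining_length(n) for n in range(4)]
```

By a hundred steps, the remaining length is smaller than $10^{-17}$.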
Of course, at first glance, this set doesn’t seem like there’s much there. After all, we removed “everything,” right?
But let’s look a bit closer.
First of all, the endpoints of each interval along our process is in the Cantor set. For instance, the points 0, 1/3, 2/3, and 1 were never removed.
At first glance, these endpoints seem like the only points left. I mean, we keep on cutting out the heart of every remaining interval!
And, if the endpoints are all that’s left, that would only be countably many points, since we can count them. (Just order them up in the same order we cut out the intervals.)
But, if you look even closer, it turns out that the Cantor set does have uncountably many numbers in it. But to figure out why, we’ll need the ternary representation of the numbers.
The ternary expansion is the base 3 version of decimal expansion. For decimals, 0.012 represents 0 tenths, 1 hundredth, and 2 thousandths. In base three, the ternary expansion 0.012 represents 0 thirds, 1 ninth, and 2 twenty-sevenths. In other words, the ternary places are powers of 3 instead of powers of 10. In ternary expansions you only use the digits 0, 1, and 2.
Yeah, definitely weird the first time you see it.
But ternary expansions are perfect for the Cantor set. For instance, the points 0, 1/3, 2/3, and 1, in ternary, are represented by 0, 0.1, 0.2 and 1! In fact, the endpoints of the intervals are all numbers whose ternary representations stop after a finite number of digits.
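If you want to experiment, ternary digits can be peeled off one at a time: multiply by 3, take the whole part as the next digit, and keep the remainder. A Python sketch (illustrative, not from the post):

```python
from fractions import Fraction

def ternary_digits(x, n):
    """First n ternary digits after the point of x, for 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x *= 3
        d = int(x)          # the next base-3 digit
        digits.append(d)
        x -= d
    return digits

# 1/3 is 0.1 in ternary, and 2/3 is 0.2, matching the endpoints above:
assert ternary_digits(Fraction(1, 3), 3) == [1, 0, 0]
assert ternary_digits(Fraction(2, 3), 3) == [2, 0, 0]
```

Using exact fractions avoids the floating-point rounding that would eventually scramble the digits.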
So, what about those intervals we removed?
The first one was all the numbers between 1/3 and 2/3, or, in ternary, between 0.1 and 0.2. In other words, except for 0.1, we have removed all the numbers whose ternary expansions have the digit 1 in the first slot, the “thirds.”
The second set of intervals was [1/9, 2/9] and [7/9, 8/9]. In ternary, those are [0.01, 0.02] and [0.21, 0.22]. Thus, except for the endpoints 0.01 and 0.21, we have removed all the numbers whose ternary expansions have the digit 1 in the second slot, the “ninths.”
We can carefully continue this pattern. At each step, we are removing the numbers whose ternary expansions have ones in them.^{4} And these are the only numbers which we’re removing.
Thus, the Cantor set is all numbers whose ternary expansions have only the digits 0 and 2, except, perhaps, a terminal 1.
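That characterization doubles as a membership test: keep zooming into whichever third of the interval the number lies in, and reject it the moment it falls strictly inside a removed middle third. A Python sketch (the function name is mine):

```python
from fractions import Fraction

def in_cantor(x, depth=60):
    """True if x in [0, 1] survives `depth` rounds of middle-third removal,
    i.e. its ternary expansion can avoid the digit 1."""
    for _ in range(depth):
        if x == 0 or x == 1:
            return True                      # an endpoint; never removed
        if Fraction(1, 3) < x < Fraction(2, 3):
            return False                     # strictly inside a removed middle third
        # rescale the surviving left or right third back to [0, 1]
        x = 3 * x if x <= Fraction(1, 3) else 3 * x - 2
    return True

# 1/4 has ternary expansion 0.020202..., so it's in the Cantor set
# even though it's not an endpoint of any removed interval:
assert in_cantor(Fraction(1, 4))
assert not in_cantor(Fraction(1, 2))
```

The point 1/4 is exactly one of those “extra” non-endpoint members we’re about to find.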
So what did we gain from all this complication?
In addition to the endpoints, whose ternary expansions end, we also have a whole bunch of numbers whose expansions don’t end, like $0.020202\ldots$. We’ve found the extra points!
These numbers look a lot like normal decimals, just a bit restricted. In fact, they’re so similar, that, like normal decimals, there are uncountably many of them. You can even use the same argument we used in the post A bigger infinity for the normal decimals to show that the Cantor set has uncountably many numbers in it.
So, there we have it. Every countable set has length, or measure, zero, but uncountable sets can have length zero as well.^{5}
Of course, any uncountable set is still much bigger than any countable set like the counting numbers $\mathbb{N}$, so it seems unfair to lump them together. So, one of these weeks, we’ll show that the Cantor set is different from a countable set. They may both have length zero, but it turns out that the Cantor set is not a zero dimensional set!
Figuring out what that even means will be half the fun!
But, before that, I want to talk about sets so weird that you can’t even measure them. And I don’t mean that their length is zero — That would still be measuring that they were length zero.
I mean sets so weird that you can’t make sense of what it means to measure them. They simply break our method of measuring sets.
The standard example for countable infinity is the counting numbers (1, 2, 3, etc.). Clearly infinite, but also you can clearly count them. For uncountable infinities, the standard example is all the numbers, say between 0 and 1. This is also clearly infinite, but no matter how you arrange it, you can’t count all of those numbers.^{1}
But the size of infinity, which is measured by directly comparing which things are in which sets, is different than the length of infinity.
We want a normal kind of length — The length of all the points between 0 and 1 should be 1. Similarly, a single point should have zero length.
So, somewhere between a single point and all the points between 0 and 1, we go from length zero to length one. Where did it happen? Two points has no more length than one point, and the same goes for 10 billion points. Similarly, if I take all the numbers between 0 and 1, and take away a single point, the length of all those points should still be 1.
The famous non-mathematical version of this is the sorites (so-RITE-eez) paradox. If you have a heap of sand, and take a single grain away, you still have a heap of sand. But if you keep removing one grain at a time, eventually you will only have a single grain remaining, and that’s clearly not a heap. So when did it stop being a heap?
How do we measure the length of infinity?
As commonly occurs in math, the answer is to carefully define what we mean by “length.”
Let’s start with what we can all agree on — the length of an interval.
An interval is all the numbers between two endpoint numbers. For example, $[0, 1]$ is all the numbers between 0 and 1, including 0 and 1, while $(-17, 4)$ is all numbers between -17 and 4, not including -17 and 4.^{2}
Whether or not an interval contains its endpoints, we can all agree that the length of that set should be the right endpoint minus the left endpoint. For example, the length of $(-17, 4)$ is $4 - (-17) = 21$.
Great, now for the complicated part.
We need to use intervals to define the length of any set of points.^{3} What we’ll do is estimate the length of the set using intervals.
For any set, we can cover it using intervals. For instance, if our set was the single point 0, we could cover it with the interval $[-\frac{1}{2}, \frac{1}{2}]$.
If our set was the three points $\{0, 1, 2\}$, we could cover it with the intervals $[-\frac{1}{2}, \frac{1}{2}]$, $[\frac{3}{4}, \frac{5}{4}]$, and $[\frac{15}{8}, \frac{17}{8}]$.
Since we’re using intervals, the total length of these intervals is easy to calculate. In the first case, the length was 1, and in the second case the total length was $1 + \frac{1}{2} + \frac{1}{4} = \frac{7}{4}$.
The intervals we choose contain the set we care about, and so the length of the set should be less than the length of the intervals containing it. So, the length of the single point 0 should be less than 1, and the length of $\{0, 1, 2\}$ should be less than 7/4.
Of course, we could have picked other intervals to cover the sets. The interval $[-\frac{1}{4}, \frac{1}{4}]$ or $[-\frac{1}{8}, \frac{1}{8}]$ would also contain the point 0, and so the length of the single point 0 should also be less than 1/2 and less than 1/4.
The length, or measure, of a set is defined to be the smallest interval length (or sum of lengths, if we use more than one interval) that contains the set we care about.^{4}
Going back to the single point example, the point 0 is contained in $[-\frac{1}{2}, \frac{1}{2}]$, but also in $[-\frac{1}{4}, \frac{1}{4}]$ or $[-\frac{1}{8}, \frac{1}{8}]$. In other words, we can cover it with intervals of smaller and smaller lengths, heading towards zero. Thus, by our definition, the measure of the set $\{0\}$, the single point 0, is zero. In other words, a single point has no length.
The same kind of argument works for any finite set of points. You take smaller and smaller intervals around each of the points, and so the total length of the intervals go to zero.
This shows that the measure of any finite set is zero.
What happens if we move to infinite sets?
The simplest infinite set is the counting numbers, 1, 2, 3, etc.. If we try the exact same thing we did with a finite number of points and cover each point with an interval of the same length, no matter how small each individual interval is, the total length of intervals would be infinite.^{5}
This is not wrong, per se, but remember that our sum of interval lengths is supposed to be an estimate of the length of our set. It’s possible the counting numbers should have infinite length, but let’s see if we can do better than that.
To try to improve our estimate, we’ll put smaller and smaller intervals around each subsequent number. So, we cover 1 with $[\frac{1}{2}, \frac{3}{2}]$ (length 1), cover 2 with $[\frac{7}{4}, \frac{9}{4}]$ (length 1/2), cover 3 with $[\frac{23}{8}, \frac{25}{8}]$ (length 1/4), etc.
Thus, we managed to cover all the counting numbers with a collection of intervals of total length $1 + \frac{1}{2} + \frac{1}{4} + \cdots = 2$.^{6} That means the measure (i.e., length) of the counting numbers is less than 2.
Of course, we didn’t have to start with an interval of width 1. We could have covered 1 with $[\frac{3}{4}, \frac{5}{4}]$, 2 with $[\frac{15}{8}, \frac{17}{8}]$, etc. In this case, we’d have a total length of $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1$. So the measure of the counting numbers is less than 1.
Continuing this idea, we could start with smaller and smaller intervals, and end up with a total length of 1/2 or 1/4 or 1/8 and so on. By our definition, the length of the counting numbers has to be zero!
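The squeeze is easy to see numerically (an illustrative Python sketch):

```python
def total_cover_length(first_width, n_intervals):
    """Cover the counting numbers 1, 2, 3, ... with intervals whose widths
    halve each time, starting from first_width; return the total length."""
    return sum(first_width / 2 ** k for k in range(n_intervals))

# Starting widths 1, 1/4, 1/16, ... give total lengths near 2, 1/2, 1/8, ...
covers = [total_cover_length(w, 200) for w in (1, 1/4, 1/16)]
```

Each total is twice the starting width, so by shrinking the starting width we push the estimate as close to zero as we like.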
In fact, this same idea works for any countable set, i.e., any set you can count. You cover the first point with a small interval, the second point with a smaller interval, etc. until you’ve covered them all. Then, you try again with even smaller intervals. Thus, the measure of any countable set is zero!
Now, this doesn’t seem very interesting when worded like this, but let me give you a slightly more amazing example.
A rational number is a number that can be written as the fraction of two counting numbers, maybe with a minus sign. So, 3/4 and -12374/421 are rational numbers, but $\pi$ is not. Since any number can be approximated by a rational number (e.g. $\pi \approx \frac{314159}{100000}$), the rational numbers are everywhere.
There are so many rational numbers that, no matter which two numbers you pick, there are infinitely many rational numbers between those two numbers.
It’s surprising, then, that there are only as many rational numbers as there are counting numbers! More details are available in The size of infinity, but the basic idea is that you can line up the rational numbers so that you can count them. In other words, you can list them in a definite order — a first rational number, a second, a third, etc.
But since we can order them in this way, we can put an interval of length 1 on the first rational number, an interval of length 1/2 on the second, an interval of length 1/4 on the third, etc. Thus, the length of all the rational numbers is no more than 2.
And, of course, we can use smaller and smaller intervals, and thus show that the rational numbers have no length at all!
Bizarre, right? Numbers can be dense (which is the technical way to say they’re everywhere, no matter how far you zoom in), but still be so close to nothing that they have no length at all!
The length of (countable) infinity is always zero!
What about larger infinities? We’ll talk about that more next time.
<– Previous Post: How Gödel Proved Math’s Inherent Limitations
The first post on the different sizes of infinities: Infinity plus one
–> Next Post: The Cantor Set
In the last post, we discussed the theorems themselves, and their consequences. In short, they show the inherent limitations of mathematics.
The first theorem relates two concepts: consistency and provability. A mathematical system (a set of assumptions which are called axioms) is consistent if there aren’t any contradictions. In other words, you can’t prove a statement both true and false.
Inside of any logical system, there are many statements, i.e., things you can say. I could say something like “All prime numbers are smaller than a billion.” It’s a false statement, but I can say it.
But just because I can say a statement doesn’t mean that I can prove it true or false. Most of the time, the statement is just very difficult to prove, and so you don’t know how to do it. But it’s also possible to have statements which are impossible to prove either true or false. We’ll call these kinds of statements unprovable. Any logical system (set of axioms) with unprovable statements is called incomplete.
Gödel’s first incompleteness theorem says that if you have a consistent mathematical system (i.e., a set of axioms with no contradictions) in which you can do a certain amount of arithmetic, then there are statements in that system which are unprovable using only that system’s axioms.^{1}
In other words, math is incomplete. It is impossible to prove everything.
The most basic idea of the proof of the first incompleteness theorem is to think about the statement, “This statement is unprovable.”
If you could prove this statement true, it is by definition provable. But the statement itself says that it is unprovable, and so, since it is true, the statement is also unprovable! But it can’t be both provable and unprovable. Thus the statement must be only unprovable.
While this is the basic idea we’ll employ, the problem is that there isn’t an obvious formal way to say “This statement is unprovable” inside of math. What do you mean by provable? What does “this statement” refer to? Using which axioms?
Gödel’s proof has to make all of that perfectly precise.
The first step is to show that any precise mathematical statement can be transformed into a number, and vice versa.
This step is clever, but not particularly complicated. At some point, you’ve probably come across a code where each letter is exchanged for a number. If we match a to 1, b to 2, etc., for instance, the word “math” would be “13-1-20-8.” Computers use a similar scheme to store text as 1’s and 0’s.
The number Gödel assigns to a precise mathematical statement uses a similar encoding. There are several ways to do this, but I’ll mention a way similar to how Gödel originally did it.
First, associate each mathematical symbol (in your particular mathematical system) with a unique counting number.^{2} For instance, maybe “0” is saved as 1, while “=” is saved as 2 and “+” is saved as 3.
A mathematical statement is just a list of these symbols. Equivalently, the statement is a list of the numbers we used to encode the individual symbols. For instance, “$0=0$” is equivalent to the list $(1, 2, 1)$.
To encode the statement as a single number, we set the Gödel number equal to the product of the first few primes, raised to the powers in the corresponding position in the list. Thus, the Gödel number of “$0=0$” is $2^1 \cdot 3^2 \cdot 5^1 = 90$.
For a statement like “$0=0$,” we’ll use the notation $\#(0=0)$ to refer to the Gödel number of that statement. Thus, $\#(0=0) = 90$.
As you can imagine, Gödel numbers can get very large, very quickly, for even moderately long statements. But size is not an issue — we don’t need to write down those numbers, just know they exist.
The key point is that we can take a number and go backwards to get a mathematical statement.
Every number can be broken up into primes in a unique way. So, $145530 = 2^1 \cdot 3^3 \cdot 5^1 \cdot 7^2 \cdot 11^1$, and so the number 145530 represents the statement “$0+0=0$.”
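The whole round trip can be sketched in a few lines of code (Python, illustrative; the three symbol codes are the ones from the text):

```python
SYMBOL_CODE = {"0": 1, "=": 2, "+": 3}          # the encoding from the text
CODE_SYMBOL = {v: k for k, v in SYMBOL_CODE.items()}

def first_primes(n):
    """The first n primes, by trial division (plenty for short statements)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def godel_number(statement):
    """Raise the first primes to the codes of the statement's symbols."""
    codes = [SYMBOL_CODE[s] for s in statement]
    number = 1
    for prime, code in zip(first_primes(len(codes)), codes):
        number *= prime ** code
    return number

def decode(number):
    """Factor the number and read the exponents back off as symbols."""
    symbols = []
    candidate = 2
    while number > 1:
        exponent = 0
        while number % candidate == 0:
            number //= candidate
            exponent += 1
        if exponent:
            symbols.append(CODE_SYMBOL[exponent])
        candidate += 1
    return "".join(symbols)

assert godel_number("0=0") == 90       # 2^1 * 3^2 * 5^1
assert decode(145530) == "0+0=0"       # 2^1 * 3^3 * 5^1 * 7^2 * 11^1
```

Because prime factorization is unique, `decode` always recovers exactly the statement that `godel_number` encoded.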
Any precise mathematical statement can be translated into a number this way. Even a proof is just a bunch of statements strung together. (“A” implies “B,” and “B” implies “C,” so “A” implies “C”.) That means we’ve shown that all of math^{3} can be written in terms of just numbers.
Similarly, there is an arithmetical way of checking whether a string of statements (as represented by a Gödel number) is a proof of another statement (as represented by another Gödel number).^{4}
While translating any mathematical statement into a number seems like an interesting trick, it turns out to be the key to the proof.
The reason it is so important is that it lets us turn any question about proofs and provability into an arithmetic question about numbers. Thus, we can use only numbers and their properties in order to prove any (provable) statement.
For instance, consider the statement, which I’ll call Unprovable(y): “y is the Gödel number of a statement, and there does not exist a number which is the Gödel number of a proof of that statement.”
Thus, Unprovable(y) essentially says “The statement represented by y is unprovable.” But, instead of a question about proofs and statements, it is a statement entirely about numbers, and some arithmetical relationship between them!
The exact arithmetical relationship is very, very complicated, but it can be precisely defined. An analogous, but much simpler, statement to Unprovable(y) is Prime(y), which we’ll call the statement “y is a prime number.” Thus Prime(y) makes a claim about a number, but that claim can be decided entirely by some (relatively) simple arithmetic.
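As a sketch of how a claim like “y is a prime number” reduces to pure arithmetic (the function name here is mine):

```python
def prime_claim(y):
    """Decide the statement "y is a prime number" using only division
    and comparison: y is prime if y > 1 and no number from 2 up to
    the square root of y divides it."""
    if y < 2:
        return False
    divisor = 2
    while divisor * divisor <= y:
        if y % divisor == 0:
            return False
        divisor += 1
    return True

print(prime_claim(7), prime_claim(145530))  # True False
```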
We’re coming into the homestretch now.
The original idea for the proof was the statement “This statement is unprovable.” With the precise mathematical statement Unprovable(y), we can make that imprecise statement perfectly precise.^{5}
To come up with a precise version of “This statement is unprovable,” we’ll use the “diagonal lemma.” (A lemma is just a theorem you use to prove another theorem.^{6}) The diagonal lemma shows that, in the kind of mathematical system we’re using for this proof, there is some statement Y which is true if and only if Unprovable(#(Y)) is true, where #(Y) is the Gödel number of Y.^{7} (Remember, the input to Unprovable is a number representing the Gödel number of a statement. In this case, that statement is Y itself.)
To be clear, the lemma doesn’t prove that either Y or Unprovable(#(Y)) is true, only that they are either both true or both false. But what does this mean?
Again, the diagonal lemma shows that Y (some unknown mathematical statement, probably quite long) is true if and only if Unprovable(#(Y)) is true. But Unprovable(#(Y)) being true means that Y is unprovable. (That was the definition of Unprovable.)
So, if we were able to prove the statement Y true, then the diagonal lemma shows that we can prove Unprovable(#(Y)) true. But Unprovable(#(Y)) says that Y is unprovable! Thus Y would be both provable and unprovable, a contradiction.
Thus Y must actually be unprovable.
The statement Y is the precise version of the statement “This statement is unprovable” that we were looking for. Thus, not every statement can be proved.
Poor, broken math…
<– Previous Post: Gödel’s Incompleteness Theorems
First post in this series: What is math?
–> Next Post: How Long is Infinity?
Mathematics tries to prove that statements are true or false based on these axioms and definitions, but sometimes the axioms prove insufficient. Sometimes the axioms lead to paradoxes, like Russell’s paradox, and so a new set of axioms is needed. Sometimes the axioms simply aren’t enough, and a new axiom might be needed to prove a desired result.
But in both cases, the paradoxes and inability to prove a result are the result of picking the wrong axioms.
Right?
Unfortunately, it’s not true!
Gödel’s incompleteness theorems show that pretty much any logical system either has contradictions, or statements that cannot be proven!
The questions Gödel was trying to answer were, “Can I prove that math is consistent?” and, “If I have a true statement, can I prove that it’s true?”^{1}
Gödel’s first step in this project came in his PhD thesis, with a result that seems to imply that you can prove any true statement. This is called Gödel’s completeness theorem.
For a particular set of axioms, there are different “models” implementing those axioms. A model is an example of something that satisfies those axioms.
As a non-mathematical example of a model, let’s say the axioms that define being a “car” are that you have at least 3 wheels, along with at least one engine that rotates at least one of the wheels. A standard car clearly follows those axioms, and is therefore a model for the “car axioms.” A bus would also be a model for the car axioms.
Of course, there are models that are very non-standard…
Mathematical axioms work the same way. There are axioms for the natural numbers, and their addition and multiplication, called “Peano arithmetic” (pay-AH-no). The normal natural numbers follow these axioms, and so are the standard model for them. But there are non-standard models that still follow the Peano arithmetic axioms.
Each model is a bit different. There may be some statements (theorems) that are true in some of the models, but not true in another model.
Even if a statement is true, though, you want to be able to prove it true, using only the axioms that your model satisfies.^{2}
Gödel’s completeness theorem answers the question, “Using the axioms, is it always possible to prove true statements are true?”
His completeness theorem says you can prove a statement is true using your chosen axioms if and only if that statement is true in all possible models of those axioms.^{3}
This result seems very promising for mathematics.
Unfortunately, math is not that simple.
Two years after Gödel published his completeness theorem, he published his incompleteness theorems.
These theorems relate two concepts: consistency and provability. A logical system (a set of axioms) is consistent if there aren’t any contradictions. In other words, you can’t prove a statement both true and false.
Inside of any logical system, there are lots of statements, i.e., things you can say. I could say something like “All prime numbers are smaller than a billion.” It’s a false statement, but I can say it.
But just because I can say a statement doesn’t mean that I can prove it true or false. Most of the time, the statement is just very difficult to prove, and so you don’t know how to do it. But it’s also possible to have statements which are impossible to prove either true or false. We’ll call these kinds of statements unprovable. Any logical system (set of axioms) with unprovable statements is called incomplete.
Gödel’s first incompleteness theorem says that if you have a consistent logical system (i.e., a set of axioms with no contradictions) in which you can do a certain amount of arithmetic^{4}, then there are statements in that system which are unprovable using just that system’s axioms.
In other words, as long as your logical system is complicated enough to include addition and multiplication, then your logical system is incomplete. There are things you can’t prove true or false!
Gödel’s second incompleteness theorem gives a specific example of such an unprovable statement. And the example is quite a doozy.
The theorem says that inside of a similar consistent logical system (one without contradictions), the consistency of the system itself is unprovable!^{5}
You can’t prove that math does not have contradictions!
In the next post, I plan on giving an outline of the proof of these theorems, but for this post, let’s talk about their fascinating consequences.
Remember what I said I thought math is?
Math is the quest to decide what must be. But Gödel’s incompleteness theorems put fundamental limits on that quest!
David Hilbert, among others, felt that any true statement should be provable, and that math should be provably consistent.
In 1900, he gave a famous list of open problems in mathematics, the most important ones for the next century. His second problem was, “Prove the axioms of arithmetic are consistent.”^{6}
Gödel’s theorems show that Hilbert’s hope was exactly… wrong.^{7}
As long as your mathematics is complicated enough to include the natural numbers (which, I think we can agree, is not a particularly high bar), then it must have statements which cannot be proven true or false. They are unprovable.
Of course, to “fix” this you could try to add that statement as an axiom.^{8}
Then, since the statement is an axiom, it is trivially provable. (The proof is: “This statement is an axiom. Thus it is true.”)
Yeah, it’s kind of cheating. But the problem is, you can’t even cheat enough to win!
See, your new mathematical system, with your shiny new axiom, is still a mathematical system complicated enough to include the natural numbers. Gödel’s theorem still applies, and shows that there is some new statement that is unprovable! It’s worse than trying to kill a hydra.
Looking at the proof, the rough idea behind an unprovable statement is “This statement is unprovable,” which seems… silly, and not worth your time. You might hope that all unprovable statements are like that: unprovable, but totally uninteresting.
Unfortunately, mathematicians have found statements of the kind that you might hope to prove, that are unprovable in standard mathematical systems. For instance, it is impossible to prove or disprove that the real numbers have the smallest uncountable cardinality^{9} inside of standard set theory. There are lists of other such important, but unprovable, statements.
So, in pretty much any mathematical system, there are things you’ll want to prove, but can’t.
And that’s just the first incompleteness theorem.
The second incompleteness theorem says that, within your mathematical system, you cannot prove that you can’t have contradictions.
Suppose you’ve proven the statement “there are no contradictions in the system.” The second incompleteness theorem says that a consistent system complicated enough to include arithmetic can never prove that statement. So your system must be inconsistent, which means you can prove both that there are and that there are not contradictions in the system.
Thus, if you can prove there are no contradictions, then the second theorem says that your system does have contradictions!
Now, using a more powerful system (one with more axioms), you can often prove the consistency (lack of contradictions) of a less powerful system (one with fewer axioms). For instance, Peano arithmetic, which covers essentially the natural numbers and addition and multiplication, can be proven consistent using standard (ZFC) set theory, a more powerful system. But Peano arithmetic can’t prove itself consistent.
This leads to a philosophical problem: How do we know standard set theory is consistent? Sure, there are even stronger systems that can prove it’s consistent, but then we have to ask about their consistency. If we keep on adding axioms to prove consistency, have we really proven consistency? We may have inadvertently added contradictions as well!
One last weird thing.
Gödel’s completeness theorem implies that a statement is provable using a set of axioms if and only if that statement is true in every model of those axioms. That means that for any unprovable statement, there has to be a model of those axioms in which the statement is false.
But, if the consistency of the set of axioms is unprovable, that means there has to be a model of your axioms where the consistency statement is false.
Which hurts my head to think about.
Anyway, next time I’ll explain the basic idea of the proofs of these results. It should be fun!
<– Previous Post: Kurt Gödel’s Story
First post in this series: What is math?
–> Next Post: How Gödel Proved Math’s Inherent Limitations
Gödel is famous for proving foundational questions about mathematics. He asked questions like, “Can I prove that math is consistent?” and, “If I have a true statement, can I prove that it’s true?” and, “Can I prove that it’s impossible to prove the statement ‘This statement is unprovable’ is provable?”
Yeah, not exactly the most obvious questions to ask, but important ones, I promise.
Gödel was born in 1906 in what is now Brno, Czech Republic, but was then in Austria-Hungary. His family called him Herr Warum (“Mr. Why”), which is impressive given how fond children everywhere are of that question.
By the time he went to the University of Vienna at 18, he had already mastered university-level math. During this time, he came across Russell’s work on the foundations of mathematics, and met Hilbert, who, around that time, was thinking deeply about axioms and logical systems, and whether it could be shown they had no contradictions, and whether all true statements could be proven.
By 23, Gödel finished his PhD in mathematical logic. Two years later, he published his seminal work on his incompleteness theorems. These papers have the answers to the questions I introduced, but I want to finish talking about Gödel. We’ll discuss the details next time.
Two years after that, in 1933, Gödel became a lecturer at the University of Vienna. He also traveled to the US, where he met Einstein, who became his good friend.
During this time, Hitler came to power in Germany. A few years later, the professor who had originally interested Gödel in logic was assassinated by one of his former students, essentially because he was friends with Jews.^{1} This caused a “nervous crisis” in Gödel. He became paranoid, fearing that he would be poisoned. These symptoms continued later in his life.
In 1938, Nazi Germany annexed Austria. Gödel’s job title was eliminated, so he had to apply to a new job. However, since he had been friends with Jews, they turned him down.
Things got worse the next year. Germany found him fit for conscription, and World War II began. Within the year, Gödel left for the Institute for Advanced Study in Princeton, where Einstein was.
And, being Gödel, he decided that an Atlantic crossing was too much. So he took the obviously less strenuous route of a train ride across Russia to Japan, a boat ride across the Pacific, then another train ride to Princeton, New Jersey.^{2}
He was very productive during his time in Princeton, proving some other results about the foundations of mathematics.
In 1947, Einstein took Gödel to his US citizenship exam. Gödel, ever the logician, told Einstein he had discovered an inconsistency in the US constitution that could allow the US to become a dictatorship. Einstein was concerned… not about the possibility of a dictatorship, but that Gödel’s eccentric behavior might endanger his citizenship application.
Einstein was right to fear.
During Gödel’s hearing, the judge asked what kind of government they had in Austria. Gödel replied that it was a republic, but that the constitution was such that it was changed into a dictatorship. The judge expressed his regret, then said that this could not happen in this country.
Gödel replied, “Oh, yes, I can prove it.”
Fortunately, the judge was an acquaintance of Einstein’s, and said, “Oh God, let’s not go into this.”^{2}
Anyway, Gödel kept on working. Among other things, for Einstein’s 70th birthday, Gödel created a spacetime which… breaks general relativity. Well, at least, it has all sorts of things go wrong. For instance, there are “closed timelike loops” through every point of spacetime, meaning that anyone and everyone can time travel. He also expanded Leibniz’s “proof” of God’s existence.
Later in his life, his paranoia recurred. He had an overwhelming fear of being poisoned, and would only eat food that his wife prepared for him. When she was hospitalized for 6 months, he refused to eat, eventually starving to death. At the time of his death, he weighed only 30 kilos.
In the next post, we’ll get to talk about Gödel’s completeness and incompleteness theorems, and come face to face with the inherent limitations of mathematics!
(For those of you who enjoyed this, you might also enjoy my articles on Georg Cantor and Karl Schwarzschild!)