Yes, no, maybe so

You look out the window to see a vast blanket of dark clouds.
"It's going to rain soon," your sister says. "We should probably stay indoors today."
Your brother snorts. "It's not going to rain today!"
You turn to your siblings. "Well, those are the two options..."

Later that day, it snowed.

Over lunch

All integers are either even or odd… right? I mean, those are the only two possibilities: if a number isn’t even, then it must be odd!

Similarly, you would expect that all mathematical statements are either true or false. What else could they be? If a statement isn’t true, then it must be false… right? Well, this is certainly the case in classical logic, and this fact is given a name: the law of the excluded middle (LEM).

Classical logic refers to a “rule book” for how to reason in mathematics, and describes what kinds of deductions are allowed. For instance, the following three arguments follow the same logical rule:

If it rains, I will stay indoors.
It is raining.
Therefore, I will stay indoors.
Where there's smoke, there's fire.
There's smoke.
Therefore, there's fire.
If the function is differentiable, then it is continuous.
The function is differentiable.
Therefore, it is continuous.

The underlying rule here is called modus ponens: if we know that some mathematical statement P is true, and P\text{ implies }Q, then we may conclude that the statement Q is true also. The law of the excluded middle is another rule: for any mathematical statement P, this rule asserts that “either P is true, or \text{not }P is true.”

While this “law” seems obvious—in fact, all of the laws of classical logic are “obvious”—there are other logical rule books that do not accept the law of the excluded middle. The most common (though probably not widely known) example is intuitionistic logic. Ironically, it doesn’t seem at all intuitive that the law of the excluded middle might fail: how can neither P nor \text{not }P be true?!

The reason this is possible is that an intuitionistic logician takes “true” to mean “provably true.” In other words, a logical statement isn’t taken to be true unless you can provide a construction or some other direct demonstration that the statement is true (this is why intuitionistic logic is sometimes called “constructive logic”). In a way, this is a more practical form of truth, because it can actually be tested. The law of the excluded middle may now fail because there genuinely is a middle: there are mathematical statements P where neither P nor \text{not }P can be proven to be true.

We can make more sense of intuitionistic reasoning by drawing parallels to the judicial system. In a court, the accused is presumed innocent until proven guilty. Likewise, a logical statement in an intuitionistic framework is pronounced false until proven true. Classical logic would correspond to whether or not the accused truly is guilty of the crime, which is an unattainable ideal in the real world (since how else, if not with proof, would we know for certain that the accused is guilty of the crime?). In this analogy, it’s easier to believe that the law of the excluded middle can fail: there are many instances of criminals getting away with their crimes due to a lack of evidence.

On the bus

Frankly, I was pretty averse to intuitionistic logic when I was first exposed to it in undergrad: it’s not typically within the bounds of concern for a working mathematician (not even things regarding the validity of the axiom of choice or the like really matter to most mathematicians). To me, the theory of intuitionistic logic was nothing more than the result of taking the theory of classical logic, and removing all theorems that relied on the law of the excluded middle or the axiom of choice. In other words, using intuitionistic logic, from my point of view, was an artificial handicap to doing mathematics.

I vaguely understood “why” the law of the excluded middle was non-constructive. The law of the excluded middle allows you to argue via proof by contradiction: to prove that a result P is true, it is enough to show that \text{not }P leads to some nonsensical contradiction. For example, suppose I wanted to prove the following:

Proposition 1. There exists a real number \alpha such that p(\alpha)\neq0 for any nonzero polynomial p(x) with integer coefficients.

To get a sense of what Proposition 1 is saying, here are some examples of values for \alpha that don’t work. First, \alpha can’t be an integer: if \alpha=5, then p(\alpha)=0 where we take the polynomial p(x) = x-5. A bit more generally, \alpha can’t be rational: if \alpha = \frac27, then p(\alpha) = 0 where we take the polynomial p(x) = 7x-2. Therefore, any \alpha that satisfies Proposition 1 has to be irrational. This is not enough, though. For example, if we take the irrational number \alpha=\sqrt2, then p(\alpha)=0 for the polynomial p(x) = x^2-2. So, how do we prove that Proposition 1 is true?
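These example computations are easy to check concretely. Here is a quick Python sketch (the helper eval_poly is mine, just for illustration) verifying that each candidate \alpha above really is a root of the corresponding integer polynomial:

```python
from fractions import Fraction

def eval_poly(coeffs, x):
    """Evaluate the polynomial a_0 + a_1 x + ... + a_n x^n at x (Horner's method)."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# alpha = 5 is a root of p(x) = x - 5 (coefficients listed low degree first)
print(eval_poly([-5, 1], 5))                      # 0
# alpha = 2/7 is a root of p(x) = 7x - 2 (computed exactly with rationals)
print(eval_poly([-2, 7], Fraction(2, 7)))         # 0
# alpha = sqrt(2) is a root of p(x) = x^2 - 2, up to floating-point error
print(abs(eval_poly([-2, 0, 1], 2 ** 0.5)) < 1e-12)  # True
```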

Well, a proof by contradiction would be to suppose Proposition 1 is false. A number \alpha that does not satisfy Proposition 1 is called algebraic (therefore, the above paragraph proves that integers, rational numbers, and numbers like \sqrt2 are all algebraic). So, to prove that non-algebraic numbers exist, begin by supposing that every real number is algebraic. Now, let’s use this assumption to count how many real numbers there are. We have seen in a previous post (on the bus) using Cantor’s diagonal argument that there are more real numbers than can be written out in an infinite list. On the other hand, the algebraic numbers can actually be listed out. Without going into too much detail, you can do this by:

  • listing out all possible nonzero integer coefficient polynomials
  • for every such polynomial in the list, replace it with the list of its (real) roots. This list is always finite, because a polynomial p cannot have more roots (i.e., values \alpha where p(\alpha)=0) than its degree.

Therefore, since we can write out every algebraic number in an infinite list, but we cannot do the same for all real numbers, it is impossible for all real numbers to be algebraic. In conclusion, Proposition 1 is true!
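The first step of the enumeration can even be carried out in code. Here is a Python sketch (the height-based ordering is one choice among many) that lists every nonzero integer-coefficient polynomial exactly once, so each one appears at a finite index:

```python
from itertools import count, islice, product

def polynomials_by_height():
    """Enumerate every nonzero integer-coefficient polynomial exactly once,
    as a coefficient tuple (a_0, ..., a_n) with a_n != 0, ordered by
    height = degree + |a_0| + ... + |a_n|.  Each height class is finite,
    so every polynomial appears at some finite position in the stream."""
    for h in count(1):
        for n in range(h):      # degree n leaves height h - n for coefficients
            budget = h - n
            for coeffs in product(range(-budget, budget + 1), repeat=n + 1):
                if coeffs[-1] != 0 and sum(abs(a) for a in coeffs) == budget:
                    yield coeffs

# x - 5, written as coefficients (-5, 1), has height 1 + 5 + 1 = 7,
# so it shows up within the first thousand entries.
first = list(islice(polynomials_by_height(), 1000))
print((-5, 1) in first)  # True
```

Replacing each tuple by its finitely many real roots (step two above) then yields the promised list of all algebraic numbers.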

In more detail (if you care). Recall that \aleph_0 denotes countable infinity—the infinity that can be listed out. For any finite number n\geq1, the number of degree-n polynomials with integer coefficients can be determined as follows. Such a polynomial has the form a_nx^n+a_{n-1}x^{n-1}+\dots+a_0, and we have \aleph_0 many choices for every a_i. Therefore, we have \underbrace{\aleph_0\times\dots\times\aleph_0}_{n+1\text{ times}} possible polynomials. By that previous post (on the bus), this is equal to \aleph_0. Now, any such polynomial can only have \leq n real roots, so this means that there are at most \aleph_0\times n = \underbrace{\aleph_0+\dots+\aleph_0}_{n\text{ times}} real roots of polynomials of degree n with integer coefficients. Again, this works out to be equal to \aleph_0.

To count how many algebraic numbers there are in total, we can count the number of real roots of polynomials of degree n with integer coefficients for every n=1,2,3,\dots. This totals to at most \underbrace{\aleph_0+\aleph_0+\dots}_{\aleph_0\text{ times}} = \aleph_0\times\aleph_0 = \aleph_0 algebraic numbers. In other words, there are countably many algebraic numbers, and therefore they can be written out in an infinite list. \blacksquare
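The cardinal arithmetic \aleph_0\times\aleph_0 = \aleph_0 used above can be made completely explicit with Cantor’s pairing function, a bijection \mathbb N\times\mathbb N\to\mathbb N that walks through the grid diagonal by diagonal. A quick Python check:

```python
# Cantor's pairing function: an explicit bijection N x N -> N,
# witnessing the cardinal arithmetic aleph_0 x aleph_0 = aleph_0.
def pair(m, n):
    d = m + n                      # which diagonal the point (m, n) lies on
    return d * (d + 1) // 2 + n    # offset past all earlier diagonals

codes = {pair(m, n) for m in range(50) for n in range(50)}
print(len(codes))                  # 2500: distinct inputs get distinct codes
print(set(range(1275)) <= codes)   # True: every code below 50*51/2 is hit
```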

With the above (relatively short) proof, you get that there must exist non-algebraic real numbers. In fact, it proves that almost every real number is not algebraic! However, it doesn’t give you an example of a non-algebraic number at all, despite how “common” they are in the real number line. Since the proof doesn’t explain how you may “construct” a non-algebraic number, this line of reasoning is not intuitionistic / constructive.

A more constructive proof would require providing an explicit example of a non-algebraic number (along with a proof that it is non-algebraic). You might be able to guess some examples: \pi is non-algebraic, for instance. However, how would you prove that \pi is non-algebraic? You have to prove that \pi^3-3\pi-21\neq0 (it’s roughly 0.581), and that 6\pi^2-23\pi+13\neq0 (it’s roughly -0.039), etc… For any nonzero integer polynomial p(x), it must be that p(\pi)\neq0. This ends up being significantly harder to prove than the mere existence proof given above![cf] However, it comes with the benefit that we have an actual example of a non-algebraic number, rather than leaving us with nothing but the knowledge that these things are somehow in abundance.
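The two numerical values quoted here are easy to sanity-check in Python:

```python
import math

pi = math.pi
p1 = pi**3 - 3*pi - 21       # one candidate integer polynomial evaluated at pi
p2 = 6*pi**2 - 23*pi + 13    # another candidate; neither value is zero
print(round(p1, 3), round(p2, 3))  # 0.581 -0.039
```

Of course, no amount of numerical evaluation handles the infinitely many remaining polynomials; that is exactly what makes the transcendence proof hard.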

Most people probably believe that all mathematical propositions have definitive truth values—i.e., they’re either true or false—and therefore that classical logic is a perfectly reasonable framework for doing mathematics. Unfortunately, this belief becomes more questionable the deeper you look. First, as the above example might allude to, there are instances where certain mathematical objects are proven to exist (even in great abundance), but we still lack any explicit examples. One example[cf] is the following: it is known that almost every real number \alpha>1 has the property that the fractional parts (i.e., the decimal parts) of the numbers \alpha,\alpha^2,\alpha^3,\alpha^4,\dots are distributed evenly throughout the interval [0,1], and yet we don’t have a single known (proven) example of such a number! To realise how absurd this is, “almost every” in this case implies that if you were to uniformly randomly choose a real number \alpha between 1 and 2, you would have a 100% chance of choosing an \alpha satisfying this property… and yet we don’t have a single example of one. How can something exist in that kind of abundance and still be impossible to find?
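As a purely numerical illustration (not a proof of anything), we can watch the fractional parts of the powers of a specific \alpha spread across [0,1]. The choice \alpha=3/2 below is one of the famous open cases: the distribution looks uniform empirically, but equidistribution for this value is not proven.

```python
from fractions import Fraction

# Fractional parts of (3/2)^n, computed exactly with rational arithmetic
# (floating point would overflow / lose the fractional part very quickly).
alpha = Fraction(3, 2)
power = Fraction(1)
fracs = []
for _ in range(1000):
    power *= alpha
    fracs.append(power - int(power))   # exact fractional part

# Crude equidistribution check: count how many land in each tenth of [0, 1).
buckets = [0] * 10
for f in fracs:
    buckets[int(f * 10)] += 1
print(buckets)
```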

It gets even worse if you start thinking about Gödel’s Incompleteness Theorems: any consistent axiomatic foundation that is strong enough to encompass basic arithmetic will have propositions that can be proven neither true nor false. This includes the most common axiomatic foundation of mathematics (ZFC set theory). A famous example of such a proposition—as mentioned in my post about infinity—is the Continuum Hypothesis: ZFC is capable of neither proving nor disproving the following claim.

Continuum Hypothesis. If a set S of real numbers cannot be enumerated in an infinite list, then it can be put in 1-to-1 correspondence with all real numbers.

This means that there are models of ZFC (i.e., foundations where the axioms of ZFC are true) where the Continuum Hypothesis is true, and also models of ZFC where the Continuum Hypothesis is false. With this in mind… how can the Continuum Hypothesis have a definitive truth value? Ultimately, the study of mathematics is less about what is true or false, and more about what is provably true or provably false. Therefore, despite how mathematicians will usually function with a classical logical mindset, the Continuum Hypothesis shows that this classical logic must at the very least be taken with a grain of salt.

In the lounge

A place where classical logic may not be applicable (that I’m curious about, but not at all familiar with) is within an elementary topos. A topos is basically a category that carries enough structure that its “internal set theory” (since all categories are basically set theories) is sufficiently similar to usual set theory in mathematics. To help myself understand this a bit better, I will be relying primarily on [Str03, Ch. 13] and [MM94, Ch. VI.5–7].

The relevant logics for the set theories are higher order and typed (they’re also intuitionistic). Being higher order allows us to reason with predicates using other predicates, and being typed allows us to control the domain of discourse when quantifying statements.

First, recall that an elementary topos \mathcal E is a finitely complete, cartesian closed category such that the subobjects functor \mathrm{Sub}:\mathcal E^{\mathrm{op}}\to\mathbf{Set}—where subobjects are taken modulo isomorphism—is representable by some object \Omega appropriately called the subobject classifier. Intuitively, \Omega is the “object of truth values” in \mathcal E, and the natural isomorphism \mathrm{Sub}(A)\cong\mathrm{Hom}(A,\Omega) tells us that we can identify “predicates” on A (i.e., morphisms A\to\Omega) with subobjects of A, where the subobject corresponding to \phi should be thought of as \{a \in A \mid \phi(a)\} \subseteq A. Conversely, the predicate corresponding to a subobject S\subseteq A should be thought of as the “characteristic map” \chi_S:A\to\Omega.
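In \mathbf{Set}, all of this is very concrete: \Omega is the two-element set of truth values, and the isomorphism \mathrm{Sub}(A)\cong\mathrm{Hom}(A,\Omega) is the familiar correspondence between subsets and characteristic functions. A small Python sketch (the helper names are mine):

```python
# In Set, the subobject classifier is the two-element set {False, True}:
# subsets of A correspond exactly to their characteristic functions A -> Omega.
A = {1, 2, 3}

def char_map(S):
    """Subobject S of A  |->  its predicate chi_S : A -> Omega."""
    return {a: (a in S) for a in A}

def subobject(chi):
    """Predicate chi : A -> Omega  |->  the subobject {a in A | chi(a)}."""
    return {a for a in A if chi[a]}

S = {1, 3}
print(subobject(char_map(S)) == S)  # True: the round trip is the identity
```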

We have a well-defined subobjects functor because \mathcal E is finitely complete, as the functor sends morphisms to their pullback action (and pullbacks preserve monomorphisms). Moreover, the presence of pullbacks in \mathcal E already makes \mathrm{Sub}(A) a meet semilattice (with the order relation being given by subobject inclusion, the top element being A, and the meet operation being given on S,T by S\cap T = S\times_AT). In fact, since limits commute with limits, this means that the subobjects functor factors through the category of meet semilattices, meaning that \Omega is an internal meet semilattice in \mathcal E as well, equipped with a top element \top:*\to\Omega, a conjunction \wedge:\Omega\times\Omega\to\Omega, and a partial order (\vdash)\subseteq\Omega\times\Omega (which is an “external implication” relation). It’s an external relation because the truth value of p\vdash q is given by a truth value of the ambient (external) logic, and is not given by a “truth value” in \Omega. An internal implication would have to be a morphism \Omega\times\Omega\to\Omega.

More generally, the internal hom structure implies that a similar internal meet semilattice structure is given to all power objects \Omega^A since they internalise \mathrm{Sub}(A). In a general power object, denote the external implication by \vdash_A, the top element by \top_A, and the conjunction by \wedge_A. If the context is clear, the subscripts may be dropped.

For example, we can internalise “equality” in a fairly straightforward manner: to define an internal equality relation (=_A):A\times A\to\Omega, just take the morphism corresponding to the diagonal subobject A\subset A\times A. Given global elements x,y:*\to A, the internal predicate x=_Ay is only “true” if it coincides with \top:*\to\Omega, and we can see that this happens if and only if x=y externally (the story is similar for generalised elements x,y:U\to A, but validity is checked by seeing if the composite factors through \top; i.e., it coincides with \top_U). However, if x\neq y, the truth value of x=_Ay can be a plethora of things other than merely “false.”

Applying the above internalisation to the subobject classifier, we obtain an internalisation of “equivalence” (\Leftrightarrow) := (=_\Omega) : \Omega\times\Omega\to\Omega. More generally, applying this to a power object \Omega^A, we get an internalisation of “equivalence of predicates” on A given by (\Leftrightarrow_A) := (=_{\Omega^A}) : \Omega^A\times\Omega^A\to\Omega.

It turns out we can internalise implication as well: define the implication relation for predicates on A as the composite

(\Rightarrow_A) : \Omega^A\times \Omega^A\xrightarrow{(\wedge_A, \mathrm{proj}_1)}\Omega^A\times \Omega^A\xrightarrow{(\Leftrightarrow_A)}\Omega

This internalises the statement that p\implies q says the same thing as p\land q \iff p. This is an internalisation of the external implication because for any predicates \phi,\psi:A\to\Omega, we get that \phi\vdash_A\psi (i.e., \{a\in A \mid \phi(a)\}\subseteq\{a\in A\mid \psi(a)\}) if and only if \phi\Rightarrow_A\psi factors through \top. Moreover, this defines a Heyting implication: \phi\wedge\psi\vdash_A\chi if and only if \phi\vdash_A(\psi\Rightarrow\chi).

We can also internalise universal quantification \forall_A:\Omega^A\to\Omega, which is the higher order predicate corresponding to the subobject *\subseteq\Omega^A adjunct to the true predicate *\times A\cong A\xrightarrow{\top_A}\Omega. Given a predicate \phi:A\to\Omega, we get that \phi(u)=\top_U for all generalised elements u:U\to A if and only if \forall_A\phi=\top (the story is the same for generalised internal predicates \phi:I\to\Omega^A, where \forall_A\phi=\top_I if and only if \phi(-,u)=\top_{I\times U} for all generalised elements u:U\to A). This allows us to also write \forall_A\phi as \forall_{x:A}\phi(x) or (\forall x:A)\phi(x).

This gives us a lot of the basic elements of the Mitchell-Bénabou language of \mathcal E. To handle the remaining connectives, we use the following trick: define them impredicatively, by quantifying over all truth values p:\Omega and taking the infimum (which is exactly what \forall_\Omega computes); “false,” for instance, becomes the infimum of all truth values.

Explicitly, define “false” \bot:*\to\Omega to be \bot := \forall_{p:\Omega}p, and then define negation \lnot:\Omega\to\Omega via \lnot\phi := \forall_{p:\Omega}(\phi\Rightarrow p). Similarly, we can define disjunction \vee:\Omega\times\Omega\to\Omega as \phi\vee\psi := \forall_{p:\Omega}(((\phi\Rightarrow p)\wedge(\psi\Rightarrow p))\Rightarrow p), and existential quantification \exists_A:\Omega^A\to\Omega as \exists_A\phi := \forall_{p:\Omega}((\forall_{x:A}(\phi(x)\Rightarrow p))\Rightarrow p). These are the standard impredicative (second-order) encodings of the connectives.
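As a sanity check, in \mathbf{Set} (where \Omega=\{\text{false},\text{true}\}) the impredicative encodings \lnot\phi := \forall_p(\phi\Rightarrow p), \phi\vee\psi := \forall_p(((\phi\Rightarrow p)\wedge(\psi\Rightarrow p))\Rightarrow p), and \exists\phi := \forall_p((\forall_x(\phi(x)\Rightarrow p))\Rightarrow p) reproduce the classical truth tables. A quick Python verification (the function names are mine):

```python
Omega = [False, True]

def implies(p, q):
    return (not p) or q

# Impredicative encodings, evaluated over the two truth values of Set:
bot = all(p for p in Omega)                         # forall p. p
def neg(phi):                                       # forall p. (phi => p)
    return all(implies(phi, p) for p in Omega)
def disj(phi, psi):                                 # forall p. ((phi=>p) & (psi=>p)) => p
    return all(implies(implies(phi, p) and implies(psi, p), p) for p in Omega)
def exists(A, phi):                                 # forall p. (forall x. phi(x)=>p) => p
    return all(implies(all(implies(phi(x), p) for x in A), p) for p in Omega)

print(bot)                                          # False
print([neg(a) for a in Omega])                      # [True, False]
print(all(disj(a, b) == (a or b) for a in Omega for b in Omega))  # True
print(exists([1, 2, 3], lambda x: x == 2))          # True
```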

The idea now is that the objects of \mathcal E are types, and provide context in which predicates (i.e., morphisms into the subobject classifier) can make sense. Given predicates \phi,\psi:\Gamma\to\Omega, we can read \phi\vdash_\Gamma\psi as a sequent in the context \Gamma (which is really just an object of \mathcal E).

The benefit of the Mitchell-Bénabou language is that it can really help us think about objects of a topos more naïvely set-theoretically. For example, a morphism f:A\to B is a monomorphism iff \vdash_{(x,y):A\times A}f(x)=_Bf(y)\Rightarrow x=_Ay, and is an epimorphism iff \vdash_{y:B}\exists_{x:A}f(x)=_By. The latter contrasts with the usual co-elementary way of generalising surjections. Similarly, we can define a natural numbers object simply as an object \mathbb{N} and morphisms o:*\to\mathbb{N} and s:\mathbb{N}\to\mathbb{N} satisfying the Peano axioms:

  • \vdash_{n:\mathbb{N}}\lnot(s(n)=_{\mathbb{N}}o)
  • \vdash_{(n,m):\mathbb{N}\times\mathbb{N}}s(n)=_{\mathbb{N}}s(m)\Rightarrow n=_{\mathbb{N}}m
  • \vdash_{S:\Omega^{\mathbb{N}}}o\in_{\mathbb{N}}S \wedge \forall_{n:\mathbb{N}}(n\in_{\mathbb{N}}S\Rightarrow s(n)\in_{\mathbb{N}}S)\Rightarrow S=_{\Omega^{\mathbb{N}}}\mathbb{N}

(Note that membership \in_A:A\times\Omega^A\to\Omega is just given by the evaluation counit of the power object.) This language also allows us to define a real numbers object in a topos that has a natural numbers object! Indeed, you just have to follow the usual sequence of steps: \mathbb{N}\subset\mathbb{Z}\subset\mathbb{Q}, and then take Dedekind cuts. For example, starting with the category of sheaves over a topological space X, the natural numbers (resp. integers, rational numbers) object is given by the constant sheaf on the set of natural numbers (resp. integers, rational numbers), and the real numbers object is given by the sheaf of real-valued continuous functions on X.

Now, the appearance of intuitionistic logic comes when you think about the Law of the Excluded Middle: \forall_{p:\Omega}(p\lor\lnot p). The Law of the Excluded Middle holds if and only if our topos \mathcal E is Boolean, meaning that \mathrm{Sub}(A) is a Boolean algebra for every object A (i.e., every subobject has a well-defined complement). Indeed, both conditions are equivalent to stating that \Omega (and thus any power object) is an internal Boolean algebra, in which case the complement is given by negation. As you might then expect, not all topoi are Boolean.

To see this, note that \Omega \cong \Omega^{\{*\}} internalises the Heyting algebra \mathrm{Sub}(*) of subterminal objects in \mathcal E (put another way, the subterminal objects correspond to truth values of \mathcal E). This has to be a Boolean algebra (i.e., negation has to serve as a complement) if \mathcal E is Boolean. For instance, \Omega = \{\varnothing, \{*\}\} = \{\bot,\top\} in \mathcal E=\mathbf{Set} is a Boolean algebra.

If we take an over topos (\mathcal E\downarrow X) (where the subobject classifier is explicitly given by the projection \Omega\times X/X), the terminal object is the identity X/X, and so subterminal objects are precisely the subobjects of X. For instance, if we take the topos of S-graded sets (\mathbf{Set}\downarrow S) \simeq \mathbf{PSh}(S) (where this equivalence comes from taking fibres, or more abstractly by mumbling about Grothendieck), then we get 2^{|S|}-many truth values. However, this is still a Boolean topos since 2^S is a Boolean algebra, so this shows that Boolean topoi need not have only two truth values.

Taking a general Grothendieck topos \mathbf{Sh}(\mathcal C,J), the subterminal sheaves correspond to subsets P of objects of \mathcal C such that

  • V\in P whenever U\in P and there exists a morphism f:V\to U.
  • conversely, if there is a covering sieve S\in J(U) such that V\in P for every morphism V\to U in S, then U\in P as well.

In other words, the truth values of \mathbf{Sh}(\mathcal C,J) correspond to “local regions” in \mathcal C, which can be thought of as maximal regions where a proposition can hold over \mathbf{Sh}(\mathcal C,J) locally. For instance, if we take a topos of sheaves over a topological space X, then this reduces to the lattice \mathrm{Op}(X) of open subsets of X. Here, we can see that Grothendieck topoi are generally not Boolean: negation is given by taking the interior of the set-theoretic complement of an open subset (and disjunction is given by union), from which we can see that \mathbf{Sh}(X) fails to be Boolean whenever X has an open subset that is not closed. The fact that the internal logic may not be Boolean (and therefore may not satisfy an appropriate internal axiom of choice) is why we use the Dedekind construction of the real numbers rather than the Cauchy completion: every Cauchy real (an equivalence class of sequences) determines a Dedekind cut choice-freely, but recovering a Cauchy sequence from a Dedekind cut generally requires choice. Indeed, the Cauchy real numbers object in a category of sheaves over a topological space X is (I think[cf]) given by the sheaf of locally constant real-valued continuous functions on X (and is thus strictly smaller).
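To make the failure of Booleanness concrete, here is a tiny Python sketch of Heyting negation on the open-set lattice of a three-point space (a toy stand-in for \mathrm{Op}(X); the helper names are mine):

```python
# Truth values of Sh(X) for the finite space X = {1, 2, 3} with the chain
# topology below; negation is the interior of the set-theoretic complement.
X = frozenset({1, 2, 3})
opens = [frozenset(), frozenset({1}), frozenset({1, 2}), X]

def interior(S):
    """Largest open subset of S (the opens here form a chain, so max works)."""
    return max((V for V in opens if V <= S), key=len)

def neg(U):
    return interior(X - U)

U = frozenset({1})
print(neg(U) == frozenset())   # True: no nonempty open set avoids the point 1
print(U | neg(U) == X)         # False: excluded middle fails at this truth value
print(neg(neg(U)) == X)        # True: double negation overshoots U
```

The same computation on \mathrm{Op}(\mathbb R) (negate an open interval, say) shows the phenomenon is entirely typical.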

Lastly, I want to say a bit about Kripke-Joyal semantics, which are related to forcing (tying things back to provability theory). The forcing relation, denoted U\Vdash\phi(\alpha) where \phi:A\to\Omega is a predicate and \alpha:U\to A is a generalised element of A, holds precisely if \alpha factors through \{a\in A \mid \phi(a)\}\subseteq A (equivalently, if \phi\circ\alpha = \top_U). We then immediately have two basic properties:

  • monotonicity: if U\Vdash \phi(\alpha), then V\Vdash \phi(\alpha\circ f) for any f:V\to U
  • local character: conversely, if p:V\twoheadrightarrow U and V\Vdash \phi(\alpha\circ p), then U\Vdash \phi(\alpha).

Intuitively, these make the forcing relation assess truth as a local property: the first says that truth is preserved under restriction (precomposition along any map), and the second says that truth can be checked on a cover. Moreover, we get the following.

Theorem 2. Let \phi,\psi:A\to\Omega be predicates on A, and \alpha:U\to A a generalised element of A. Also, let \Phi:A\times B\to\Omega be a bivariate predicate. Then,

  1. U\Vdash \phi(\alpha)\wedge\psi(\alpha) iff U\Vdash \phi(\alpha) and U\Vdash\psi(\alpha)
  2. U\Vdash \phi(\alpha)\vee\psi(\alpha) iff there exist jointly epic p:V\to U and q:W\to U (i.e., [p,q]:V\sqcup W\twoheadrightarrow U) where V\Vdash \phi(\alpha\circ p) and W\Vdash\psi(\alpha\circ q)
  3. U\Vdash \phi(\alpha)\Rightarrow\psi(\alpha) iff V\Vdash\phi(\alpha\circ p) implies V\Vdash\psi(\alpha\circ p) for any p:V\to U
  4. U\Vdash\lnot\phi(\alpha) iff the only time V\Vdash\phi(\alpha\circ p) for some p:V\to U is when V is the initial object 0.
  5. U\Vdash\exists_{b:B}\Phi(\alpha,b) iff V\Vdash\Phi(\alpha\circ p, \beta) for some p:V\twoheadrightarrow U and some generalised element \beta:V\to B
  6. U\Vdash\forall_{b:B}\Phi(\alpha,b) iff V\Vdash\Phi(\alpha\circ p,\beta) for every p:V\to U and every generalised element \beta:V\to B. This is moreover iff U\times B\Vdash\Phi(\alpha\circ\pi_U,\pi_B).

Notice that disjunction (and existence) highlights the local / intuitionistic nature of the internal logic. In fact, if we restrict our attention to a Grothendieck topos \mathcal E = \mathbf{Sh}(\mathcal C,J), this “locality” intuition becomes more explicit:

Theorem 3. Suppose A,B:\mathcal C^{\mathrm{op}}\to\mathbf{Set} are sheaves, \phi,\psi:A\to\Omega and \Phi:A\times B\to\Omega are predicates, and \alpha\in A(U) for some U\in\mathcal C. Then,

  1. U\Vdash \phi(\alpha)\wedge\psi(\alpha) iff U\Vdash\phi(\alpha) and U\Vdash\psi(\alpha), as before
  2. U\Vdash \phi(\alpha)\vee\psi(\alpha) iff we can find a covering \{p_i:U_i\to U\}_{i\in I} such that U_i\Vdash \phi(\alpha\circ p_i) or U_i\Vdash\psi(\alpha\circ p_i) for each i\in I
  3. U\Vdash\phi(\alpha)\Rightarrow\psi(\alpha) iff V\Vdash\phi(\alpha\circ p) implies V\Vdash\psi(\alpha\circ p) for every p:V\to U (in \mathcal C!)
  4. U\Vdash\lnot\phi(\alpha) iff the only time V\Vdash\phi(\alpha\circ p) for some p:V\to U (in \mathcal C) is when the empty family covers V
  5. U\Vdash\exists_{b:B}\Phi(\alpha,b) iff there exists a covering \{p_i:U_i\to U\}_{i\in I} and \beta_i\in B(U_i) such that U_i\Vdash\Phi(\alpha\circ p_i,\beta_i) for every i\in I
  6. U\Vdash\forall_{b:B}\Phi(\alpha,b) iff V\Vdash\Phi(\alpha\circ p,\beta) for all p:V\to U (in \mathcal C) and all \beta\in B(V)
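Over a poset with the trivial topology (so presheaves, and covers generated by identities), these clauses reduce to classical Kripke semantics for intuitionistic logic, which is easy to play with in code. Here is a Python sketch of a two-stage model where the law of the excluded middle fails at the root; the tuple encoding of formulas is mine:

```python
# Kripke model on the poset r <= w, read as stages of knowledge: the atomic
# proposition P is not yet forced at r, but becomes forced at the later stage w.
above = {"r": ["r", "w"], "w": ["w"]}   # later (more refined) stages
atom = {"r": False, "w": True}          # valuation of the atom P (upward closed)

def forces(u, phi):
    """Kripke forcing: phi is a tuple ('atom',) or (op, subformula(s))."""
    op = phi[0]
    if op == "atom":
        return atom[u]
    if op == "and":
        return forces(u, phi[1]) and forces(u, phi[2])
    if op == "or":                      # disjunction is decided stage-by-stage
        return forces(u, phi[1]) or forces(u, phi[2])
    if op == "imp":                     # must persist at every later stage
        return all((not forces(v, phi[1])) or forces(v, phi[2]) for v in above[u])
    if op == "not":                     # no later stage may force the body
        return all(not forces(v, phi[1]) for v in above[u])

P = ("atom",)
lem = ("or", P, ("not", P))
print(forces("r", lem))  # False: r forces neither P nor not-P
print(forces("w", lem))  # True: at w, P itself is forced
```

The root r sits below a stage where P holds, so r cannot force \lnot P, yet it does not force P either: exactly the “middle” that intuitionistic logic leaves room for.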
