(Co-)Elementary, my dear Watson

Some days ago, I made this bold claim that categories really just model theories of sets (with additional structure). You can read the detailed spiel here, but I’ll summarise the main points below:

  • Given an object X, we have a set h_X(S) of “S-shaped” elements of X, for any object S. Therefore, X is really a collection of “generalised elements”.
  • By thinking of X in terms of its generalised elements, we can do set theory in any category! First example: a morphism f:X\to Y is monic if it is injective on generalised elements—hence, we have an analogue of injectivity of functions!
  • Second example: the categorical product X\times Y is just the collection of pairs (x,y) where x is a generalised element of X, and y is a generalised element of Y—hence, we have an analogue of the cartesian product of sets!

I justified all this by appealing to the Yoneda lemma. However, I have only told half of the story. What about the many other set-theoretic constructions? Sometimes, generalised elements just don’t cut it, and for good reason.

Let’s look first at trying to translate surjectivity of functions to category theory. A function f:X\to Y of sets is surjective if we can find, for any element y\in Y, some element x\in X such that f(x)=y. So, why not define a morphism of a general category to be “surjective” if it is surjective on generalised elements? This turns out to be no issue in \mathbf{Set}, but only for special reasons (as we will see in a bit). Let’s look at several places where morphisms that should be surjective fail to be so under this definition:

  • In \mathbf{Grp}: consider the canonical group homomorphism \mathbb{Z}\to\mathbb{Z}/2\mathbb{Z} from the integers to the group of integers modulo two, which sends an integer to its remainder modulo two. This is obviously surjective on genuine elements (and it’s a projection onto a quotient!), but it actually fails to be surjective on all generalised elements. For one example, we have two generalised elements of shape \mathbb{Z}/4\mathbb{Z} in \mathbb{Z}/2\mathbb{Z}—one is trivial, and the other sends an integer modulo four to its remainder modulo two—but we only have one generalised element of the same shape in \mathbb{Z} (the trivial one); a quick count follows this list.
  • In \mathbf{CRing}: you can use the same example, viewing the above groups as rings.
  • In \mathbf{Top}: take any set X with at least two elements, then consider the identity function \mathrm{id} : X\to \overline X, where we view X as having the discrete topology, and \overline X is the same set, but with the codiscrete topology. This is even a bijective continuous function, but it’s still not surjective on generalised elements! (In fact, it’s only surjective on generalised elements of discrete shape.) Consider the unit interval [0,1]: the [0,1]-shaped elements of a space are the paths in the space. However, the only paths in the discrete space X are constant, whereas every function [0, 1]\to X is a path in \overline X.
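To make the group example completely concrete, here is the promised count of \mathbb{Z}/4\mathbb{Z}-shaped elements on each side. A homomorphism out of \mathbb{Z}/4\mathbb{Z} is determined by where the generator \bar 1 goes, and that image x must satisfy 4x=0:

\mathop{\mathrm{Hom}}_{\mathbf{Grp}}(\mathbb{Z}/4\mathbb{Z},\,\mathbb{Z}/2\mathbb{Z}) = \{\,\bar1\mapsto\bar0,\ \bar1\mapsto\bar1\,\}, \qquad \mathop{\mathrm{Hom}}_{\mathbf{Grp}}(\mathbb{Z}/4\mathbb{Z},\,\mathbb{Z}) = \{\,\bar1\mapsto 0\,\},

since the only solution of 4x=0 in \mathbb{Z} is x=0. The nontrivial \mathbb{Z}/4\mathbb{Z}-shaped element of \mathbb{Z}/2\mathbb{Z} therefore does not factor through \mathbb{Z}\to\mathbb{Z}/2\mathbb{Z}.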

Before we talk about what the correct notion of surjectivity should be in a general category, what’s actually going on here? Well, the problem is that the axiom of choice fails in all of these categories:

Proposition 1: In any category, a morphism f:X\to Y is surjective on generalised elements if and only if it has a section (a right inverse); that is, there exists a morphism s:Y\to X such that f\circ s = \mathrm{id}_Y. Such a morphism is called a split epimorphism.

Proof. It’s clear that if f has a section s, then it is surjective on generalised elements: for any generalised element y of Y, just take the generalised element s\circ y of X; then f\circ(s\circ y) = y. In the other direction: if f:X\to Y is surjective on generalised elements, then in particular the Y-shaped element \mathrm{id}_Y of Y must be hit, so there is a Y-shaped element s of X with f\circ s = \mathrm{id}_Y—this is precisely a section of f. \blacksquare

(This also exposes that for all of my examples, I picked needlessly complicated generalised elements as counterexamples to this strong notion of surjectivity; the simplest choice of shape is always the codomain Y itself, via \mathrm{id}_Y.) In the category \mathbf{Set}, this means that a function is surjective if and only if it is surjective on generalised elements. This is because the axiom of choice is precisely the statement that every surjective function of sets admits a section.
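If it helps to see Proposition 1 at work in the most down-to-earth setting, here is a minimal sketch in Haskell, treated here purely as notation for functions between sets; the names parity, sect and checkSection are mine, and the section is chosen by hand exactly as the axiom of choice would do for us.

```haskell
-- parity is surjective onto Bool, and sect picks one preimage for each element
-- of the codomain, i.e. it is a section: parity . sect = id.
parity :: Int -> Bool
parity n = even n

sect :: Bool -> Int
sect True  = 0   -- an even number
sect False = 1   -- an odd number

-- checks that parity (sect b) == b for both elements of Bool
checkSection :: Bool
checkSection = all (\b -> parity (sect b) == b) [False, True]
```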

Here’s another example where generalised elements fail to be helpful: the disjoint union of sets X, Y is defined to be the set X\sqcup Y that consists of a distinct copy of every element of X and a distinct copy of each element of Y. In a general category, then, why not define the “disjoint union” of objects X, Y to be—when it exists—the object whose S-shaped elements are those of X along with those of Y? In other words, a map S\to X\sqcup Y corresponds either to a unique map S\to X or a unique map S\to Y. The issue here is even more severe—this doesn’t even work in \mathbf{Set}: it says that a function S\to X\sqcup Y has to map either entirely into X or entirely into Y, but we can easily construct functions S\to X\sqcup Y that map partially into X and partially into Y (for example, when X and Y are both non-empty, just take S = X\sqcup Y and the identity map X\sqcup Y\to X\sqcup Y).

So what do we do? Well, a very loose hint is that surjectivity and disjoint unions seem somehow associated with injectivity and cartesian products, respectively. You might even be so bold as to say they feel “dual” to each other. I wish I could remember if this “duality” felt like something that should be true to me when I was a naïve little first year who just learned these things (yeah, I didn’t learn formal maths until I entered university). Regardless, there is some evidence that could loosely suggest the existence of some elusive duality in each case:

  • Given a function f:X\to Y, injectivity only concerns itself with the domain X, whereas surjectivity is about the codomain Y. There’s also plenty of dual-looking facts about injectivity and surjectivity. For instance, a function with a left inverse is always injective, while a function with a right inverse is always surjective. Also, given functions f:X\to Y and g:Y\to Z, we have that injectivity of g\circ f implies injectivity of f, whereas surjectivity of g\circ f implies surjectivity of g.
  • The “duality” between the cartesian product and disjoint union might seem like more of a stretch than the above duality (unless, of course, you already know where to look). Here’s a convoluted way to do it. It doesn’t take many mental leaps to agree that the cartesian product is to multiplication of numbers as the disjoint union is to addition—I mean, if the sets are finite, then |X\times Y| = |X|\times|Y| and |X\sqcup Y| = |X|+|Y|. This might be enough if you already think addition and multiplication are “dual” to each other in some way. Otherwise, restrict multiplication and addition to \{0, 1\} (truncating 1+1 down to 1 so that addition stays in \{0,1\}) and identify 0 with “false” and 1 with “true” (this might be natural if you’re a computer scientist), then you’ll find that multiplication reduces to conjunction (“and”), and addition reduces to disjunction (“or”); see the truth tables after this list. Now we’re in business: these really are dual to each other, and not just in name. By De Morgan’s Laws, the negation of conjunction is disjunction, and the negation of disjunction is conjunction!
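Spelled out, here are the restricted operation tables next to the truth tables they become:

0\cdot 0 = 0,\qquad 0\cdot 1 = 1\cdot 0 = 0,\qquad 1\cdot 1 = 1 \qquad\text{(“and”)},
0 + 0 = 0,\qquad 0 + 1 = 1 + 0 = 1,\qquad 1 + 1 = 1\ \text{(truncated)} \qquad\text{(“or”)}.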

The above discussions should at least convince you that maybe—just maybe—there’s some sort of underlying duality at work here. To see what this duality really is, here’s another argument for the disjoint union being dual to the cartesian product (and this time, it’s a lot less “forced”). Think about how you would define a function S\to X\times Y, and compare it with how you would define a function X\sqcup Y\to T. For the cartesian product, it’s enough to define functions f:S\to X and g:S\to Y, and then the function S\to X\times Y would just send s\mapsto (f(s),g(s)). For the disjoint union, it’s similarly enough to define functions p:X\to T and q:Y\to T, as the corresponding function X\sqcup Y\to T would be given by sending x\in X to p(x) and y\in Y to q(y).
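As a small sanity check of this comparison, here is the same pair of constructions written in Haskell notation, reading the pair type (x, y) as the cartesian product and Either x y as the disjoint union. The names pair and copair are mine; copair is exactly the standard library function either.

```haskell
-- to define a map s -> (x, y), it is enough to give maps s -> x and s -> y
pair :: (s -> x) -> (s -> y) -> (s -> (x, y))
pair f g = \s -> (f s, g s)

-- dually, to define a map Either x y -> t, it is enough to give maps x -> t and y -> t
copair :: (x -> t) -> (y -> t) -> (Either x y -> t)
copair p _ (Left  x) = p x
copair _ q (Right y) = q y
```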

In fact, this is the duality! The disjoint union is just like the categorical product, except the morphisms are all pointing backwards. The same can be said about surjections (when compared to injections): a function f:X\to Y is surjective if and only if for every pair of functions p,q:Y\to T such that p\circ f=q\circ f, it must have already been the case that p=q. Indeed, if f is surjective, then we can “see all of Y” through f, and so we would have been able to catch any y\in Y for which p(y)\neq q(y). (Conversely, if f is not surjective, pick some y_0\in Y outside the image and compare the indicator function p=\chi_{f(X)} of the image with the constant function q=1: then p\circ f = q\circ f, yet p(y_0)=0\neq 1=q(y_0). Keep this trick in mind: the set \{0,1\} will reappear at the very end of this post.) What’s great is that we don’t need to work that hard to make dual definitions (so for any definition in category theory, you get one free!) thanks to opposite categories:

Definition 2: Given a category \mathcal{C}, its opposite category is the category \mathcal{C}^{\mathrm{op}} whose objects are the same as those of \mathcal{C}, and whose morphisms f^{\mathrm{op}}:Y\to X are precisely the morphisms f:X\to Y in \mathcal{C}. In particular, this means that composition is defined by f^{\mathrm{op}}\circ g^{\mathrm{op}} := (g\circ f)^{\mathrm{op}}, and the identity morphisms are just \mathrm{id}_X^{\mathrm{op}} = \mathrm{id}_X.
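For what it’s worth, Definition 2 can be rendered as a small Haskell sketch, taking ordinary functions as the morphisms of the underlying category: a morphism a -> b of the opposite category is stored as a function b -> a, and composition happens the other way around. (This Op is defined locally here, just for illustration; a similar type ships with the standard libraries.)

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- a morphism "a -> b" in the opposite category is, underneath, a function b -> a
newtype Op a b = Op { runOp :: b -> a }

instance Category Op where
  id = Op id                 -- identities are unchanged: id_X^op = id_X
  Op f . Op g = Op (g . f)   -- f^op . g^op = (g . f)^op
```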

With access to opposite categories, we can define the appropriate analogues of surjections and disjoint unions in arbitrary categories. For the analogue of surjections: a morphism f:X\to Y is called an epimorphism if f^{\mathrm{op}}:Y\to X is a monomorphism in the opposite category. For the analogue of disjoint unions: the coproduct X\amalg Y is just the categorical product in the opposite category.

  • The proof that surjective functions are epimorphisms in \mathbf{Set} translates easily to many categories of “set-like” objects (e.g., \mathbf{Top}, \mathbf{Grp}, \mathbf{CRing}, \mathbf{Vect}_{\mathbb{R}}): in all of these, every surjective morphism is an epimorphism.
  • If you think of a (partially) ordered set P as a category, then the coproduct of elements p,q\in P—if it exists—is their join (or supremum) p\lor q = \sup\{p,q\}. In particular, let P = \{0<1\} (where we think of 0 as false and 1 as true), then the categorical product in P is conjunction, and the categorical coproduct in P is disjunction! (A tiny concrete check of this appears after the list.)
  • The coproduct of topological spaces is just given by their disjoint union.
  • The coproduct of groups is given by the free product G_1\amalg G_2 = G_1*G_2. The relation to the disjoint union of sets is as follows. We can present any group G as a set S of generators subject to a set R of relations—written as G = \langle S \mid R\rangle. Then, the free product of two groups G_1=\langle S_1\mid R_1\rangle and G_2 = \langle S_2\mid R_2\rangle is just the group G_1*G_2 := \langle S_1\sqcup S_2\mid R_1\sqcup R_2\rangle.
  • The coproduct of commutative rings is given by their tensor product (as \mathbb{Z}-algebras) R_1\amalg R_2 = R_1\otimes_{\mathbb{Z}}R_2.
  • The coproduct of vector spaces is given by the direct sum U\amalg V = U\oplus V. This is because all vector spaces are free, so by picking a basis B_U\subset U and B_V\subset V, we just see that \mathop{\mathrm{Span}}_{\mathbb{R}}(B_U) \oplus \mathop{\mathrm{Span}}_{\mathbb{R}}(B_V) \cong \mathop{\mathrm{Span}}_{\mathbb{R}}(B_U\sqcup B_V). You might notice that the product of vector spaces is also given by the direct sum. This phenomenon actually happens in any category where morphisms can be “added” together (but note that this only works for finite (co)products—in the infinite case, we find that the product becomes strictly bigger!).
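Here is the tiny check promised in the poset example above, identifying the poset \{0<1\} with Haskell’s Bool (where False \leq True). In a poset viewed as a category there is at most one morphism between any two objects, so the universal properties of the product and the coproduct collapse to the defining properties of the meet and the join; the names below are mine.

```haskell
meet :: Bool -> Bool -> Bool   -- the product in {0 < 1}: conjunction
meet = (&&)

join :: Bool -> Bool -> Bool   -- the coproduct in {0 < 1}: disjunction
join = (||)

-- meet p q is the greatest element below both p and q;
-- join p q is the least element above both p and q
checkMeetJoin :: Bool
checkMeetJoin =
  and [ (r <= p && r <= q) == (r <= meet p q) | p <- bs, q <- bs, r <- bs ]
    && and [ (p <= r && q <= r) == (join p q <= r) | p <- bs, q <- bs, r <- bs ]
  where
    bs = [False, True]
```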

We can also define a “generalised coelement” of an object X to be a generalised element in the opposite category, and therefore S-shaped coelements of X are just morphisms X\to S. We then can make the definitions of epimorphisms and coproducts (and more!) just as precise as their duals:

Notion | Definition | Dual Definition | Dual Notion
monomorphism | injective on generalised elements | injective on generalised coelements | epimorphism
split epimorphism | surjective on generalised elements | surjective on generalised coelements | split monomorphism
product X\times Y | object whose S-shaped elements are pairs (x,y) with x:S\to X and y:S\to Y | object whose S-shaped coelements are pairs (x,y) with x:X\to S and y:Y\to S | coproduct X\amalg Y

Moreover, you can make things precise just as in my post about elements and characterise objects completely (up to isomorphism) by their coelements. This yields the notion of (corepresentable) copresheaves, so that the above definition of a coproduct X\amalg Y makes sense as the corepresenting object for the copresheaf h^X\times h^Y. The Yoneda Lemma applies here as well: for any copresheaf A, the elements (not coelements!) of the set A(S) correspond precisely to the h^S-shaped coelements of A itself.

Side note: The above statement is not the co-Yoneda Lemma! The co-Yoneda Lemma tells us that any presheaf A can be computed as the coend A \cong \int^{S:\mathcal C}h_S\times A(S)—in particular, this tells us that every presheaf can be computed as a colimit of representable presheaves. The reason that this statement is called the “co-Yoneda Lemma” is because the Yoneda Lemma can be equivalently stated as an end formula that A(S)\cong\int_{S':\mathcal C}\mathop{\mathrm{Maps}}(h_S(S'),A(S')) (the right hand side is exactly the set of h_S-shaped elements of A).

So what are generalised coelements a generalisation of?

Recall that generalised elements generalise elements by viewing them as maps \{*\}\to X. Is there an actual “coelement” (of a set) that we could think of generalised coelements as a generalisation of? An answer to this question should have some similar properties to those of elements (of sets, spaces, groups, etc.). The key property, perhaps, is that elements allowed us to completely characterise the behaviour of morphisms in these concrete categories. Of course, there is a formal definition for this kind of property:

Definition 3: An object Z is called a separator if the corepresentable functor h^Z=\mathop{\mathrm{Hom}}_{\mathcal C}(Z,-):\mathcal{C}\to\mathbf{Set} is faithful. In other words, if f,g:X\to Y are morphisms such that f\circ x=g\circ x for every x:Z\to X, then in fact f=g; that is, a morphism is completely determined by its action on Z-shaped elements.

This means our desired “coelement” should be a generalised coelement whose shape is given by a coseparator—that is, a separator in the opposite category. To figure this out, it might help to answer this other pressing question: what really is the category \mathbf{Set}^{\mathrm{op}}?

Surprisingly, we can be more concrete than just saying “it’s \mathbf{Set} with its arrows reversed.” A motivating question to help us out would be: given a function f:X\to Y (of sets), is there a canonical way to reverse it and think of it as a function f^{\mathrm{op}}:Y\to X in the other direction? A knee-jerk instinct would be to say that f^{\mathrm{op}}(y)=x whenever f(x)=y, but this clearly doesn’t define a function—it would only be a function if f were bijective, in which case we find that f^{\mathrm{op}}=f^{-1}. Nonetheless, this turns out to be on the right track. It’s just that we have to make a few adjustments so that it actually works.

f^{\mathrm{op}}(y) may not be well defined for some y\in Y because there may be several (or no) x\in X with f(x)=y. We can fix this and make it a function by taking f^{\mathrm{op}}(y) to be a subset of X; that is, just say f^{\mathrm{op}}(y) := \{x\in X \mid f(x)=y\}. Now it’s a function f^{\mathrm{op}}:Y\to\mathscr{P}(X) into the power set (set of subsets) of X. We’re not quite done, because this seems a bit asymmetric. Fortunately, we can extend f^{\mathrm{op}} to the power set of Y too: for T\subseteq Y, just define f^{\mathrm{op}}(T) := \{x\in X \mid f(x)\in T\}. This is a well-defined function, and it is called the preimage function f^{-1}:\mathscr{P}(Y)\to\mathscr{P}(X).

You can check that \mathrm{id}_X^{-1}=\mathrm{id}_{\mathscr{P}(X)} and f^{-1}\circ g^{-1}=(g\circ f)^{-1}, so taking power sets defines a functor \mathscr{P} : \mathbf{Set}^{\mathrm{op}}\to\mathbf{Set}. It’s also not hard to see that this functor is faithful. Therefore, \mathbf{Set}^{\mathrm{op}} is equivalent to some category whose objects are “sets that look like power sets” and whose morphisms are “functions that look like preimages”. So we just need to articulate what kind of structure power sets have (that typical sets don’t), and then articulate how this structure is preserved by preimages:
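Here is the same bookkeeping as a Haskell sketch, representing a subset of x by its characteristic predicate x -> Bool (a representation that the end of this post will justify); the names Subset and preimage are mine. The two functoriality equations then hold simply because composition of functions is associative and unital.

```haskell
type Subset x = x -> Bool

-- the preimage f^{-1} : P(Y) -> P(X) of f : X -> Y is just "precompose with f"
preimage :: (x -> y) -> Subset y -> Subset x
preimage f u = u . f

-- functoriality, for free:
--   preimage id      = id
--   preimage (g . f) = preimage f . preimage g
```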

  • The elements of a power set are sets, so it’s reasonable to think about taking unions and intersections. In fact, we can take arbitrary unions and intersections: if U_i\subseteq X for every i\in I, then \bigcup_{i\in I}U_i\subseteq X and \bigcap_{i\in I}U_i\subseteq X. Moreover, these operations satisfy the complete distributive law: if U_{i,j}\subseteq X for i\in I and j\in J, then \bigcap_{i\in I}\bigcup_{j\in J}U_{i,j}=\bigcup_{k:I\to J}\bigcap_{i\in I}U_{i,k(i)} (a point lies in the left-hand side precisely when, for each i, we can choose some j=k(i) witnessing it).
    • Regarding preimages, notice that if f:Y\to X, then f^{-1}\left(\bigcup_{i\in I}U_i\right) = \bigcup_{i\in I}f^{-1}(U_i) and f^{-1}\left(\bigcap_{i\in I}U_i\right) = \bigcap_{i\in I}f^{-1}(U_i). In other words, f^{-1} preserves all unions and intersections (a concrete sanity check in code appears after this list).
  • Similarly, we can take complements of subsets: if U\subseteq X, then U^c := X\setminus U = \{x\in X \mid x\notin U\} \subseteq X. The key property here is that U^c\cup U = X and U^c\cap U=\varnothing for every U.
    • For preimages, we have that f^{-1}(U^c) = (f^{-1}(U))^c, but this actually follows formally from preservation of unions and intersections (f^{-1}(U^c) \cap f^{-1}(U) = f^{-1}(\varnothing) = \varnothing and likewise for unions), so we don’t really need to pay attention to this.
  • We can partially order subsets with the (\subseteq) relation. Under this relation, the unions are the joins (i.e., coproducts), and the intersections are the meets (i.e., products). Note that preimages preserve this relation: if U\subseteq V, then f^{-1}(U)\subseteq f^{-1}(V).
  • Last thing: the subsets of X are built from the elements of X; that is, every x\in X gives a singleton \{x\}\subseteq X, and every subset is the union of the singletons it contains: for any U\subseteq X we have U = \bigcup_{x\in U}\{x\}.
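And here is the concrete sanity check promised above, for one small choice of function (reduction modulo three from \{0,\dots,5\} to \{0,1,2\}): its preimage operation preserves unions, intersections, and complements. The function and all the names here are mine, chosen just for illustration.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- preimage of a subset t of the codomain along f, computed inside a finite carrier
preimageOf :: (Ord a, Ord b) => Set a -> (a -> b) -> Set b -> Set a
preimageOf carrier f t = Set.filter (\a -> f a `Set.member` t) carrier

checkPreservation :: Bool
checkPreservation =
     pre (u `Set.union` v)        == pre u `Set.union` pre v
  && pre (u `Set.intersection` v) == pre u `Set.intersection` pre v
  && pre (cod `Set.difference` u) == dom `Set.difference` pre u
  where
    dom = Set.fromList [0 .. 5 :: Int]
    cod = Set.fromList [0, 1, 2 :: Int]
    f n = n `mod` 3
    u   = Set.fromList [0, 1]
    v   = Set.fromList [1, 2]
    pre = preimageOf dom f
```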

This is all the structure we need to describe power sets and preimages: these properties of power sets axiomatise those partially ordered sets that are complete atomic Boolean algebras. Accordingly, if we define the category \mathbf{CABA} to be the category whose objects are these complete atomic Boolean algebras, and whose morphisms are the monotone functions that preserve all joins and meets (called complete homomorphisms), then we find:

Proposition 4: The power set functor defines an equivalence of categories \mathscr{P}:\mathbf{Set}^{\mathrm{op}}\to\mathbf{CABA}.

The inverse of \mathscr{P} is not too surprising: given a complete atomic Boolean algebra B, let \mathop{\mathrm{atom}}(B)\subset B be its set of atoms. Given a complete homomorphism \varphi:B'\to B, we get a function of atoms \mathop{\mathrm{atom}}(B)\to\mathop{\mathrm{atom}}(B') (in the other direction!) which sends an atom a\in B to the (necessarily unique) atom a'\in B' with a\leq\varphi(a'). The reason a' exists is because \varphi(\top) = \varphi\left(\bigvee_{a'\in\mathop{\mathrm{atom}}(B')}a'\right) = \bigvee_{a'\in\mathop{\mathrm{atom}}(B')}\varphi(a') has to be equal to \top, so a\leq\bigvee_{a'}\varphi(a'); and since a is an atom, it must lie below one of the terms: if instead a\wedge\varphi(a')=\bot for every a', then a = a\wedge\bigvee_{a'}\varphi(a') = \bigvee_{a'}\left(a\wedge\varphi(a')\right) = \bot (using distributivity), which is absurd. Such an a' is unique because for a'\neq a'' in B', we have \varphi(a')\wedge\varphi(a'') = \varphi(a'\wedge a'') = \varphi(\bot) = \bot, so a cannot lie below both. Therefore, we get the (weak) inverse \mathop{\mathrm{atom}}:\mathbf{CABA}\to\mathbf{Set}^{\mathrm{op}}.
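To see this recipe in the motivating example, take B=\mathscr{P}(X), B'=\mathscr{P}(Y), and \varphi=f^{-1} for a function f:X\to Y. The atoms of a power set are exactly the singletons, and the recipe recovers f itself:

\mathop{\mathrm{atom}}(\mathscr{P}(X)) = \{\,\{x\} \mid x\in X\,\}, \qquad \{x\}\ \longmapsto\ \{f(x)\}, \quad\text{since}\quad \{x\}\subseteq f^{-1}(\{f(x)\}).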

Now, to seal the deal, we need to find a separator for \mathbf{CABA}. Well, we already have a faithful forgetful functor U:\mathbf{CABA}\to\mathbf{Set} that just remembers the elements of a complete atomic Boolean algebra (not just the atoms!), so it would be enough to find a corepresenting object for U. This might not seem like much help, but from Proposition 4, we can instead rephrase this as finding a representing object for the power set functor \mathscr{P}:\mathbf{Set}^{\mathrm{op}}\to\mathbf{Set}—that is, a set Z such that elements of \mathscr{P}(X) correspond to functions X\to Z.

Answer: take Z = \{0, 1\}. Indeed, given a subset U\subseteq X, we have its indicator (or characteristic) function \chi_U:X\to\{0, 1\} where \chi_U(x) = 1 when x\in U and \chi_U(x)=0 otherwise. Conversely, given a function f:X\to\{0, 1\}, this defines the subset U := \{x\in X \mid f(x)=1\}. There we go—genuine coelements of \mathbf{Set} are the subsets!
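In Haskell, the two directions of this correspondence are one-liners. Here is a small sketch using Data.Set for finite subsets and Bool standing in for \{0,1\}; the helper names are mine.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- the indicator (characteristic) function of a subset u
chi :: Ord a => Set a -> (a -> Bool)
chi u x = x `Set.member` u

-- going back: carve a subset out of a given finite carrier set
toSubset :: Ord a => Set a -> (a -> Bool) -> Set a
toSubset carrier f = Set.filter f carrier

-- round trip on a small example: recovering u from its indicator function
roundTrip :: Bool
roundTrip = toSubset x (chi u) == u
  where
    x = Set.fromList [1 .. 5 :: Int]
    u = Set.fromList [2, 4]
```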

Side note: the forgetful functor U:\mathbf{CABA}\to\mathbf{Set} actually has a left adjoint, sending a set S to the free complete atomic Boolean algebra on S. Under the equivalence of Proposition 4, this left adjoint is just the contravariant power set functor \mathscr{P}:\mathbf{Set}\to\mathbf{Set}^{\mathrm{op}}, so the free complete atomic Boolean algebra on S works out to be the double power set \mathscr{P}(\mathscr{P}(S)). Because of this adjunction, the corepresenting object for U is immediately determined to be the free algebra on one generator (see the side note in my post about adjunctions to see why); as an object of \mathbf{Set}^{\mathrm{op}}, this is just \mathscr P(\{*\}). There are only two subsets of the singleton: \varnothing (corresponding to 0) and \{*\} (corresponding to 1).

With this newfound understanding of coelements as subsets, we can re-describe surjections and disjoint unions of sets. A function f:X\to Y is surjective if and only if its preimage is injective on subsets. Likewise, the disjoint union of X and Y is the set X\sqcup Y whose subsets are given by pairs of subsets U\subseteq X and V\subseteq Y (indeed, S\subseteq X\sqcup Y comes from U := S\cap X and V := S\cap Y).
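For completeness, here is the quick check of the first re-description, using nothing beyond the definitions above:

f \text{ surjective} \implies f(f^{-1}(T)) = T \text{ for every } T\subseteq Y \implies f^{-1} \text{ is injective on subsets};
f \text{ not surjective} \implies f^{-1}(Y) = f^{-1}(f(X)) \text{ even though } Y\neq f(X) \implies f^{-1} \text{ is not injective on subsets}.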
