# Reviewing Domain Theory and Semantics

Alright, so because a couple of the projects I’m interested in right now involve a decent amount of domain theory, I need to brush up on the topic a bit. And if I’m going to brush up on a topic, I might as well write about it while I’m studying and post it. So that’s what this series of posts are going to be: my, not comprehensive, review of domain theory and its applications to denotational semantics. This first post is going to be pretty basic and a touch (many touches) handwavy, setting the stage for what we need and why we need it.

First off, and this might be very basic for PL people, let’s consider how to make a mathematical model of the untyped lambda calculus. This means, roughly, that we want to find some mathematical object $L$ to represent the collection of all lambda terms that

• has an operation corresponding to lambda abstraction, i.e. a way to take a “function” $L \to L$ and turn it into an $L$
• has an operation correspond to application, i.e. a way to take an element of $L$ and apply it to another element of $L$

Since this is a model of the pure untyped lambda calculus, then we really just want to model the normalized terms of the lambda calculus, which means that every “element”” of $L$ should be a “lambda abstraction” under the hood. What does that mean? It means that we don’t just have an injection of functions $L \to L$ into $L$ but that this is, in fact, an isomorphism!

If we start with the obvious choice that $L$ should be a set, we’re going to run into a problem quickly. An isomorphism between $L$ and $L \to L$ would imply that the sets $L$ and $L^L$ are the same size. Well for anything set that isn’t a single point, that’s not possible. And if the only model that exists is the trivial 1-point set, then this basically means we have an uninteresting theory that merely has many syntactic ways of writing the same program.

“But wait!”, you might say, “what if we didn’t use sets?”. Well, that’s exactly the solution due orginally, I believe, to Dana Scott (I’m having trouble finding the first source where he presented this). We need a kind of object and a notion of “function” that is restricted enough that we can make the isomorphism happen.

This is what domain theory gives us: a notion of partially-ordered sets with a restrictive enough notion of functions that we can solve “equations” such as $L = L \to L$. Essentially, we have a notion of ordering that we think of as an “information ordering”, i.e. what you can observe about the result of the computation. Imagine that we are modelling a Turing complete language and we have a type Nat that corresponds to natural numbers. The elements of our type are going to be all of the natural numbers plus an element that represents non-termination. Traditionally, this element is described by the symbol $\bot$ and is pronounced “bottom”. Why? Because we define our partial order so that no two numbers are comparable but every number is greater than $\bot$.

How is this an “information order”? Imagine you have a program of type Nat in this Turing complete language and that you run it. While you are waiting for the program to run you know nothing about it. If it terminates, then you will have a number and you will know more than you knew before about the computation. If the program never terminates, you just have to sit there waiting an arbitrarily long time knowing nothing about the value it might return. Even if you give up and halt the program after any finite period of time, in terms of experimental observations you still don’t know if the computation was going to terminate at some point if you just waited a little bit longer. You know nothing about it.

Another important conceptual point about the information order is that all of the observations over time are “coherent” in a sense. Once you have information about the computation, that information isn’t going to be contradicted in the future. To formalize that, we need to introduce the notion of “chain”. A chain is an increasing sequence of elements in the ordered set. In our informal parlance of observations, a chain is a collection of observations over time. The “coherence” we want, then, is that for any chain it must then actually be converging towards some unique value. In other words, that every chain has a least upper bound.

In this sense, any partially ordered set where all chains have least upper bounds, i.e. it is “complete”, can be interpreted as a having an information ordering. If it also has a least element, then it can represent computations that may diverge.

If we have the notion of collections we want to use to represent computations, then what then are the restricted notion of functions that we want? There are two components to this answer:

• our functions must be order preserving, because as you learn more about the input you cannot know less about the output to the function
• our functions must respect the coherence of a chain, which means that it preserves upper bounds

We call such functions continuous and conceptually they correspond to computable functions.

So, armed with our notion of complete partial orders and continuous functions over them, we can create a CPO $L$ such that it is isomorphic to the set (really, another CPO) of continuous functions $L \to L$. The actual construction is generally called the $D_{\infty}$ construction and we’ll cover it in a future post.

That’s all for the super-duper-basic-nearly-mathless introduction to why we study domain theory and how it is used in programming language semantics. Next post in the series, we’ll start doing some more math and probably talk a bit more about CPOs, bases (baseis?), and maybe build up to Scott’s $D_{\infty}$ construction.