# Lecture: Formal Definition of Turing Machines

Now we get to the formal definition of Turing machines. It looks much like the definitions of the other machines we've seen so far: there's a state diagram, a notion of transition, and some extra machinery describing what can happen during each transition. Let's describe Turing machines by formal tuples the way we have before. A Turing machine has:

• a finite set of states $Q$
• an input alphabet $\Sigma$
• a tape alphabet $\Gamma$, which this time must actually be a superset of $\Sigma$ because the only way we get input is off the tape. Since scratch paper can be blank, we also insist that $\sqcup$, the "blank" symbol, be a part of the alphabet $\Gamma$.
• $\delta : Q \times \Gamma \to Q \times \Gamma \times \{L,R\}$, which we can read, in words, as saying that $\delta$ looks at the current state and the symbol in the tape cell the head is currently on, then moves to a new state, writes a new symbol onto the tape, and moves the head either left or right. Now, what if we don't actually want to change the symbol on the tape? In that case, we just write the same letter that was already there back down to the tape. For simplicity of definition, we insist that there be only one case for the type of the function rather than multiple possibilities. Similarly, we insist that the head move left or right rather than allowing it to stand still, just because that simplifies the definition.
• a start state $q_0$
• an accept state $q_a$
• a reject state $q_r$
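To make the tuple concrete, here's a minimal sketch of one encoded as a Python dictionary. The particular machine (a toy that scans right over $a$'s and accepts at the first blank) is our own made-up example, and `_` stands in for the blank symbol $\sqcup$:

```python
# A hypothetical encoding of the 7-tuple as a plain dictionary. The concrete
# machine here is a toy of our own making, not one from the lecture.
BLANK = "_"  # standing in for the blank symbol

tm = {
    "Q": {"q0", "qa", "qr"},            # finite set of states
    "Sigma": {"a"},                     # input alphabet
    "Gamma": {"a", BLANK},              # tape alphabet: superset of Sigma, plus blank
    "delta": {                          # delta : Q x Gamma -> Q x Gamma x {L, R}
        ("q0", "a"): ("q0", "a", "R"),  # rewriting the same symbol "leaves it unchanged"
        ("q0", BLANK): ("qa", BLANK, "R"),
    },
    "start": "q0",
    "accept": "qa",
    "reject": "qr",
}

# The definition's side conditions, checked directly:
assert tm["Sigma"] <= tm["Gamma"]  # Gamma must be a superset of Sigma
assert BLANK in tm["Gamma"]        # and must contain the blank symbol
```

Note that every `(state, symbol)` pair maps to exactly one result: there is only one case for the type of the function, exactly as the definition insists.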

Note that for the first time we have an explicit reject state as well as an accept state. Huh, that might seem a bit odd, actually. We'll come back to it in a moment. Let's also note that this is a deterministic definition. That might seem like a step back from the non-deterministic machines we've been considering the past few weeks, but the reality is that Turing machines have equal power whether deterministic or non-deterministic. The only real difference comes once we start talking about the efficiency of the machines, and then the distinction matters greatly. Putting all those issues aside, let's figure out what it means for a Turing machine to compute. We'll talk about configurations of Turing machines to do that. A configuration is a combination of:

• the current position of the head
• the current state
• the state of the tape

Following Sipser a bit, we'll say that a configuration $C_1$ yields a configuration $C_2$ if the Turing machine steps from $C_1$ to $C_2$ in a single step. Sipser says "can step from $C_1$ to $C_2$ in a single step", but the word can isn't necessary at the moment since we're dealing with only deterministic machines. There are no choices in this fascist model. We also should define the start configuration given a particular state of the tape $w$: $(0, q_0, w)$. Here we'll be using the fact that a state of the tape can also be read as a string that extends rightward from the start of the tape. We'll also say that an accepting configuration is any triple $(n, q_a, w)$, i.e. one that has the accept state $q_a$ as its current state. We'll similarly call a rejecting configuration any triple $(n, q_r, w)$.
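As a sketch of how "yields" could look in code, here's one deterministic step on a `(position, state, tape)` triple. The driving machine is a made-up toy, and the convention that a left move at cell 0 leaves the head in place is an assumption on our part:

```python
# One step of a deterministic Turing machine: C1 yields C2.
# A configuration is a (head_position, state, tape) triple, as in the text.
BLANK = "_"

def step(delta, config):
    """Read the cell under the head, write, change state, move L or R."""
    pos, state, tape = config
    tape = list(tape)
    while pos >= len(tape):      # the tape is all blanks past the written portion
        tape.append(BLANK)
    state, tape[pos], move = delta[(state, tape[pos])]
    pos = pos + 1 if move == "R" else max(pos - 1, 0)  # assumed: L at cell 0 stays put
    return (pos, state, "".join(tape))

# A toy machine that scans right over a's and accepts at the first blank.
delta = {("q0", "a"): ("q0", "a", "R"), ("q0", BLANK): ("qa", BLANK, "R")}
start = (0, "q0", "aa")  # the start configuration for w = "aa"
```

There are no choices here: given a configuration, `step` produces exactly one successor, which is why "yields" rather than "can yield" is the right word for the deterministic case.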

Finally, we can say that a Turing machine $M$ accepts a string $w$ when there exists a sequence of configurations $C_0 \ldots C_n$ such that

• $C_0$ is the start configuration for the string $w$
• for every $i < n$, $C_i$ yields $C_{i+1}$
• $C_n$ is an accepting configuration.

and we define a Turing machine $M$ rejecting $w$ when there exists a sequence of configurations $C_0 \ldots C_n$ such that

• $C_0$ is the start configuration for the string $w$
• for every $i < n$, $C_i$ yields $C_{i+1}$
• $C_n$ is a rejecting configuration.

Okay, cool, we have our notion of computation now. Looking at these definitions, we can see that as soon as we hit an accepting configuration or a rejecting configuration then we’re done. This isn’t like a PDA or NFA where we can be in an accepting state and then move out of the accepting state when attempting to process more input.
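The acceptance definition translates almost directly into a loop: start from the start configuration, keep taking steps, and stop the moment the state is $q_a$ or $q_r$. Here's a sketch of that loop; the even-length checker driving it is our own toy example, not one from the lecture:

```python
# Run a machine from the start configuration until it reaches an accepting or
# rejecting configuration; return whether it accepted. This loop only
# terminates if the machine eventually halts (more on that shortly).
BLANK = "_"

def run(delta, q0, q_accept, q_reject, w):
    pos, state, tape = 0, q0, list(w) or [BLANK]
    while state not in (q_accept, q_reject):   # accept/reject ends the run immediately
        while pos >= len(tape):                # blanks past the written portion
            tape.append(BLANK)
        state, tape[pos], move = delta[(state, tape[pos])]
        pos = pos + 1 if move == "R" else max(pos - 1, 0)
    return state == q_accept

# Toy machine for "strings of a's of even length": flip parity while scanning.
delta = {
    ("even", "a"): ("odd", "a", "R"),
    ("odd", "a"): ("even", "a", "R"),
    ("even", BLANK): ("qa", BLANK, "R"),  # even number of a's seen: accept
    ("odd", BLANK): ("qr", BLANK, "R"),   # odd number: reject
}
```

Notice that the loop exits the instant the accept or reject state appears, with whatever scratch work happens to be left on the tape: accepts are accepts, no cleanup required.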

Perhaps you find this unsatisfactory: maybe you want there to be some notion of being "done" with the input in order to accept, the same way we could think of our previous machines as having an input buffer they consume. Well, this goes back to the inspiration for Turing machines: working out calculations with pen and paper. Think of taking a midterm: you have space in which you're performing the work for the problem, and you don't erase it all before you call it done and hand it in. That's what we're doing here with Turing machines. We could require that the tape be blank and the head reset in order to accept, but that would just mean taking ordinary Turing machines and adding a couple of extra states to handle the cleanup. So let's just skip all of that and say that accepts are accepts, and rejects are rejects, much like what scripture tells us. (My apologies, but once a southern Christian, always with the bad jokes.)

Alright, we can’t avoid the question any longer, can we? Just why exactly do we need both an accept state and a reject state, when before we could always have “reject” be the absence of acceptance? Let us consider a Turing machine with the following transition function

• $\delta(q_0, a) = (q_0, a, R)$ for every character $a$ in the alphabet

What does this do, in words? Gosh, it looks like it will just ignore the input and move the head to the right, forever. That means it will never halt. What about all that business of saying that computation should be done in finite time? Have I been lying this entire time? Let us say that I have been subtly simplifying a question all along.
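We can watch that machine fail to halt by simulating it with a step budget. The budget is our addition purely so the simulation terminates; the machine itself just runs forever:

```python
# The looping machine from the bullet above: delta(q0, a) = (q0, a, R) for
# every tape symbol a. We cap the simulation since it never halts on its own.
BLANK = "_"

def run_bounded(delta, q0, q_accept, q_reject, w, max_steps):
    pos, state, tape = 0, q0, list(w) or [BLANK]
    for _ in range(max_steps):
        if state in (q_accept, q_reject):
            return state == q_accept
        while pos >= len(tape):
            tape.append(BLANK)
        state, tape[pos], move = delta[(state, tape[pos])]
        pos = pos + 1 if move == "R" else max(pos - 1, 0)
    return None  # still running: never reached an accepting or rejecting configuration

# Ignore the input, move right, forever.
looper = {("q0", c): ("q0", c, "R") for c in ("a", BLANK)}
```

No matter how large a budget you hand it, the simulation comes back with "still running": there is no accepting or rejecting configuration anywhere in this machine's future.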

I’ve tried to be very careful and say “this machine decides this language” the entire time. We’re coming to the distinction between decides and recognizes. So we’ll say that a Turing machine decides a language $L$ when, for any string $w$, the Turing machine will always either accept or reject $w$, which means it will tell us “yes” or “no” in finite time. All the machines we’ve seen so far are of this “deciding” kind: they say yes or no.

A Turing machine recognizes a language $L$ when, for any string $w$, if $w$ is in the language then the machine will reach the accept state. If $w$ is not in the language, on the other hand, it might reject or it might run forever, i.e. have an infinite loop. These recognizers still perform computable operations; indeed, we need to slightly amend our description of “computability” to say that an operation is computable if, when given well-formed input, it will finish in finite time, using finite rules and finite data.
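To see the difference in miniature, here's a made-up pair of machines for the language of strings over $\{a, b\}$ containing at least one $a$: one decides it, the other merely recognizes it by wandering right forever instead of rejecting. (Both machines and the step cap are our own illustration.)

```python
# Toy decider vs. recognizer for "strings over {a, b} with at least one a".
BLANK = "_"

decider = {
    ("scan", "a"): ("qa", "a", "R"),      # found an a: accept
    ("scan", "b"): ("scan", "b", "R"),    # keep scanning right
    ("scan", BLANK): ("qr", BLANK, "R"),  # ran off the input: reject
}

# The recognizer differs only on blanks: instead of rejecting, it wanders
# right forever, so on strings outside the language it never halts.
recognizer = dict(decider)
recognizer[("scan", BLANK)] = ("scan", BLANK, "R")

def run_bounded(delta, w, max_steps=100):
    """Simulate up to max_steps; None means 'still running'."""
    pos, state, tape = 0, "scan", list(w) or [BLANK]
    for _ in range(max_steps):
        if state == "qa":
            return True
        if state == "qr":
            return False
        while pos >= len(tape):
            tape.append(BLANK)
        state, tape[pos], move = delta[(state, tape[pos])]
        pos = pos + 1 if move == "R" else max(pos - 1, 0)
    return None
```

On strings in the language the two machines behave identically; they only differ on strings outside it, where the decider says "no" and the recognizer never answers at all.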

You might think that that seems kinda awful: we’ve sullied our nice notion of computation to include non-termination. Well, sadly, the problem is that there are a lot of things that are Turing recognizable but not Turing decidable. Over the next week we’ll see a number of examples of them. Suffice it to say, for now, the idea is that there are many more things you can do computably once you only have to consider well-formed input and just do whatever on badly-formed input. That might seem counter-intuitive, but it’s strange and true and maybe kinda amazing when you get right down to it. Math is weird!

I don’t know if any of you have ever questioned why it’s even possible to write infinite loops in programs, given that you never actually want to do that. (Note that by infinite loop here I mean one that doesn’t “do” anything: an operating system or a server doesn’t count as an “infinite loop” for these purposes, but talking about why that is is a touch beyond the scope of this course.) This distinction between recognizable and decidable is exactly the reason: if you want to be able to describe all possible computable functions, then you have to allow for the possibility of infinite loops. It’s a tradeoff.

Indeed, there are actually languages such as Coq or Agda that don’t allow for infinite loops. They can guarantee that every program will actually terminate, which is a wonderful thing to have for many reasons, but there are some programs that they just can’t express, even if they’re written correctly. (Some more technical readers of this blog might nitpick that statement, since you can “fake” having all computable functions by using coinduction with a non-termination monad. I’ll admit that that’s super cute and I love that trick, but it’s not quite the same thing as the program being a first-class term in the language.)

So, now, we’ll end things for this post. The next post chronologically will probably be all about PDAs, but the next post topic-wise will be examples of Turing machines, high-level descriptions of a number of algorithms you can implement with them, and a proof that there are some languages that are not even Turing recognizable.