Computation and Hypercomputation

MIKE STANNETT

Verification and Testing Research Group, Department of Computer Science, Sheffield University, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK; E-mail: [email protected]

Abstract. Does Nature permit the implementation of behaviours that cannot be simulated computationally? We consider the meaning of physical computation in some detail, and present arguments in favour of physical hypercomputation: for example, modern scientific method does not allow the specification of any experiment capable of refuting hypercomputation. We consider the implications of relativistic algorithms capable of solving the (Turing) Halting Problem. We also reject as a fallacy the argument that hypercomputation has no relevance because non-computable values are indistinguishable from sufficiently close computable approximations. In addition to considering the nature of computability relative to any given physical theory, we can consider the relationship between versions of computability corresponding to different models of physics. Deutsch and Penrose have argued on mathematical grounds that quantum computation and Turing computation have equivalent formal power. We suggest this equivalence is invalid when considered from the physical point of view, by highlighting a quantum computational behaviour that cannot meaningfully be considered feasible in the classical universe.

Key words: Church–Turing thesis, computability, hypercomputation, recursion, scientific method, super-Turing machine
1. Introduction

Does Nature permit the implementation of behaviours that cannot be simulated computationally? Such behaviours, if they exist, are said to be hypercomputational. A hypercomputational system is therefore one that ‘computes’ non-computable behaviours. For a range of recent viewpoints, see (Copeland and Sylvan, 1999; Siegelmann, 1999; Bains and Johnson, 2000; Davies, 2001). This use of the term ‘computation’ in both a physical and a conceptual sense causes a great deal of confusion, so I shall always use the word computation to mean the formal functional behaviour of a conceptual computer program; unless otherwise stated a ‘computer’ is any conceptual device equivalent to a universal Turing machine (UTM). Physical systems will instead be called implementations. Consequently, the word ‘computable’ means ‘computable by Turing-machine’, while an ‘implementable function’ is one that can be implemented by a physical system. If an implementation happens to implement a computable function, I call it computational. Notice that computation and computational apply to different domains: physical systems can be computational, conceptual behaviours can be computations. The statement ‘Nature is computable’ means ‘all implementations are computational’ and the statement [HC] ≡def ‘hypercomputation is feasible’
means ‘there exists at least one implementable behaviour that is not computational’. It is often taken for granted that physical implementations of formal computers can be built, but this is in fact by no means obvious (it is unclear whether unbounded memory resources can be made available). As part of this investigation, I therefore consider the status of the statement [TC] ≡def “Nature permits the implementation of universal Turing computation”. Knowing that a single behaviour is hypercomputational is potentially useful, but the approach I discuss here is more properly described as the investigation of hypercomputational strategy; I want to construct hypercomputational systems deliberately. Accordingly, I’ll restrict the word machine to mean a deliberately constructible physical system capable of performing physical processes according to deliberate ‘intent’. Implementations of computers tend to use discrete instruction sets, but there is no need to impose this as a general precondition; instructions need not be spatially or temporally discrete, nor need they be finitely specified – they simply need to be physically producible as required. They need not even be explicit. For example, I do not know how to specify the processes involved in radioactive decay (I do not even know whether radioactive decay is a finitely specifiable process), but provided radioactive decay can be used in a mechanistic way, then it can be used both to control, and as a component of, a machine. If a machine can behave in hypercomputational ways, I call it a non-Turing machine. If it can also simulate a universal Turing machine, I call it a super-Turing machine (STM). An STM is more powerful than a universal Turing machine, whereas a non-Turing machine might simply exhibit different potential. (Some authors object to the terms ‘non-Turing’ and ‘super-Turing’ because Turing never denied the feasibility of hypercomputation; I acknowledge the validity of their complaint: the term ‘superTuring machine’ should be understood to mean ‘super-(Turing machine),’ i.e. a system that is provably more powerful than a Turing machine.) My own position is somewhat stronger than [HC]; I hold to the feasibility of building super-Turing machines (which entails both [TC] and [HC]); formally: [ST] ≡def “Nature permits the existence of super-Turing machines” Throughout this paper, I shall present several independent arguments in favour of [HC] (and to a lesser extent [ST]), drawing on ideas from a variety of different disciplines. In Section 2, my analysis of physical computability begins with a discussion of the Church–Turing Thesis [CT]. Many papers that cite [CT] assume that its content and context are familiar to readers, but the evidence suggests otherwise – Copeland (1997, 2000) points out that many discussions of [CT] dramatically misrepresent the Thesis by turning it from a statement about a carefully delimited
class of mechanistic human behaviours into a statement about anything that can ever be computed by any means whatever. Arguments based on misunderstandings are inherently suspect, but this should not deter us from asking how far their proponents have succeeded in uncovering a basic truth, even if only by accident. To what extent does [CT] tell us about the limits of physical computation? In Section 3, I consider the nature of observation and scientific experiment, and argue that no experiment conducted in the Natural Science tradition could ever refute hypercomputation. From a logico-scientific viewpoint, then, [HC] is either true or experimentally undecidable; it cannot be refuted. By negation, the statement ‘Nature is computable’ is either false or experimentally undecidable. In Section 4, I consider the status of [ST] as one moves from classical to modern theories of physics. I summarise Hogarth’s work on relativistic algorithms, and their inherent hypercomputationality. I also consider whether quantum computation has anything to offer. I argue that quantum machines are indeed more powerful than classical machines, even though quantum computers are no more powerful than classical computers. This can be seen as evidence against [TC] (the feasibility of universal Turing computation) in a classical Newtonian universe. In Section 5, I counter the common argument that hypercomputation is essentially irrelevant, because even if hypercomputational values could be implemented we would be unable to distinguish them experimentally from sufficiently good computable approximations. In Section 6, I conclude by summarising my observations, and suggest some questions that might benefit from future research. 2. Hypercomputation and the Church-Turing Thesis What do we mean by computation? For mathematicians and computer scientists this question is approached by breaking computational behaviours down into a relatively small number of basic actions, and asking how these actions can be combined. Attempts to do this in the 1930s resulted in the principle now known as the Church–Turing Thesis [CT], that the numerical functions that can be evaluated by “human clerical labour, working to fixed rule, and without understanding” (Turing, 1945, pp.38–39) are precisely those functions that can be evaluated by computer (i.e., universal Turing-machine). This is an assertion that two sets of functions, one defined mathematically and one physically, are actually the same set of functions. One of the key features of Turing’s machine-based model of computation is the ease with which arguments in support of [CT] can be stated, and (as we illustrate below) these arguments become even easier when we replace Turing’s low-level machines with the modern digital machines with which we are all familiar. It is surprising, therefore, that so many authors appeal to [CT] rather than simply adapting the original arguments to generate their own direct proofs. If one truly believes that [CT] justifies claims like “if a psychological science is possible at all, it must be capable of being expressed in computational terms” (Boden, 1988,
p.259) or “a standard digital computer, given only the right program, a large enough memory and sufficient time, can compute any rule-governed input–output function” (Churchland, 1988, p.105) then there should be little difficulty in adapting Turing’s arguments to demonstrate these claims convincingly from first principles. If a direct proof is not forthcoming, the matter is probably not so clear-cut after all. In the 1930s, a ‘computer’ was a person who performed calculations manually in a mechanistic way, and Turing’s machine-based conceptualisation reflects this: A man provided with paper, pencil, and rubber, and subject to strict discipline, is in effect a universal machine (Turing, 1948, p. 9). These days a ‘computer’ is a box of electronic machinery, so to avoid confusion I shall use the term ‘clerk’ where Turing’s contemporaries might have said ‘computer’. Thinking in terms of modern software systems, which are essentially Turing’s machines ‘made flesh’, it is easy to see why both [CT] and its converse ought to be true. The tasks involved in such chores as handling lengthy calculations manually are rather basic: clerks are given lists of numbers to be added or subtracted as instructed, and their results contribute to the lists of numbers handled by other designated clerks. The process of delegating sub-calculations to other clerks or groups of clerks is carefully controlled, and once a calculation is completed, it is possible to observe that this has happened. Each of the basic operations described here – adding and subtracting numbers, initiating delegated sub-calculations, writing results onto a designated list, and observing that a well-defined “I’ve finished and here’s my result” flag has been hoisted – can easily be simulated in software, so it’s reasonable to conclude that anything clerks can calculate, given that they’re operating in this highly artificial way, can also be calculated by a program with even fairly basic operations at its disposal. Conversely, no matter how high-level the software language used to program a numeric operation, it can be simulated by a lower-level program using only basic operations like addition and subtraction, and such a program can in turn be simulated by human beings pretending to be the computer. This ‘play-acting’ process only requires the ability to operate in the clerks’ mechanistic way, following instructions, but using no intuition or intelligence – so anything the program can do, clerks can do. We call this sort of mechanistic human behaviour ‘effective’, and [CT] asserts that when human beings work in an effective way they are no more than a standard digital computer.1 Therefore, putting the matter rather crudely, [CT] makes the at-first-sight unsurprising claim that [CT]: when human beings pretend to be computers they behave like computers. This statement of [CT] is not entirely accurate, but it does show why the many published descriptions of [CT] as a claim to the effect that ‘any function that can be computed by any means whatsoever can be computed by a Turing-machine’ are nonsense (Turing specifically described behaviours that cannot be simulated by his machines2 ). The fact that [CT] and ‘effectiveness’ are typically applied well beyond the domain of human behaviours is a testament to the power of Turing’s model.
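As an informal illustration of this point, the clerks' repertoire is small enough to simulate in a few lines of code. The sketch below is purely illustrative (the instruction names and list layout are invented here, not taken from Turing): it follows a fixed rule-book of additions, subtractions and delegated sub-calculations, and raises a completion flag when the instruction list is exhausted.

    def run_clerk(instructions, lists):
        """Follow a fixed rule-book mechanically, with no intuition or understanding."""
        for op, *args in instructions:
            if op == "ADD":                       # add two entries and record the result
                src, i, j, dst = args
                lists[dst].append(lists[src][i] + lists[src][j])
            elif op == "SUB":                     # subtract two entries and record the result
                src, i, j, dst = args
                lists[dst].append(lists[src][i] - lists[src][j])
            elif op == "DELEGATE":                # hand a sub-calculation to another 'clerk'
                (sub_instructions,) = args
                run_clerk(sub_instructions, lists)
        return "finished", lists                  # the hoisted "I've finished" flag

    # One clerk adds the first two entries of 'in', then delegates a subtraction.
    lists = {"in": [2, 3, 10], "out": []}
    print(run_clerk([("ADD", "in", 0, 1, "out"),
                     ("DELEGATE", [("SUB", "in", 2, 0, "out")])], lists))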
Figure 1. The Church–Turing Thesis [CT].
We’ve seen that the effective behaviours displayed by clerks are precisely the same (in idealised functional terms) as those displayed by modern software systems. Since these software systems can simulate everything from music synthesizers to washing machine control systems and automatic theorem provers, so can effective humans. According to [CT], effective humans behave identically to computers, so the behaviours of synthesizers, washing machine control systems and automatic theorem provers are all computable, and can all be described as ‘effective’. Our description may seem to give the impression that [CT] is simply an analytic statement defining what it means to ‘pretend to be a computer’ — in effect, that it says nothing at all. On the contrary. [CT] asserts two important alignments between the formal domain of conceptual mathematics and the experiential domain of experimental physics (see Figure 1), viz., • humans can, in principle, simulate computers; and • the computational human behaviours are precisely the effective human behaviours. Both aspects of [CT] are important. Computers (Turing’s ‘machines’) are conceptual devices, not physical ones, and the assertion that clerks and computers behave identically is a deep statement to the effect that a relationship can, does, and must exist between the effective physical behaviours of human beings on the one hand, and the mathematical behaviours that underpin formal arithmetic on the other. The converse to [CT] specifically tells us that computers can be implemented, in principle, by play-acting human beings, so it entails [TC].3 The
second aspect, that the set of computational behaviours is in a fundamental sense identical to the set of effective behaviours, is important because it is typically the only means available for demonstrating that a physical process is computational. For example, suppose two perfectly hard billiard balls of equal mass, each moving at exactly 1 m/s, collide with one another head-on in a Newtonian universe, on a frictionless flat table. The simplicity of this system ensures that we can design a software simulation of the collision. Since we’ve already seen that programs can be simulated, in principle, by effectively operating humans, we conclude that simple Newtonian collisions can also be simulated effectively. Invoking [CT] now allows us to deduce that the collision process is computational. The fundamental equivalence of effective behaviours and software behaviours also justifies calling today’s software-driven digital machines ‘computers’ – it is because they are essentially equivalent to effective humans that [CT] can be invoked to declare their behaviours computational. Indeed, [CT] is usually described without reference to humans at all, with effectiveness defined to be ‘computing-machine-like behaviour.’ We ought to say a few more words here about [TC], the claim that a UTM can indeed be built, since the validity of [CT] is linked to the construction of these machines, whether mechanically or in the guise of play-acting humans. The existence of PCs suggests that universal Turing machines can easily be constructed, but this is not in fact obvious, because a true implementation must provide the machine with as much memory as required, for as long as required. If the universe is destined to end in a ‘big crunch’ then the timing requirement cannot be satisfied; if the amount of matter in the universe is finite, then the memory constraint cannot be satisfied. In a strictly Newtonian universe neither of these difficulties arise, but our universe is not currently believed to be Newtonian. Even so, these problems are not necessarily insurmountable, because there is no reason why a successful UTM implementation need operate like a PC or a human. Turing’s model constrains the structure of time, but only partially. UTMs are required to perform one instruction at a time, in strict sequential order, but there is no requirement that every action should last as long as every other. Thomson’s Lamp (Thomson, 1954–1955; McLaughlin, 1997) illustrates one way in which this caveat can be exploited; his lamp is turned on for one minute, then off for half a minute, then on for quarter of a minute, and so on, with each step lasting half the time of its predecessor. The same ‘Zeno’ principle can be applied to the timing of UTMs (Copeland 1998a) to generate ‘accelerating’ UTMs. The operation of such machines satisfies the requirements for universal Turing machines, but they can always complete their tasks in finite time, so even if a ‘big crunch’ is approaching the running of their programs will not be affected. We return to these machines below in our discussion of relativistic computation. For the time being, however, we shall focus on the familiar Newtonian situation, so that [TC] can be taken for granted. In such a universe there is no difficulty in allowing a program to run forever, and memory resources can be provided whenever they are required (for example, a PC can be given an ever-increasing supply of disks as required).
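The ‘Zeno’ timing scheme mentioned above is easy to check numerically. The sketch below is my own illustration; it simply assumes Thomson's idealised step durations, each instruction lasting half as long as its predecessor, and shows that however many steps an accelerating machine executes, the total elapsed time stays below two time units.

    def elapsed_time(steps, first_step=1.0):
        """Total time for `steps` instructions, each lasting half as long as the last."""
        return sum(first_step / 2**n for n in range(steps))

    for steps in (1, 5, 10, 20, 50):
        print(f"{steps:3d} steps take {elapsed_time(steps):.15f} time units")
    # The totals approach, but never exceed, 2, so the entire (unbounded) run
    # fits within any interval longer than 2 time units.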
The Church–Turing Thesis can be seen as strong evidence in favour of hypercomputation, because it tells us that implementing a computer is as easy as getting a human to play-act at having no intuition, no volition, no self-awareness, no creativity, no understanding of the problem domain – in short, none of the characteristics we normally associate with being a human being in the first place. In today’s computer-centric society, it is normal to misread statements that compare humans with computers, and one standard misreading interprets [CT] to mean “if a human is like a computer, then humans must be fundamentally simple”. This misreading is essential to many research areas, since there would otherwise be little prospect of simulating human behaviours computationally, and much modern AI and Philosophy of Mind would be based in fantasy. But this is exactly what [CT] does not say. It tells us that computers are fundamentally simple, because humans can do everything computers can, even with their minds (almost) turned off. Indeed, an alternative reading of human-computation relationships – the socio-industrial statement that ‘forcing humans to behave like computers is dehumanising’ – logically depends on humans being hypercomputational. And if universal computation can be achieved by something as simple, in behavioural terms, as an ‘automatic human’, it would be amazing if the wider Universe, with all manner of exotic phenomena and as-yet-undiscovered complex behaviours at its disposal, were able to support nothing more powerful. Yet, this is precisely what people assert when they deny the feasibility of hypercomputation. Of course, they might be right — just because a small component of the universe can simulate computation, it does not follow that larger components must be capable of more. Standard physical theories tend to express behaviours as mathematical equations, and these can be used to generate software simulations. For example, the physical equation stating that light travels in straight lines might be simulated by a graphics routine that turns all pixels in a certain collection yellow, where the collection is asserted to contain precisely those pixels whose coordinates are solutions (x, y) of some equation Line(xmin , xmax , slope, intercept) ≡ “(xmin ≤ x ≤ xmax ) & (y = slope × x+ intercept)”. Since these interpretations seem to ensure that the ‘graphical universe’ is as much a model of the physical theory as the ‘physical universe’ on which it is based, any physical behaviour that can be simulated graphically ought to be computational – whatever can be simulated computationally is computational. Does this mean that physical hypercomputation is necessarily infeasible? Not necessarily, because we have made various unstated assumptions. With regard to this particular ‘graphical’ model, for example, we have ignored the discrete nature of pixels by assuming that our graphics package can draw infinitely thin lines. The yellow line is not an exact interpretation of the equation, but rather an approximation that can be made arbitrarily close to the underlying ‘ideal’ line by reducing the sizes of individual pixels. This is a fundamental problem affecting all claims that physical systems can be modelled computationally: in their original formulation, Turing machines treat both space and time as discrete resources, so
that continuous behaviours have no ‘natural’ representation. Overcoming these difficulties typically requires an amendment to Turing’s original machine model, whereby continuous space and/or time are introduced as ‘primitives’ of the model. For example, Pour-El and Richards (1979, 1981) implicitly assume continuous time in their equational models of hypercomputational systems. Stannett (1990) uses a continuous space-and-time model of computation, the analog X-machine, in which the continuity of time is assumed from the outset (this has since been generalised to cope with arbitrary descriptions of time (Stannett, 2001)). MacLennan’s work (MacLennan, 1990, 1993, 1999) centres on field computation, an innovative approach in which computation is represented as a perturbation of a continuous field. There are also certain logical obstacles to be overcome. It is only if a physical theory is consistent and expressible in terms of computable axioms, and fully explains the given physical behaviour, that our argument applies. By implication, therefore, a physical behaviour can potentially be hypercomputational if one or more of the following assertions holds for the physical theory relative to which it is described: • the theory is not grounded in computable relationships; • the theory does not fully explain the behaviour in question; • the theory is logically inconsistent Which of these situations can arise in practice? • The idea that fundamental physical laws might not be computable is problematic. It is certainly true that physical equations can involve uncomputable operators like general integration, but whether this matters is unclear. Physicists typically use numerical approximation methods whenever computational difficulties arise, and it is unclear whether the true physical theory (the theory that describes the universe) is the one involving ideal (but uncomputable) relationships, or the one embracing computational approximations. Depending on one’s answer to this dichotomy, the idealised version can be regarded in two distinct ways. If the law-as-stated is simply a convenient shorthand description for the physicists’ law-as-used, then its non-computability is immaterial; but if it is regarded as the ‘true’ description of Nature, then the inclusion of uncomputable operators suggests the existence of uncomputable behaviours. • There are many important processes for which no complete explanation is known. No currently established physical theory (I know of) can explain why a radioactive atom should decay at one particular moment rather than another. Perhaps explanations of this particular behaviour will one day be forthcoming, but in any event, the claim that physical theories cannot fully explain behaviours is irrefutable. This is a consequence of indistinguishability (see below) – because we cannot distinguish the ‘true’ process from any sufficiently close approximation, we have no way of deciding (at the time) whether a physical model we’re proposing is exactly accurate, or just a good approximation.
• The third possibility, that physical theories might be inconsistent, is particularly interesting. Consistency is required because we want the physical Universe (or at least the part we are trying to describe) to be a ‘model’ for our theory of physics, i.e., a concrete setting in which the logical relationships of our theory are mirrored in the physical relationships between objects – and standard logical results tell us that inconsistent theories cannot have models. It is unclear whether this consistency can be safely be assumed, because one can argue that the current Standard Model of physics is itself inconsistent4 . Moreover, physical theories typically incorporate arithmetic, and the consistency of arithmetic is itself open to debate. In summary, one can make a case that any behaviour that can be explained fully by consistent physical laws is computational. But there are no grounds for assuming that every physical behaviour can be explained in this way. First, there is no evidence that physical theories are necessarily consistent; second, there are currently behaviours that defy explanation; and thirdly, our inability to distinguish between behaviours that are distinct but ‘sufficiently similar’ means we cannot be certain that the behaviour explained by the theory is the one intended, or simply a nearby approximation. Consequently, it is entirely reasonable to suggest that, whatever system of physical laws we presuppose, there will be physical implementations whose behaviours cannot be fully explained by those laws, and which cannot, therefore, be simulated computationally by systems which take (representations of) those laws as computational primitives. 3. Hypercomputation Is Experimentally Irrefutable At the heart of the Standard Model lies the fundamental assumption that experimental observation is the only meaningful evidence of existence, where observation is itself meaningful only if it satisfies the constraints of ‘scientific method’. In particular, an experimental result is considered truly valid only if it can be replicated by independent parties. By definition, replication of an experiment requires that it be recursively defined – independent scientists should be able to repeat the same experimental steps in the same order, starting from the same carefully prepared initial conditions, and so generate the same results. This concept is only meaningful if the experiment is viewed as a recursive operation to be conducted on a recursively recreated environment. Notice, however, that the individual steps need not necessarily be recursive themselves; for example, one such step might be ‘take a random digit from the generator’. Rather, it is the control structure (i.e., the rules governing how one step follows another) that must be recursively defined. Technically, we are saying that a consequence of scientific method is that any valid ‘scientific’ experiment must implement an O-machine, i.e., a Turing machine with potential access to random number generators or other ‘oracles’ (Copeland, 2000).5 Since the constraints of scientific method amount to a requirement of recursive replicability, the basic framework of physics places implicit faith in the complete-
ness of recursive methods (possibly enhanced by oracles) for determining physical reality. The Standard Model is therefore an intrinsically recursive representation of Nature, and it is not surprising that physical hypercomputation should be so difficult to demonstrate – the very language of modern physics precludes direct analysis of hypercomputational processes. It does not follow, however, that physics has nothing to say about hypercomputation. It means simply that any such statement is likely to rely on subtle arguments, and will probably require a deal of lateral thinking. Notice that even if oracles are used in the conduct of an experiment, we cannot simply use their presence to argue that hypercomputation is feasible, because we first need to prove that they are oracles, i.e., that their behaviour is uncomputable. We cannot simply presuppose that some physical component is capable of producing non-computable behaviour (for example, the successive digits of some uncomputable number), because our overall goal is to establish the feasibility of hypercomputation; presupposing the existence of hypercomputational oracles simply leads to circular reasoning. The material in this section originated in our attempt to prove that one particular type of oracle (a random number generator) can be constructed as a physical device. In order to motivate this example, we need to explain what we mean by a ‘true random number generator’. The nature of randomness is both technically and philosophically complex; see, e.g., (Church, 1940; Copeland, 1940), but for our purposes, we mean loosely that • The system is guaranteed to generate a natural number as output (call it n) after some finite amount of time; and • There are no constraints as to the value n can take. That is, any natural number might potentially be generated by the system, and the output generated by any run of the system is in no way dependent on the outputs generated by earlier (or subsequent) runs. The simple hypercomputational system described here uses a radioactive sample. We are particularly interested in material that decays by ‘α-radiation’, a process in which Helium nuclei are ejected from the nuclei of atoms in the material. The Helium nuclei combine with electrons in their environment to form atoms of Helium gas that can be pumped out of the experimental apparatus, and as a result the sample becomes progressively lighter. In order to generate a random integer, we first examine our weighing scales to determine how much mass has to be lost by the sample for the reduction to be detectable. Then we collect together a large enough amount of sample material to ensure that when all of it has decayed, the weight loss will be easily noticeable. We choose a ‘threshold’ value somewhere between the minimum mass loss detectable by the scales, and the total possible mass loss of the sample, and then set a clock running. The number ‘computed’ by the system is defined to be the number of complete seconds that pass before the system generates the chosen amount of mass loss.
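A rough simulation helps make the intended behaviour concrete. The sketch below is purely illustrative: the sample size, half-life and threshold are invented, and Python's pseudo-random generator merely mimics the Standard-Model decay statistics whose physical randomness the argument actually relies on.

    import math
    import random

    def decay_seconds(n_atoms=100_000, half_life_s=30.0, threshold=0.5):
        """Whole seconds elapsed before the sample loses the threshold fraction of its mass."""
        rate = math.log(2) / half_life_s                          # decay rate per atom per second
        decay_times = sorted(random.expovariate(rate) for _ in range(n_atoms))
        k = int(threshold * n_atoms)                              # atoms that must decay
        return int(decay_times[k - 1])                            # the number 'computed' by the device

    print(decay_seconds())   # close to the 30 s half-life when threshold = 0.5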
Underpinning the system’s behaviour is the standard assumption (considered by physicists to be valid for all radioactive materials) that there eventually comes a time when half of the sample can be expected to have decayed – the average time required is called the ‘half-life’ of the sample in question. Because the threshold mass is strictly less than the total possible mass, it should eventually be reached after only finitely many half-lives, but because decay is assumed in the Standard Model to occur randomly, the number of seconds that pass before this happens must also be random. In other words, this system implements a true random number generator. Apparently, then, radioactive decay ought to enable the construction of super-Turing machines, because it’s well known that true infinite randomness of this type cannot be produced by a computer. König’s Lemma says that the only way a finitely branching tree can contain infinitely many leaves is if it also contains an infinite path.6 Because the choice statements in a computer program (statements using if, while, and the like) only ever involve finitely many choices, the set of possible program behaviour traces constitutes a finitely branching tree. But for a program to implement a true random number generator, it must be guaranteed to terminate, and be capable of doing so in any one of infinitely many different ways. These give infinitely many leaves of the behaviour-trace tree, so by König’s Lemma there is at least one behaviour that corresponds to an infinite path through the tree, i.e. the program is not guaranteed to terminate after all. Consequently, it is impossible for a computer program to simulate a true random number generator. Further reflection shows, however, that the assumption that half the sample will eventually decay is extremely subtle. In trying to express the assumption rigorously we find that two radically different interpretations are possible, and comparing the two interpretations allows us to deduce that:
• either the assumption is correct (this is the ‘official’ stance of modern physics), and we can build the true infinite random-number generator described above; so [HC] is correct (and [ST] is also correct provided universal Turing machines can be implemented).
• or the assumption was flawed, but in this case, we can show that hypercomputation cannot be refuted by any experiment conducted according to the rules of the Standard Model (i.e., the two concepts are logically independent).
Although radioactive decay is a familiar concept in modern physics, the randomness associated with it seems to be defined rather ambiguously. This is disquieting because decay is of great importance to modern physics, as this popular description of the proton illustrates (see Lopez (1996) for a more comprehensive description of the relevance of proton decay):
Protons are essential parts of ordinary matter and are stable over periods of billions and even trillions of years. Particle physicists are nevertheless interested in learning whether protons eventually decay, on a time scale of 10³³ years or more. This interest derives from current attempts at grand unification theories that would combine all four fundamental interactions of matter in a
single scheme. Many of these attempts entail the ultimate instability of the proton, so research groups at a number of accelerator facilities are conducting tests to detect such decays. No clear evidence has yet been found; possible indications thus far can be interpreted in other ways. [“Proton”, Microsoft Encarta 98 Encyclopaedia. ©1993–1997 Microsoft Corporation. All rights reserved.]
Given that protons are key components of every atomic nucleus, it is clear that many grand unification theories should entail the instability of all atomic species. At the present time several hundred elementary particles are known, with half-lives ranging from the very long (protons > 10³³ years), through the everyday (free neutron ∼ 15 min), to the vanishingly short (Z⁰ ∼ 10⁻²⁵ s). However, what does ‘half-life’ actually mean in this context? The standard definition is straightforward. For example, if we start with one million free neutrons, we would expect that after roughly 15 min (the half-life of a free neutron) only about 500 000 would remain, the rest having decayed. Another 15 min later we would have only 250 000, and 15 min after that only 125 000 (see Figure 2). This notion of “half-life” is so familiar that it is commonly accepted as an unquestioned basis for experimentation. For example, one standard experiment in support of the relativistic notion that time slows down for fast moving objects involves measuring the half-life of cosmic rays. These are fast-moving (often radioactively unstable) particles that rain down upon the Earth, and it is possible to measure how the number of particles changes with height. The lower we take our measurement, the longer the particles will have been in transit and undergoing decay, and the lower their population will be. Experimental evidence confirms that the number of particles drops off much more slowly than would be expected given the amount of time involved (as measured by an observer on the Earth), because the fast-moving particles consider the journey to have taken considerably less time. If we consider the way physicists actually use concepts like ‘half-life’, it is clear that they consider decay to be not merely possible but necessary, but this is nowhere expressed in the standard statistical definitions. I (and clearly most modern physicists) would “expect” roughly half of a neutron sample to have decayed after 15 min, but no one can guarantee it. The possibility always remains, however low the probability, that none of the neutrons will have decayed, even several days later. However, if this actually happened in practice, it is highly unlikely that a physicist would say “aha, there’s Nature reminding us that decay is a purely statistical concept that need not happen at all”. They’re far more likely to say “your counter is obviously broken”. This suggests two distinct scenarios for the decay of unstable particles, one based in formal description, the other in pragmatic usage. The rest of this section, which is quite formal in places, is organised as follows.
1. I state the two different versions of what it means to say that “unstable particles eventually decay”. I show how these two versions can be expressed as logical formulae, φ and ψ.
Figure 2. Decay with constant half-life.
2. I show that φ and ψ are so similar that no experiment can distinguish between them.
3. I show that φ is a tautology of the Standard Model, and that [HC] is a logical consequence of ψ.
4. It follows that hypercomputation is experimentally irrefutable in the Standard Model, because:
5. suppose an experiment refutes [HC]
• since [HC] is a consequence of ψ, the experimental system is a counterexample to ψ
• since φ and ψ cannot be distinguished experimentally, the experimental system is also a counterexample to φ
• since φ is a tautology of the Standard Model, the experiment actually refutes the Standard Model itself.
Suppose, then, that observation of an as-yet-undecayed unstable particle p begins now, at time zero. Let’s write E(p, t) to denote the a priori probability that p will still not have decayed at a time t in the future. The concept of half-life gives one particular shape (Figure 2) to the notion that E(p, t) tends towards zero the longer we wait, i.e. E(p, t) → 0 as t → ∞. Nevertheless, we can express this asymptotic behaviour logically in two similar, but significantly distinct, ways. In the following statements, the value ε is always assumed to be positive.

First Interpretation of E(p, t) → 0 as t → ∞
This interpretation says: any given particle p may or may not eventually decay, but the chances of its not having done so become arbitrarily close to zero as time passes. Formally, p satisfies

(∀ε)(∃T(ε))(∀t)[ (t > T(ε)) → (E(p, t) < ε) ]
i.e., given any positive probability, ε, no matter how small, there will be some amount of time, T (the exact value may depend on your choice of ε), such that the chances of p still existing at any given time t later than T are less than ε.

Second Interpretation of E(p, t) → 0 as t → ∞
This interpretation says: any particle, p, must eventually decay after some finite time T(p), but we do not know in advance what value T(p) will take. We can assume, however, that if we were to watch many particles the various values would be distributed as expected. Formally, p satisfies

(∃T(p))(∀t)[ (t > T(p)) → (E(p, t) = 0) ]

i.e., there definitely will be some future time, T (which may vary from particle to particle, and which we may not actually be able to calculate), by which the particle has decayed.
The symmetry between the two interpretations isn’t immediately obvious, so I’ll apply a mathematical “trick” to help bring out the similarity. In the second interpretation, we’re making the blunt assertion that E(p, t) will be zero, and consequently have no need for the test-parameter “ε” that appears in the first interpretation. We can re-introduce ε by observing that zero can be defined as “that unique non-negative number which is less than every positive number”. Consequently,

(∃T(p))(∀t)[ (t > T(p)) → (E(p, t) = 0) ]

is the same statement as

(∀ε)(∃T(p))(∀t)[ (t > T(p)) → (E(p, t) < ε) ]

provided T(p) and E are independent of ε and ε is independent of p. Accordingly, we can write the two interpretations in strikingly similar terms, where ε is again understood to be positive:

α(p) ≡def (∀ε)(∃T(ε))(∀t)[ (t > T(ε)) → (E(p, t) < ε) ]
β(p) ≡def (∀ε)(∃T(p))(∀t)[ (t > T(p)) → (E(p, t) < ε) ]

Seen in this way, the distinction between the two statements lies in the way T is to be chosen. In α we choose T according to the constraint ε, but in β we assume that T is a property of the particle itself. To understand the experimental status of these statements, we need to consider E in more detail. We have assumed that E is the asymptotically zero exponential decay curve of the Standard Model (so that half-life is meaningful). During any experiment to establish the truth or falsity of α or β we need to observe a particle p to see whether or not it has yet decayed; and by doing so we generate new information. In particular, this information forces us to amend our a priori assumptions concerning E. If we find that the particle has indeed
decayed, for example, it is no longer meaningful to assign a positive probability to the possibility that it won’t have decayed 5 s later. Rather, the moment we know that the particle has decayed, all subsequent a priori estimates of E automatically drop to zero. Suppose then that p is a particle, and let’s consider the relationship of p to α and β. There are two cases to consider, depending on whether p is stable or unstable. If we assume that the particle is in fact stable, this assumption itself provides information that forces E to be amended. For a stable particle we have, by definition, E ≡ 1, and now both α and β evaluate to false. On the other hand, if, as physicists increasingly suggest, p is bound to be unstable, we can consider two subcases. Either there is a time tp after which the particle can be seen to have decayed, or there isn’t. If there is such a time, then E has, at some point, to be adjusted so that E(p, t) ≡ 0 for all t > tp. Under these circumstances both α and β automatically evaluate to true. If there is no such time, then no adjustment of E occurs, and we can distinguish α from β, because α evaluates to true but β evaluates to false. To summarise, the only way we can distinguish α from β is by finding a particle which is ostensibly unstable, but which never in fact decays. For pedants this ends the argument, because they can argue (with some justification) that an unstable particle that never decays is in fact a stable particle. However, even if we allow for the existence of a priori unstable particles that never decay, determining that a particle actually possesses this property requires waiting forever, which violates scientific method on two distinct grounds. Firstly, an experimental result is only meaningful if it can be replicated, and clearly, an experiment that runs forever can never be repeated thereafter (see, however, the section on relativistic algorithms, below). Secondly, experiments must be conducted using finite means, and this again includes the requirement that they run to completion in finite time.
COROLLARY. α and β are experimentally indistinguishable.
Let’s write Unstable(p) to mean that p is an a priori unstable particle. We’ve just seen that α evaluates to true for all ostensibly unstable particles, so the statement

φ ≡def Unstable(p) → α(p)

is a tautology (provided one accepts the asymptotic decay curve assumed by the Standard Model); indeed, it can be seen as the definition of instability in the Standard Model. Since α and β are experimentally indistinguishable, we can regard the statement

ψ ≡def Unstable(p) → β(p)

as an experimental tautology. That is, there may exist logical counterexamples to ψ, but any such counterexample is experimentally meaningless. However, this statement ψ is precisely the assumption underpinning our design of a true random
number generator, so ψ necessarily entails [HC]. Because our construction is entirely deliberate, and given our general assumption that UTMs are feasible, [ST] also follows: the output of our random number generator can be used as an oracle to a standard UTM, thereby generating a super-Turing machine. The corollary we set out to prove now follows. We know that φ is a tautology of the Standard Model and that ψ entails [HC]. Since φ and ψ are experimentally indistinguishable, any experimental refutation of [HC] is a refutation of the Standard Model itself.

4. Non-Newtonian Expressions of Hypercomputation

The intent of [HC] – that hypercomputation is feasible – usually presupposes a classical Newtonian universe, since this is the type of universe for which Turing computation has traditionally been defined. In recent years, however, there has been much discussion of computation relative to other types of universal model, especially where quantum computation is concerned. The basic idea behind these models is that Turing’s conceptualisation of computing machines is sufficiently physical that one can easily envisage how the classical components can be replaced with quantum counterparts (for example, if quantum uncertainty means the squares on a memory tape can’t be accurately distinguished, then the tape becomes a fuzzy nondeterministic memory device whose contents can be known statistically but not absolutely). An alternative approach is to amend the type of space and time in which the machine operates, and this is essentially the approach taken when developing relativistic algorithms. Finally, we say a few words about ‘X-machine’ models; while these are intrinsically theoretical, they suggest alternative models of ‘effectiveness’ relative to which uncomputable behaviours might actually be ‘computed’.

4.1. RELATIVISTIC STRATEGIES

This section is based on (Hogarth, 1992, 1996; Earman and Norton, 1993; Earman, 1995; Hogarth, 2001, personal communication). The material in (Earman, 1995) is particularly comprehensive, but presupposes familiarity with some of the basic concepts of relativistic theory. Relativistic models of physics tend to focus on cosmological questions relative to which the existence of any individual object is essentially insignificant, so the theory usually concerns the structure of spacetime rather than the nature of objects. An important issue concerns the topology of spacetime: is it flat (Euclidean) or curved, and does it contain singularities? To understand what these terms mean, it is instructive to consider the old question ‘is the Earth flat?’ We can answer this question by boarding a rocket to view the planet from space, or by taking a journey around the planet and returning to our starting point. But at a cosmological level
these options are unavailable: we cannot take a rocket trip ‘outside the universe’, and journeying across the universe is currently infeasible. We need some other technique for establishing that the Earth is not flat, that can be generalised to the cosmological arena, and the relevant tool is geometry. For example, we can draw lots of circles with diameter d, and measure their circumferences. If these always have length πd, we would conclude that the planet is ‘flat,’ otherwise it is ‘curved’. To see that curvature does affect measurements of this kind, think of the Earth as a sphere of radius r. If we draw a circle centred on the North pole with radius πr, the only point lying on this circle would be the South pole, because this is the only point on the Earth’s surface that can be reached by walking from the North pole in a straight line (i.e., along a great-circle) for a distance of πr. On this curved surface, therefore, we find that the circumference of any circle of radius πr is zero. This anomaly proves that the surface of the Earth is not flat. Now construct a ‘cosmic circle’ whose radius is a sizeable fraction of the radius of the universe. If we measure the ratio of its circumference to its diameter, will the answer be π? According to relativistic theory, the answer is a resounding no, so by analogy with the Earth, spacetime is said to be ‘curved’. The curvature of spacetime (which typically varies from point to point) measures how ‘bent’ spacetime actually is. In some cases spacetime becomes so ‘bent’ that it actually develops a tear (this is why a black hole is called a hole) and in such situations we say that ‘the field contains a singularity’.7 In order to specify a model of spacetime, all that’s required is a rule (called a metric) for measuring distances, for once we know how to measure distances we can establish just how bent spacetime is at each point. Every metric defines its own version of spacetime, so there are lots of possible descriptions of what spacetime looks like. However, not all of them are considered ‘realistic’. The guiding principle is that the spacetime (or if you like, the metric) should satisfy a collection of fundamental equations, called the Einstein Field Equations (EFE). Given a metric, it is said to define a Malament–Hogarth (M-H) spacetime if the spacetime it describes contains a path γ and a point P with the following properties8 (Earman, 1995, p. 107):
• The path γ has a definite starting point.
• As you move along γ you are always moving into the future.
• Anyone who traverses γ considers their journey to take infinitely long (in effect, γ has no end-point; as you move along γ you are actually falling into a ‘singularity’).
• At any point along γ it is possible to transmit a signal to P.
It is important to understand that P is a point, not in space but in spacetime. If we write P ≡ (x, t), then being at P means being at position x at time t. Moreover, because signals (which always travel forwards in time) can be sent to P from each point of γ, any observer at P considers the whole of the computer’s journey along γ to be in the past.
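The circle-drawing test is easy to make quantitative for the Earth example. The sketch below uses the standard spherical-geometry formula (not spelled out in the text): on a sphere of radius r, a circle of geodesic radius ρ has circumference 2πr·sin(ρ/r), so the circumference-to-diameter ratio falls below π as circles grow and reaches zero at ρ = πr.

    import math

    def circumference_on_sphere(rho, r=1.0):
        """Circumference of a circle of geodesic radius rho drawn on a sphere of radius r."""
        return 2 * math.pi * r * math.sin(rho / r)

    for frac in (0.01, 0.25, 0.5, 1.0):                 # rho as a fraction of pi*r
        rho = frac * math.pi
        ratio = circumference_on_sphere(rho) / (2 * rho)
        print(f"rho = {frac:.2f}·πr   circumference/diameter = {ratio:.4f}")
    # Close to π for small circles, falling to 0 at rho = πr (the South-pole circle).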
From a theoretical viewpoint, M-H spacetimes allow the construction of hypercomputational behaviours, called supertasks by (Earman and Norton, 1993; Earman, 1995). Suppose, for example, that we are given some arbitrary computer program, prog, and need to determine whether or not it will eventually halt. In general this problem is known to be undecidable, but it can be decided (in principle) if we perform the computation in a M-H spacetime. We set the program running on a wheeled computer that is primed to travel along the path γ , and add the extra instruction that, should the program halt, a signal should be sent to P saying so. We then travel to P by some other route that avoids the ‘singularity’ at the end of γ ; this can be shown to be possible, and moreover, the traveller considers the journey to require only a finite amount of time. If, on arriving at P, we find that no signal has been received, this must be because no signal was sent (remember, the observer at P considers the whole of the computer’s journey along γ to be in the past). In other words, we know that the program never halted. On the other hand, if a signal is present, then the program did halt. In other words, we have successfully decided whether or not our arbitrarily chosen program halts. M-H spacetimes also provide an implementation for accelerating UTMs (recall, these are machines for which the execution time for successive instructions decreases so rapidly that the execution time of the entire program is finite). All one needs to do is set the relevant program running on a standard machine that moves along γ , and from the standpoint of the observer at P, the implementation will seem to be operating in essentially the required manner. There are, however, several obstacles to regarding M-H spacetimes as providing for realistic hypercomputers. How, for example, does the observer know the route taken by γ ? The computer in our example was required to follow this path, so it is clearly necessary that γ (or rules for determining γ ) be known in advance. This is not necessarily impossible. There may exist some predictable procedure in an M-H spacetime that would trigger the creation of just such a path, just as material in a massive star might be predicted to follow a path into the black hole created when the star eventually collapses. Earman (1995) notes that while many M-H spacetimes fail the Einstein Field Equations, two M-H spacetimes, known as anti-de Sitter spacetime and ReissnerNordström spacetime, do satisfy the equations and can be considered realistic. However, other problems arise. I observed above that the observer at P considers the computer to be an accelerating machine. Earman argues that, as a result of this acceleration, the later a signal is sent, the more amplified it will become, so that the observer would eventually be vaporised by its energy content; he further shows that various strategies for overcoming this problem have their own attendant difficulties. However, the systems Earman considers are designed to send infinitely many signals, whereas the system I have described above sends at most one signal, a scenario that appears to overcome the difficulties raised (but this requires further investigation).
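The one-signal protocol can be laid out schematically as follows. This sketch is only an analogy: the names are invented, and the finite step budget stands in for the infinite proper time available along γ, so the code itself decides nothing uncomputable; it merely exhibits the logic by which ‘no signal at P’ is read as ‘the program never halts’.

    def run_along_gamma(program_steps, step_budget):
        """The machine travelling gamma: beam a single signal to P only if the program halts."""
        sentinel = object()
        steps = iter(program_steps)
        for _ in range(step_budget):          # stands in for the infinite journey along gamma
            if next(steps, sentinel) is sentinel:
                return True                   # the program halted: one signal is sent
        return False                          # no halt observed: nothing is ever sent

    def observer_at_P(signal_received):
        # At P the whole journey along gamma lies in the observer's past,
        # so 'no signal' is read as 'the program never halted'.
        return "halts" if signal_received else "never halts"

    def counts_to(n):                         # a program that halts after n steps
        for _ in range(n):
            yield

    def loops_forever():                      # a program that never halts
        while True:
            yield

    print(observer_at_P(run_along_gamma(counts_to(5), step_budget=1_000)))      # halts
    print(observer_at_P(run_along_gamma(loops_forever(), step_budget=1_000)))   # never halts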
To summarise, it appears that some M-H spacetimes can be considered realistic, but it is unclear whether machines could ever be constructed to take practical advantage of their properties. So while the possibility of M-H spacetimes again suggests that hypercomputation is feasible in theory, it remains unclear whether it is possible in practice. 4.2. Q UANTUM MACHINES VS QUANTUM COMPUTERS Quantum computation (Benioff, 1980; Deutsch, 1985; Feynman, 1986; Shor, 1994; Steane, 1998) is seen by many in computer science and beyond as a likely basis for the next major development in computer power, and the device described here is essentially quantum computational. Deutsch and Penrose9 have argued that the quantum mechanical analog of Turing computation (normally referred to simply as ‘quantum computation’) cannot achieve non-Turing results, but only has the potential to produce computable results more quickly than standard computers. When one moves to the physical domain, this claim is no longer valid; I contend that one can implement behaviours in a quantum universe that are not implementable classically. In other words, even though quantum computers are equivalent to classical computers, quantum machines are not equivalent to classical machines. The behaviour of the device I describe below is certainly computational – it can be simulated by a nondeterministic Turing machine – but its behaviour cannot necessarily be implemented without recourse to quantum components. In other words, there arguably exists a Turing-computable behaviour that cannot meaningfully be considered to be implementable (which can be seen as evidence against [TC], the feasibility of constructing a UTM, itself) in a purely classical universe. As in Section 3, the proposed device uses quantum behaviours to generate provably random integers, but in this case, the output is always generated after a pre-specified period. Neither the design nor the operating manual for the device make any reference to quantum theory, and all user-manipulations of the device are purely classical (the user essentially flips a switch to run the device). The role of quantum theory does not lie in the construction of the system, but in ensuring the essential randomness of its output. I argued above that no Turingmachine can behave as a true random number generator, but that argument relied on the requirement that a true random number generator should be able to produce any one of infinitely many different values. In the present case we are required to choose a value at random from a finite selection, so that argument cannot be applied. Church (1940) argued that for a sequence to be random there must be no Turing machine able to produce the sequence from a blank tape (Copeland, 2000). I agree this a necessary condition for the following reasons. Randomness only means something when considered in relation to behaviours (equivalently, processes), so that ‘what is randomness’ really means ‘what does it mean for a behaviour to be random’. According to this argument, terms like ‘random number’ aren’t meaningful; it’s not the number that’s random, but the process used to generate it.
Focussing on processes makes sense. If you toss a coin a trillion times the results may well be random. But given your results, I can then write a computer program that exactly replicates them. I’m producing the same results as you, but obviously they’re not random when I produce them, even though they may have been when you did so. To decide what it means for a behaviour to be random, it’s easier to ask what it means for a behaviour to be non-random. If a process is not random, then presumably there’s some sort of specifiable rule governing its behaviour (in the example above, my program was following the rule ‘look up the value on the table in front of you and replicate it’). So a random process is one that follows no rules.10 In other words, if a process (or the sequence of values it generates) is to be deemed ‘random’ it cannot be simulated by any computer program. Church’s condition is therefore necessary, and the existence of random processes entails [HC], but its sufficiency is open to debate. If non-computability were also sufficient for randomness, then all hypercomputational processes would be deemed random. While I agree that hypercomputational processes are intrinsically ‘uncontrolled’ (to the extent that control is a recursive concept) this is not the same as saying they are random. A random process would presumably be capable of generating different ‘answers’ on different runs, but it is not obvious why a hypercomputational process could not consistently generate the same ‘answer’ every time. Whether true randomness can be asserted of a classical system is consequently dependent on whether Newtonian physics allows hypercomputational behaviours. If not, then true randomness is impossible. Indeed, it can be argued that true randomness can never be safely asserted of any classical system (even allowing for hypercomputational systems), because an apparently random system (for example, the tossed coin) is classically regarded as one whose description is simply incomplete. From the classical viewpoint, we are free to refine our description until it contains enough information for correct a priori determinations of system output to be made.11 Following standard Newtonian analysis, if we knew the relevant details of the coin’s position and the way in which it is launched, we presumably could, and that with certainty, establish ahead of time what the outcome of the toss would be (there are, no doubt, Las Vegas regulars capable of tossing fair coins ‘randomly’ to generate any pre-specified pattern of heads and tails). In contrast, randomness is an inevitable consequence of quantum uncertainty, and formulating the future behaviour of our device is quantum theoretically impossible, even in principle. Our device (Figure 3) uses two standard components arranged in sequence. The first, (A), generates circularly polarised photons (for definiteness, they are always polarised clockwise). The second, (B), determines whether or not photons have a specific linear polarisation (for definiteness we always check for vertical polarisation). If dedicated components are not to hand we can make (A) by passing low intensity light through some appropriately cut calcite crystal. The low intensity ensures that photons can be considered to be generated one at a time, and the calcite imparts circular polarisation. To make (B) we can place a light amplifier behind a vertical grating (or a second piece of calcite, this time cut for vertical polarisation).
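Before turning to the operating procedure (described after Figure 3), the statistics the device relies on can be sketched in a few lines. The sketch is my own and assumes ideal components: a clockwise-polarised photon passes the vertical analyser with probability 1/2, the occasional extra photon caused by a small calibration error ε shifts the first-order probability of reading a 1 to 1/2 + ε/4 (as derived below), and n timed runs yield an n-bit random value. Python's pseudo-random generator, of course, only mimics the quantum statistics.

    import random

    def run_once(epsilon=0.0):
        """One timed run: 1 if at least one photon is detected at B, else 0."""
        photons = 1 + (1 if random.random() < abs(epsilon) else 0)    # occasional extra photon
        return int(any(random.random() < 0.5 for _ in range(photons)))

    def random_n_bit_number(n):
        """n consecutive runs give an n-bit random value."""
        return sum(run_once() << i for i in range(n))

    runs = 100_000
    for eps in (0.0, 0.1):
        ones = sum(run_once(eps) for _ in range(runs))
        print(f"ε = {eps}: observed P(1) ≈ {ones / runs:.3f}, first-order prediction {0.5 + eps / 4:.3f}")
    print("sample 8-bit output:", random_n_bit_number(8))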
Figure 3. A hypercomputational quantum device
Operating the device is also simple. We first calibrate the system by calculating the mean time, t, between photons being generated at A. If necessary, we adjust the intensity of our light source to ensure that t is long enough to let us complete the measurement process described next. To generate a random bit we switch on both the source at A and the detector at B for t seconds. If a photon is detected at B during this period the system has generated a 1, and otherwise a 0. Clearly, this system is guaranteed to generate either a 0 or a 1 after t seconds, and by repeating the process, we can generate bit strings of arbitrary length. To generate a random number between 0 and (2^n − 1) we would run the device n times to generate the value as an n-bit binary value.

The output generated at B is truly random, and 0 and 1 each have approximately a 50% chance of being generated during each run. The reason for this is well known to quantum physicists (see e.g. Feynman et al., 1965), but we include it for completeness. Consider what happens during a single run of the process. Because the run lasts for t seconds, we would expect, on average, precisely one photon to be generated at A. This photon is in the circular polarisation state |clockwise⟩. This polarisation state can be re-expressed in normalised terms relative to a basis of vertical (V) and horizontal (H) linear polarisation states as |clockwise⟩ = (|V⟩ + |H⟩)/√2. So the probability of acceptance at B is (1/√2)^2 = 1/2.

In practice our calibration value t will be inaccurate, say t = (1 + ε)t_m, where t_m is the true mean. This would cause, on average, ε extra photons to pass through the device on each run. Every 1/ε runs, then, we expect two photons to pass through the device during the same run, and this increases the likelihood of a 1 being recorded on that run from 1/2 to 3/4, because we only need one of the two photons to be transmitted by the grating at B. Provided |ε| is small we can ignore the possibility of three or more photons passing through the device during the same run. To first order, then, the actual probability of the system generating a 1 during any run is (1/2 + ε/4). By tuning our calibration techniques this probability can easily be bounded away from both 0 and 1 (so the behaviour remains random, albeit with slightly offset mean) and can be made to approach a true 50/50 split by driving down the systematic error |ε| (or by improving our control over the intensity of the generating light source).
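The first-order analysis above is easy to check numerically. The following Python sketch is not part of the original design; it simply simulates the simplified model used in the text (exactly one photon per run, plus an extra photon with probability |ε| caused by the calibration error, each photon passing the vertical polariser at B with probability 1/2), and confirms that the probability of reading a 1 comes out at roughly 1/2 + ε/4. The function names are purely illustrative.

import random

def run_device(epsilon):
    # One photon per run, plus (with probability epsilon) one extra photon
    # caused by the mis-calibrated window length t = (1 + epsilon) * t_m.
    photons = 1 + (1 if random.random() < epsilon else 0)
    # Each circularly polarised photon passes the vertical polariser at B
    # with probability 1/2; the run reads 1 if at least one photon gets through.
    return 1 if any(random.random() < 0.5 for _ in range(photons)) else 0

def estimate_p_one(epsilon, runs=1_000_000):
    return sum(run_device(epsilon) for _ in range(runs)) / runs

if __name__ == "__main__":
    eps = 0.02
    print(estimate_p_one(eps))   # typically ~0.505, i.e. about 1/2 + eps/4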
In principle, there is no reason why devices of this, or some essentially equivalent, construction could not be included in the physical computing systems in use today. Calcite is essentially stable at standard temperatures, low intensity light sources would cause little drain of system resources, and such detectors typically generate their output electrically as a matter of course. Provided low-power detectors can be installed, the system would not compromise the power provisions of a standard PC. The result, of course, is a physical hypercomputer.

4.3. X-MACHINE MODELS

The X-machine is a theoretical model of computation which allows complex behaviours to be modelled in terms of fairly simple theoretical machines. The original model (Eilenberg, 1974) is based on the finite state machine (FSM), generally regarded as the simplest and least powerful model of computation. However, whereas the symbols that label transitions in an FSM are assumed to have no internal structure, the X-machine approach regards them as names for operations. The ‘X’ in ‘X-machine’ refers to the type of the system that is being manipulated by these operations, so that a calculator might be modelled as a Number-machine or an air-conditioning unit as a Temperature-machine. This provides a very elegant model in which the control structure of any ‘program’ is always an FSM (and hence easily understood and manipulated), but whose computational power is potentially greater than that of the Turing machine (Stannett, 2001). (A minimal sketch of the basic model is given below.)

These models are becoming increasingly important in the field of software testing, because they allow a previously unattainable level of certainty to be achieved. Provided certain minimal conditions are satisfied, it is possible to use an X-machine system specification to generate a ‘finite and complete test-set’; in other words, a finite set of input strings with the property that, if an alleged implementation behaves in the same way as the specification for each of these finitely many input strings, then it is guaranteed to behave identically on every possible input string (Holcombe and Ipate, 1998).
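To make the basic model concrete, here is a minimal Python sketch of a sequential, deterministic X-machine: a finite control structure whose transitions are labelled by named operations acting on a value of type X. All names here (XMachine, run, the toy ‘Number-machine’) are illustrative rather than drawn from the X-machine literature, and the sketch ignores the input/output streams and non-determinism of the full formal model.

from typing import Callable, Dict, Tuple

class XMachine:
    """A stripped-down X-machine: an FSM whose transition labels name
    operations on an underlying data type X."""

    def __init__(self,
                 initial_state: str,
                 transitions: Dict[Tuple[str, str], str],   # (state, op name) -> next state
                 operations: Dict[str, Callable]):          # op name -> function on X
        self.state = initial_state
        self.transitions = transitions
        self.operations = operations

    def run(self, x, op_names):
        """Apply the named operations in sequence, letting the FSM control
        decide which operations are enabled in which states."""
        for name in op_names:
            key = (self.state, name)
            if key not in self.transitions:
                raise ValueError(f"{name!r} is not enabled in state {self.state!r}")
            x = self.operations[name](x)
            self.state = self.transitions[key]
        return x

# A toy 'Number-machine' in the spirit of the calculator example.
calc = XMachine(
    initial_state="ready",
    transitions={("ready", "double"): "ready", ("ready", "inc"): "ready"},
    operations={"double": lambda n: 2 * n, "inc": lambda n: n + 1},
)
print(calc.run(3, ["double", "inc"]))   # prints 7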
In its standard form, the X-machine behaves sequentially, performing one instruction at a time in strict sequence. Because this makes the model unsuitable for concurrent or continuous-time behaviours, a number of extensions to the X-machine have been proposed. Stannett (1990, 1991) introduced the analog X-machine (AXM) in which operations are allowed to unfold continuously in time, and used the AXM to generate a solution to the Turing-machine Halting Problem (the paper introduced a technique for encoding the discrete behaviour of a Turing machine in the analog behaviour of an AXM). Schönegge (personal communication) notes an error in one of the later results in this paper, but it is unclear to what extent this affects the validity of the main results. This issue has been addressed in (Stannett, 2001), where the model is made more rigorous and extended still further to allow for arbitrary models of time; a number of solutions are presented for ‘uncomputable’ tasks like general (Lebesgue) integration. This extended model (the Timed X-machine, or TXM) allows systems to be modelled that use discrete time for one phase of operations and continuous time for others, and which combine quantum computation with standard computation, and can be used to model the operations typically encountered in analog computation. This allows unified models to be given of hypercomputational systems based on Pour-El and Richards’ differential-equation systems (Pour-El and Richards, 1979, 1981) and Myhill’s recursive function with nonrecursive derivative (Myhill, 1971). These systems can reasonably be regarded as effective. In each case, the key property of the system is its ability to model continuous operations, coupled with the ability to ‘encode’ discrete behaviours in analog terms.

However, neither the AXM nor the TXM can truly be considered suitable for modelling concurrent systems; accordingly, Stannett (2002) takes a more foundational approach, asking what it means for something to be a ‘behaviour’. This allows a generalised model to be built which includes the X-machine as an instance, but which also offers scope for modelling concurrent systems. Concurrency does not as yet seem to have been considered in detail as a possible source of hypercomputation, perhaps because of the prevailing assumption that concurrent systems can always be implemented sequentially (this is why a concurrent program can be run on a PC with only a single processor). However, concurrent systems are known to possess subtleties that are not present in sequential systems (Milner, 1989). For example, two programs can implement the same function, and yet behave differently when run in parallel with some third program.12 In standard terms the two programs have identical behaviours, but in concurrent terms their behaviours are very different. Moreover, it is commonplace in concurrency theory to consider and compare the behaviours of systems which are intended to run forever, so that non-termination does not disqualify a system from having meaningful behaviour. Consequently the link between programs and functions that lies at the heart of recursive function theory needs updating where concurrency is concerned, and this may affect our understanding of what it means for a behaviour to be ‘computational’.

5. The Indistinguishability of Hypercomputation

It is often asserted that non-computable values cannot have any physical meaning, for no matter how carefully you measure a length (say), the result will always be indistinguishable from a sufficiently close computable value, and values which cannot be distinguished experimentally cannot be assigned any physical interpretation.

5.1. THE STRUCTURE OF THIS SECTION

In this section I consider the status of this particular argument against hypercomputation, which I call the ‘indistinguishability fallacy’ or the ‘unobservables are irrelevant’ argument.
As it stands, the argument seems at first sight to be irrefutable, but it can be reframed in a way that shows it to be rather more questionable. I will show that, if this argument is allowed to stand, then a number of untenable conclusions can be reached concerning rational and irrational numbers. To this end, observe that replacing the word ‘computable’ with ‘rational’ and ‘non-computable’ with ‘irrational’ in the above argument gives the assertion: irrational values have no physical relevance, for no matter how carefully you measure a length (say), the result will always be indistinguishable from a sufficiently close rational value. This statement ought to be true if and only if the original is true, so most of this section will address this statement, as it discusses concepts (rational and irrational numbers) that will be more familiar to readers. I will argue that, as a statement about the relevance of irrational values, this statement is wholly untenable, thereby showing that the argument of the original statement (about computable and non-computable values) is without logical force.

My argument is fairly extended. I start by showing the historical relevance of the indistinguishability fallacy. Certainly, some authors in the 1920s considered it necessary to defend the use of irrationals, having been challenged about their relevance on precisely these grounds. Although I find Hobson’s defence of irrationals unconvincing (Hobson, 1923; see below), the fact that he considered it necessary to give a defence at all shows the seriousness with which the problem was taken. Hobson bases his defence on the mathematical role played by irrationals, and this makes it unusable for our purposes. We are interested, not in the mathematical, but in the physical relevance of irrationals. Accordingly, I ask whether one could meaningfully consider a universe in which irrational values do not arise; one in which all distances (say) are rational. I conclude that such a world would be rather unusual from our point of view, but not ‘obviously’ impossible.

So, maybe irrational values don’t have a basis in reality? I now step back and consider precisely what it is we’re trying to prove. I show that, if being indistinguishable from sufficiently good approximations makes a value irrelevant, then even the rational numbers are irrelevant. To decide if a thing is relevant, one must first ask what it is used for. I argue that the relevance of a value lies, not always in whether it can be generated as the output of a process, but more generally in whether it can be used as the input to a process. Moreover, we have so far only considered relevance in terms of measurement, but I argue that measurement has very little to do with computation: one can even do computation with values that are wholly unobservable.

Next I look at the relationship between rationality and computability. On what basis do we consider values to be rational in the first place, and why do we consider some numbers to be computable even if they’re not rational? To answer this question I focus on the particular value √2, which is both computable and irrational. I argue that standard definitions describe what it means for √2 to be computable as a function, not as a number, so I propose a definition for ‘computability as a number’.
I argue that there is a sense in which a number can be deemed ‘rational’ if and only if it can be represented by a recurring string of symbols relative to some suitable ‘representational scheme’. From this point of view, √2 is entirely rational, because it has a recurring representation as a ‘continued fraction’. In fact, I show the existence of a representational scheme in which every value that is computable in the traditional sense is an ‘integer’, so that ‘rationality’ and ‘computability as a function’ (the standard version of computability for real numbers) are intimately related to one another. This suggests that the standard notion of ‘computable numbers’ is too restrictive and I confirm this by showing that values can be computable-as-numbers even if they are not computable-as-functions. In fact I show that any instance of the ‘Halting Number’ (Copeland, 1998a) is just such a value. This is a key result, because it shows that values exist which can meaningfully be considered computable, even though they don’t satisfy the standard definition of computability. This seems paradoxical, but reflects a different understanding of what it means for a value to be the ‘result’ of a computation.

5.2. IRRATIONAL IMPLEMENTATIONS

The indistinguishability fallacy states that irrational numbers are physically meaningless because they can be approximated arbitrarily closely by rationals. But can this argument really be sustained, given that irrational values like √2 and 2π can be constructed physically as the diagonal of a unit square and the circumference of a unit circle, respectively? Surprisingly, the status of irrationals was indeed the subject of debate in the years before computability. Writing in 1923, Hobson comments:

It has been charged against Mathematicians that, in setting up such a scheme as the arithmetic continuum, they have introduced an unnecessary complication, in view of the fact that rational numbers suffice for the representation, to any required degree of approximation, of all sensibly continuous actual magnitudes; that in fact an instrument has been created of an unnecessary degree of fineness for the purposes to which it is to be applied. The answer to the charge is that Mathematical Analysis ... would become unworkable as a conceptual scheme, or at least much more cumbrous, if the conception of irrational numbers were excluded from it. The results of operations involving rational numbers constantly lead to irrational numbers, without which the operations would be impossible if their effects are to be regarded as definite. But in order to appreciate the full weight of this answer it is necessary to consider the great generalization of Arithmetic which is made when variables are introduced which denote unspecified numbers ... The result of an algebraic operation, expressed by general formulae ... would not always be interpretable when special numerical values are assigned to the symbols, if the only admissible numbers were rational ones ... Without the employment of the conception of irrational numbers the functions of Mathematical Analysis would be degraded to that
of determining only approximate results of operations it employs, and in consequence its technique would have indefinitely greater complication, of such a character that, at least in its more abstruse operations, it would break down, or lead to results which contained a margin of error difficult to estimate.

I find Hobson’s defence of irrationals unconvincing, since it is unclear why a desire on the part of mathematicians to make their lives easier should be construed as a reason why Nature should have any use for irrational numbers. It might be argued against irrationals, for example, that such a thing as a unit square is itself unconstructible, since it is impossible in practice to construct exact squares, and that in consequence no ‘true’ diagonal of length √2 can in fact exist. Might it be possible that all ‘true’ distances are actually rational multiples of some fixed unit, and that the irrational values we posit have no ‘real’ existence? If so, this would require an overhaul of standard ways of thinking, because it would no longer be meaningful to consider ‘continuous’ motion in the usual sense – if irrational distances have no physical meaning, then at no point during a journey from ‘here’ to ‘there’ would we ever have covered an irrational distance, and likewise when a balloon rises from sea-level to 1000 m, it would not occupy all intervening heights, but only rational heights. This view of the world is at odds with modern academic preferences, but such a model is not ‘obviously’ impossible. The mathematical discipline of general topology often concerns itself with describing and analysing the nature of continuity in spaces that cannot be considered “continuous” in any everyday sense of the word. This argument applies equally to the uncomputable irrationals: if we declare uncomputable values to have no physical relevance, we cannot meaningfully consider ‘continuous’ motions in the standard sense of the term.

We need to examine what we mean when we associate values with physical constructions. I claim that the statement “this line has length √2” has exactly the same degree of meaning as a statement of the form “this pile contains six objects”. Goodstein (1964, pp. 8–9) suggests that

the process of counting consists in a transformation from one number notation to another by means of the rules “one and one is two”, “two and one is three”, “three and one is four” and so on. It is the recitation of these rules (in an abbreviated form in which each “and one” is omitted, or replaced by pointing to, or touching, the object counted) which gives rise to the illusion that in counting we are associating a number with each of the elements counted, whereas we are in fact making a translation from the notion in which the number signs are “one”, “one and one”, “one and one and one”, and so on, to the notation in which the signs are “one”, “two”, “three”, and so on ... [C]ounting does not discover the number of a collection but transforms the numeral which the collection itself instances from one notation to another. To say that any collection has a number sign is just to say that any collection may be used as a number sign.

By extension, when we say that a given construction yields a line two metres long (say), we do not mean that a measurement of the line would reveal it to be exactly
twice the length of a standard metre, but that the line can be used as a number sign representing the value 2, as part of the same numerical system in which the standard metre is taken to be a number sign for the value 1. Likewise, when I construct the diagonal of a unit square, and say that it has length √2, I am really saying that, as part of the number system in which the sides of the figure I have drawn may be taken to be number signs for the value 1, the diagonal may itself be taken to be a number sign, one for which the value represented is √2. (More accurately, I am saying that a set of such diagonals can be regarded as a number sign for √2. There may be many putative diagonals of the same square, depending, for example, on the accuracy with which they are constructed, just as there may be many different ways of writing the lower-case Latin letter ‘a’. The exact nature of this set is open to debate. Depending on one’s handwriting, there may be some argument as to whether a given scribble is an instance of ‘a’ or ‘d’; likewise there may be disagreements as to whether a given line represents √2 or some other value. But this does not discount the potential role of diagonals as number signs any more than conflicts over scribbling discount the use of handwriting.)

Likewise, hypercomputational values can be used as meaningful descriptions of physical extents. When we say that a given construction generates a hypercomputational length, we are not suggesting that exact measurement of the length either can or should take place, but that, from the point of view of that system in which our unit length itself acquires meaning, the constructed length is a number sign for a non-computable value.

5.3. RATIONALS ARE ALSO IRRELEVANT...

I contend that the argument “non-computable numbers cannot be observed and so need not be considered” is not an argument against hypercomputation at all, but about the role of approximation in the measurement process. This becomes clearer if we consider that the ‘unobservables are irrelevant’ argument actually applies to rational numbers as well. Just as we can approximate √2 with ever closer rationals, so we can approximate the value 1/2 (say) with rationals from the sequence 1/2 − (1/2)^n, and any argument to the effect that “√2 cannot be distinguished from sufficiently close approximations” applies with equal force to the rational value 1/2. If indistinguishability renders √2 meaningless, it also renders all rational numbers meaningless as well. The only way to overturn this argument seems to be to assume that all numbers are ‘isolated’, i.e. cannot be approximated by sequences of other numbers. This effectively means restricting attention to integers, and banning all other numbers from consideration.

Furthermore, the measurement of distinguishable values has very little to do with physical processing. What matters in physical computation is not always whether an output can be measured, but whether it can be used as the input to another process. One can, for example, generate an analog model of blood flow by connecting together various fluid-filled pipes, valves and pumps, and one can
also build a machine that responds to the pressure of fluid in a pipe by adjusting the height of mercury in a vertical tube; this then causes the brightness of a lamp, which uses the mercury column as part of its circuitry, to vary. If I connect this lighting device to one of the pipes in my blood-flow model, I have effectively used blood pressure as a continuously varying output from the flow-model, and supplied it as an analog input to the lamp, but at no point have I attempted to generate an accurate measurement of the pressure itself. I could, no doubt, have written down the pressure in the tube, and then supplied details of my measurement to an input driver on the lamp, but this would have entailed expressing the value as a finite decimal, with the introduction of wholly avoidable measurement inaccuracies. By connecting the two analog systems directly, I allow accurate transmission of the required value from one system to the other, in a way that is impossible if I enforce a policy that all inputs must be expressed numerically.

Measurement, I suggest, has little to do with computation. This has been made especially clear by recent successes in practical quantum teleportation, where an entity at A comes to reside at B without moving bodily between the two locations (Bennett, 1993, 1995; Steane, 1998). This manipulation, essentially a physical representation of the basic assignment B = A, specifically requires that the data supplied for processing (the inputs) are not observable, since observation would scramble the required quantum information and prevent reconstruction at B. It may seem strange to declare a value computable only if it is not observable, but this merely confirms the shortcomings of standard terminology when dealing with non-classical modes of computation. Nonetheless, it is a fact that many computational systems today are essentially Newtonian and involve measurement, so it is instructive to consider why the value √2 is meaningful in Newtonian measurement-based systems. Analysis of this question throws light on the question why hypercomputational values can also be meaningful under these circumstances.

5.4. COMPUTABLE NUMBERS

The value √2 is normally said to be computable because one can calculate any digit in its decimal expansion in a finite amount of time, but this explanation is unsatisfactory from a physical point of view because the decimal expansion as a whole obviously cannot be completed in finite time. A more finitistic rationale behind √2’s computability would be the observation that we can regard √2 as a function root2 mapping each natural number n = 0, 1, 2, ... to the n’th significant digit of its binary expansion. Given any choice of n, we can use standard numerical approximation methods to evaluate the associated digit, root2(n), in finite time, so we can say that √2 is computable as a function. This is accurate, but not particularly satisfying, since one does not normally think of numbers as functions.
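For illustration only, here is a short Python sketch of one such function; root2 is simply the name used in the text above, and the sketch works with the decimal rather than the binary expansion. It evaluates any requested digit of √2 in finite time using exact integer arithmetic, which is the sense in which √2 is computable as a function.

from math import isqrt

def root2(n: int) -> int:
    """Return the n'th decimal digit of sqrt(2) after the units digit
    (n = 0 gives the units digit).  isqrt(2 * 10**(2*n)) is the integer
    part of sqrt(2) * 10**n, so its last digit is the digit we want."""
    return isqrt(2 * 10 ** (2 * n)) % 10

print([root2(n) for n in range(8)])   # [1, 4, 1, 4, 2, 1, 3, 5], i.e. 1.4142135...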
It is useful to ask what we mean by decimal expansions, since these seem to play such an important role in the definition of ‘computable numbers’. Such an expansion is simply a stream, x, of digits, which acquires meaning when we supply those digits to an algorithm. For example, decimal expansions of the form x ≡ (x_1, x_2, x_3, ...) acquire meaning as values in the range 0 ≤ x ≤ 1 by being regarded as input streams to the program DECIMAL(x) described here:

DECIMAL(x) = {
    let result = 0;
    let n = 1;
    repeat(forever) {
        result = result + x_n / 10^n;
        n = n + 1;
    }
}

This program operates on an infinitely long string, x, of decimal digits and uses it to compute the value x = 0.x_1 x_2 x_3 ... The variable result is used to hold successive approximations to x, and is considered to be visible outside the program at all times, so that its value can be examined at any point during computation. The program performs an infinite loop, in which result is repeatedly incremented by the value of the current digit divided by the appropriate power of 10 (the variable n tells the program which digit it is currently working with, and which power of 10 to divide it by).

Notice the special role being played by the variable ‘result’ in this (and subsequent) algorithms. This variable is used to hold successive approximations to the value associated with the string x. This particular program will stabilise (i.e., the value of result will become fixed) if and only if the stream x comprises only 0s from some point onwards. As is usual in daily parlance, if x becomes a string of 0s at some point, we’ll suppress those digits and regard the stream as a finite expansion; provided some mechanism is provided for establishing that all subsequent digits are 0, this can easily be taken into account by adding a suitable ‘get-out clause’ to the program’s infinite loop. For numbers like 1/3 and √2 which have infinite decimal expansions, the ongoing value of result is only ever an estimate of the true value.

Henceforth, we will always think of programs as being equipped with a globally visible result variable (akin to the tape of a Turing machine) which is updated as the program runs. Provided the sequence of values taken by result converges to a definite real value we will say that the program ‘computes the value as a number’.
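As a small illustration (not part of the original text), the following Python sketch runs a truncated version of DECIMAL: the conceptually infinite loop is cut off after a chosen number of digits, and the successive values of result are returned so their convergence can be seen. The digit stream for 1/3 is just the constant stream 3, 3, 3, ...

from fractions import Fraction
from itertools import islice

def DECIMAL(digits, steps):
    """Consume the first `steps` digits of the stream and return the
    successive approximations held by the globally visible `result`."""
    result = Fraction(0)
    approximations = []
    for n, d in enumerate(islice(digits, steps), start=1):
        result += Fraction(d, 10 ** n)
        approximations.append(result)
    return approximations

def thirds():
    while True:
        yield 3    # the digit stream 3, 3, 3, ... representing 1/3

print([float(a) for a in DECIMAL(thirds(), 5)])
# [0.3, 0.33, 0.333, 0.3333, 0.33333] -- converging towards 1/3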
There are many ways to represent numbers, and no obvious reason why the standard DECIMAL algorithm should be considered more appropriate than any other. It is well known that a real number is rational if and only if it has a repeating decimal expansion. In general, if A is some algorithm for calculating real numbers and x a stream of symbols to be supplied as inputs for A, we can argue that the number A(x) generated by running A on x is ‘rational’ with respect to A if and only if the representation x eventually starts repeating. For example, we may decide to represent numbers as continued fractions. This is a representation in which the notation x = [a, b, c, d, ...] means that x can be represented as the output of the calculation

    x = a + 1/(b + 1/(c + 1/(d + ...)))

Relative to this representation √2 can no longer be considered ‘irrational’, as it has the simple repeating expansion √2 = [1, 2, 2, 2, 2, 2, ...]. Recurring expressions of this kind can be written in the finite form √2 = [1; 2], where the digits following the semicolon represent the recurring sequence. This form of finite numerical representation is actually used for physical computational purposes, so has clear physical meaning (Gowland and Lester, 2000).
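As a purely illustrative Python sketch (the function name is my own), the finite representation [1; 2] can be fed to an evaluator that expands the recurring part to any chosen depth; the successive values converge on √2 in just the way that the successive values of result converge on 1/3 in DECIMAL.

from fractions import Fraction

def eval_cf(prefix, cycle, depth):
    """Evaluate the continued fraction [prefix; cycle] (cycle repeated
    forever), truncated after `depth` repetitions of the cycle."""
    terms = list(prefix) + list(cycle) * depth
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

# sqrt(2) = [1; 2] -- the prefix is [1], the recurring part is [2].
for depth in range(1, 6):
    print(float(eval_cf([1], [2], depth)))
# 1.5, 1.4, 1.41666..., 1.41379..., 1.41428... -> sqrt(2) = 1.41421...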
Even transcendental numbers13 can have finitely expressed expansions as continued fractions, for example e = [2, 1, 2, 1, 1, 4, 1, 1, 6, 1, ...] = [2; 1, 2n + 2, 1] (Vuillemin, 1987).

Before, when I objected to the argument that √2 could be considered computable because we can calculate its decimal expansion, I did so on the grounds that the expansion x could not be expressed as a finite string of digits, whence the program DECIMAL(x) could not terminate in finite time with the correct value of x. It’s clear, however, that there may be other algorithms, A, for which the same number can be expressed as a finite stream of symbols. For example, the values √2 and e, both of which involve non-terminating executions of DECIMAL, can be expressed as finite strings relative to the continued-fraction algorithm. Likewise, one can argue that “√2” itself, or the logical description “(x^2 − 2) = 0 & x ≥ 0”, are equally valid descriptions of √2 which again use only finitely many symbols. Taking this into account, we can argue that a value is computable as a number (rather than as a function) if there is at least one algorithm A relative to which the value’s representation is a finite character string (we’ll see below that this is equivalent to the definition we gave above).

Perhaps surprisingly, it’s easy to show there exists a general-purpose representational algorithm ITER relative to which every real number that is computable-as-a-function can be expressed as an integer with respect to ITER. For suppose x is any value that is computable-as-a-function, so that the function ξ(n) that gives the n’th digit in the decimal expansion of x is a computable function. Let [ξ] be the index of ξ in some Gödel numbering of programs (that is, ξ is the [ξ]’th program in some list of all programs), and write P_m for the m’th program in this numbering. This means that P_[ξ] is another name for ξ, so that P_[ξ](n) computes the n’th digit of x. If we take ITER to be the algorithm

ITER(m) = {
    result = 0;
    n = 1;
    repeat(forever) {
        result = result + P_m(n) / 10^n;
        n = n + 1;
    }
}
then the result computed by ITER([ξ]) converges to x, so x can actually be represented as the integer [ξ] with respect to the representational algorithm ITER. It follows that any value that is computable-as-a-function is also computable-as-a-number.

Is the converse also true? To find out, suppose that A is some program and x is some finite string of input symbols, fixed for the remainder of this section. If x represents the value x relative to the representational scheme A (i.e., running A with input x generates the value x) does it follow that, given any n, we can compute the n’th digit in the decimal expansion of x in finite time? The first thing to notice is that we can ignore x altogether. Because x has been given as a finite string of symbols we can easily write a program that generates the symbols in x from scratch, and merge this with A if required. So we may as well assume that A is a program that takes no inputs at all, and which terminates with answer x. In other words, our definition of computable-as-a-number is the most obvious one possible – a number is computable provided there is a program taking no parameters (i.e., a Turing machine with blank tape) that computes it. This is essentially the definition we gave above. As before we assume that all relevant programs are equipped with a globally visible variable, result, that is updated as the program runs. The program computes the number x provided the sequence of values taken by result converges to x. Surprisingly, this is a strictly more general concept than the computable-as-a-function definition normally used.

THEOREM. There exists a value σ which is computable-as-a-number but not computable-as-a-function.

Before proving this theorem, I need to say a few words about the Halting value, τ (Copeland, 1998a, 2000). Assuming that some UTM, U, has been given, define the digit τ_n to be 1 if U halts when n is inscribed on its input tape, and 0 otherwise (we write U(n) to denote this machine-plus-tape configuration). Then τ is the binary value 0.τ_1 τ_2 τ_3 ... It is important to note that τ is more properly regarded as an algorithm than a specific number, because the value taken by τ depends on the choice of U. In particular, we cannot perform tasks like ‘calculating the value of τ to n decimal places’. For suppose 0.b_1 ... b_n is any n-digit binary expansion. By choosing U appropriately, we can arrange matters so that U(i) = b_i for each i = 1, ..., n. This is because there are infinitely many programs which are known to halt, and infinitely many that are known to run forever, and we can arrange for appropriate choices of these programs to be simulated by U(1), U(2) and so on up to U(n). Accordingly, we need to assume that U is some definite UTM, and we’ll take σ to be the halting value associated with this U.

Proof. Because the Halting Problem is well known to be undecidable, there is no program ξ for which ξ(n) consistently gives the digits σ_n of this particular σ. Nonetheless, there is a program that computes σ as a number. As usual, we compute σ by providing a program with a result variable whose value converges to σ as the program runs. We initialise the value of result to 0, and then start running copies of U(n) for every n. If some U(n) eventually halts, we’ll stop that particular simulation and add the value (1/2)^n to result.
We cannot simply run U(1) first, then U(2), and so on, because we will eventually hit an instance U(n) that runs forever, and this will limit how closely result can converge to σ. Instead, we run a large looping computation. The first time round we simulate the first instruction of U(1). The second time round we simulate the second instruction of U(1) and the first instruction of U(2). Then the third instruction of U(1), the second of U(2) and the first of U(3). Each time round the loop we progress the execution of those programs we’ve already initiated, and then initiate one more program, so by the time we complete the n’th loop we are simulating as many as n different programs. As the looping continues, so every instance of U(n) is eventually brought into the simulation process, and the value of σ is approximated ever more closely.

Here’s the program for computing σ, expressed in pseudo-code to keep it intelligible to non-programmers. The arrays instruction[] and config[] are used to hold information about the state of each program’s simulation and the instruction it considers to be next in line (this can change unexpectedly in response to GOTO instructions, but otherwise the instruction counter simply increases by one each time round). Consequently, instruction[n] holds the line number of the next instruction to be executed in U(n), and config[n] holds U(n)’s configuration information. Given this information we can simulate U(n) by repeatedly performing its instruction[n]’th instruction and storing the changing configuration in config[n]. The array halted[] keeps track of which simulations have halted so far. At any time during this computation only finitely many of the various array values are in use, so we only ever use finite memory resources.

ComputeSigma() ≡ {
    // Start with result set to zero
    let result = 0;
    // Start the main looping process
    let loop = 0;
    repeat (forever) {
        // Increment the loop counter
        let loop = loop + 1;
        // Initiate the loop'th program
        let halted[loop] = false;
        let instruction[loop] = 1;
        let config[loop] = U's_start_configuration;
        // Progress any of the first loop programs that haven't halted yet
        for n = 1 to loop do {
            if (not halted[n]) {
                simulate instruction[n]'th line of U(n);
                update instruction[n] and config[n];
                if (U(n) has reached a halt state) {
                    let halted[n] = true;
                    let result = result + (1/2)^n;
                }
            }
        } // END of progressing non-halted programs
    } // END of repeat(forever)
}

To see that the sequence of values taken by result does indeed converge to σ, consider U(n). If U(n) eventually halts, this will be picked up in the main loop and (1/2)^n will be added to result. This means that σ_n will be set to 1. Moreover, the operations of the program never cause any ‘carrying’, so σ_n will never subsequently be returned to its original value of 0. On the other hand, if U(n) runs forever, the instruction adding (1/2)^n to result will never be executed, so σ_n will remain 0 forever. Consequently, each of the digits in the expansion of result is either constant at 0, or else becomes constant at 1, so we can be certain that the values of result do indeed converge, and clearly the value they converge to is σ.
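For readers who want to see the dovetailing in action, here is a small runnable Python analogue of ComputeSigma. It is not the real construction: the place of U(n) is taken by toy processes whose halting behaviour is decided in advance (even n halts after n steps, odd n runs forever), so nothing hypercomputational is happening. The sketch only illustrates how the interleaved simulation drives result towards the corresponding binary value.

from fractions import Fraction

def toy_halts_after(n):
    """Stand-in for U(n): even-numbered 'programs' halt after n steps,
    odd-numbered ones run forever (signalled by returning None)."""
    return n if n % 2 == 0 else None

def compute_sigma(loops, halts_after=toy_halts_after):
    result = Fraction(0)
    steps_done = {}                     # instruction counters, as in the text
    halted = {}
    for loop in range(1, loops + 1):
        steps_done[loop] = 0            # initiate the loop'th program
        halted[loop] = False
        for n in range(1, loop + 1):    # progress every initiated program
            if not halted[n]:
                steps_done[n] += 1
                limit = halts_after(n)
                if limit is not None and steps_done[n] >= limit:
                    halted[n] = True
                    result += Fraction(1, 2 ** n)
    return result

print(float(compute_sigma(20)))   # ~0.333, converging to binary 0.0101... = 1/3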
The fact that σ can be ‘computed’ in this way shows again that the standard notion of computability may be too restrictive. I contend that values like σ are as meaningful in their own terms as recurring decimal expansions are in theirs. When we see the recurring decimal representation “0.333...” of the rational value 1/3, we recognise this finite string of symbols as the input to an algorithm that takes infinitely long to generate its result. But even so, we don’t classify 1/3 as uncomputable. The algorithm may technically require infinitely long to run, but the fact that the representation is finite means we can interpret the value, and establish its relationships to other values, without actually performing the infinitely long computation. Likewise, the continued fraction √2 = [1; 2] is a finite string of digits for use with an algorithm that takes infinitely long to generate its true output, but the nature of the representation is well enough understood that comparisons can be made with other values, even without evaluating the entire continued fraction. In other words, provided we can represent a value as a finite string of symbols with respect to some algorithm, we can regard it as meaningful because we can perform an ‘abstraction’ over the amount of time required for the answer to be generated (i.e., we choose to regard the time required as insignificant to our discussion). It only remains to reiterate that σ can be represented in just such a way.

Suppose that V is a universal program for which V(m) simulates P_m for each program P_m that takes no input. In other words, V is a program with a globally observable ‘result’ variable, which takes an integer input m, and running V with input m causes this result variable to take on the same sequence of values that the corresponding result variable would be caused to take by running P_m. Since ComputeSigma is a program that takes no inputs, there is some n for which ComputeSigma = P_n, and now V(n) = σ. So σ can be regarded as the output obtained when the integer n is supplied as input to the program V. Because n is a finite representation of σ, it can meaningfully be considered a bona fide value relative to the basic abstraction described above, so it is appropriate to use σ as an input to calculations and comparisons. There is literally no difference between the way in which the strings “0.333...” and “√2” represent their values, and the way in which σ is represented by the finite string ‘n’ with respect to V, or by the empty string with respect to ComputeSigma.

Indeed, we can go further. We saw above that the halting value depends crucially on the particular UTM, U, incorporated into the program ComputeSigma. If we take M_1, M_2, M_3, ... to be some Gödel numbering of Turing machines, then every UTM will occur somewhere in this list. Consequently, every possible instance of the halting value is computed-as-a-number by a single program (it may also generate other values): one that takes a single integer m as input, and regards it as specifying the machine M_m to be used as U in ComputeSigma.
To summarise, σ is a well-defined value that has the same status as a potential input to calculations and comparisons as values like 1/3 and √2. However, σ is a hypercomputational value, because the standard definition of what it means for a value to be computable is not satisfied. There is no computable function ξ for which ξ(n) evaluates to the n’th digit of σ in finite time.
6. Summary

We have touched on many issues in this paper, but our focus has been the relationship between physical and conceptual computability. This investigation of hypercomputation concerns the existence of physical systems whose behaviours possess certain mathematical properties; as such, the status of hypercomputation depends on the mathematical model of physics we hold to be valid. The standard definitions of what it means for a function to be computable implicitly assume a special type of universe in which unlimited portions of space and time can be accessed in a controlled way; loosely speaking they are Newtonian universes. Just as relativistic systems may allow Turing machines to solve their Halting Problem, we’ve seen that quantum theory allows the construction of a number generator whose behaviour cannot be distinguished experimentally from
that of a true random number generator. We’ve also seen that the relationship between implementation and computation is different in different physical models: Deutsch’s Universal Quantum Computer (Deutsch, 1985) is formally equivalent in power to a Universal Turing Machine, but it is possible to design a quantum theoretical machine whose behaviour cannot meaningfully be implemented under Newtonian constraints.

We’ve also considered what it means to say that a physical property, a length say, has hypercomputational extent. We’ve seen that the status of such claims is the same, at a reasonably deep level, as that of the claim that rational numbers like 1/2 and 1/4 are meaningful. I’ve also argued that computation doesn’t require measurement to take place: what matters about computations is not that they generate distinguishable (that is, observable or measurable) outputs, but that these outputs can be used as inputs to subsequent operations. I’ve argued that a value σ can be defined which is as meaningfully computational as the values 1/3 and √2, but which is technically hypercomputational.

There are also many questions I have not discussed here, which are addressed elsewhere in this volume. In particular, I have not addressed Siegelmann’s extensive corpus of work on the hypercomputational properties of analog recursive neural networks (see, e.g., Siegelmann, this volume).

An important area which would benefit from additional research is the relationship between hypercomputation and undecidability. On the one hand, the availability of hypercomputational processes seems to undermine undecidability: Copeland (1998b) has shown that the decision problem for first-order predicate calculus is solvable hypercomputationally. But on the other hand, undecidability seems to ensure the availability of hypercomputational processes. We know from Gödel’s work (Gödel, 1931; Meltzer, 1962) that any theory powerful enough to describe recursive arithmetic must be incomplete. Assuming that physical models of arithmetic can be constructed (it can be argued that arithmetic arose by abstraction of physical behaviours in the first place) this suggests that every model of physics is likely to include at least one system whose behaviour cannot be deduced or explained from the available physical laws. Such behaviours are inherently hypercomputational with respect to the universe in which they are inexplicable, so we seem to be saying that hypercomputation is not merely feasible, but necessary, in every realistic version of the physical universe. (Of course, a process that is hypercomputational in one universe might be computational in another.)

It is also interesting that systems with undecidable properties can be defined relatively easily. A particularly good example of this is Dornheim’s work on plane polygonal mereotopology (Dornheim, 1998); rather than focus on the ambiguous notion of general physical systems, Dornheim considers a system in which the objects of interest are collections of finitely many polygons lying in a single plane. Using just these basic building blocks, Dornheim shows that statements can be made about polygons that are undecidable. Given the simplicity of this system, and the likelihood that ‘polygons’ can be defined meaningfully in most, if not all, models of physics, this seems to underline our contention that hypercomputation is a necessary fact of life.
Acknowledgements

The author gratefully acknowledges the support of the UK Engineering and Physical Sciences Research Council, grant number GR/M56777 (MOTIVE: Method for Object Testing, Integration and Verification), during the later stages of this research.

Notes
1 Assuming that both human and computer are given access to as many resources as are needed at any given time to continue their work, and that no limit is placed on the amount of time they’re allowed to take over completing the task.
2 Turing’s O-machines explicitly use oracles whose behaviours cannot be simulated by Turing machine. See Copeland (2000) and Note 5, below.
3 More accurately, [CT] entails [TC] in any universe that satisfies the ‘unlimited resources’ premise of universal Turing computation.
4 I argue as follows: the Standard Model is partly based on equations that involve differentiation with respect to temporal coordinates. For such differentiation to be well defined, it must be possible to consider events that are arbitrarily close together in time. However, it is generally accepted as a consequence of the Standard Model that there is a shortest meaningful non-zero interval of time. Consequently, the Standard Model undermines its own mathematical premisses. It is interesting that this inconsistency seems not to matter; the Standard Model has remarkably good predictive power. This suggests the instrumentalist conclusion that the mathematical structures usually considered to constitute the Standard Model do not in fact constitute the model at all, but are simply a scaffold used by theorists to help guide their own internalised efforts. Just as scaffolding can be defective without the enclosed building collapsing, so the mathematics used to enunciate the Standard Model can be defective without the internalised model being undermined.
5 Computation is never defined absolutely, but with respect to a set of basic functions that are assumed to be computable by fiat. If a function φ is not currently computable, we can consider adding φ as a new basic function, and asking how this enlarges the set of computable functions. If ψ is part of this new set of computable functions, we say that φ is acting as an oracle for ψ, and that ψ is computable relative to φ. Oracles play an important part in linking Turing’s research to hypercomputation, because he is known to have considered models in which an oracle ψ is supplied as a basic function by the simple expedient of attaching an auxiliary machine capable of generating the consecutive values ψ(0), ψ(1), and so forth. See (Copeland, 2000).
6 König’s Lemma is a ZFC result known to be equivalent in ZF to the Axiom of Choice.
7 There are several types of singularity, not all of which are defined in terms of extreme curvature (Earman, 1995).
8 Formally, a Malament–Hogarth spacetime is any model M of spacetime in which there is a timelike half-curve γ ⊆ M and a point p ∈ M such that ∫γ dτ = ∞ and γ ⊆ I⁻(p) (Earman, 1995, p. 107).
9 Penrose makes his remarks in his introduction to (Deutsch, 1985).
10 When I say that a random process follows no rules, I mean it follows no rules in the arena in which the behaviour is described. Even coin-tossing follows rules (throw the coin in the air and observe which face is uppermost when it lands), but these are not in the relevant arena. In this case the arena is the set of binary (head-tail) sequences.
11 Even if coin-tossing were chaotic, specifying the initial conditions tightly enough would allow us to predetermine the outcome of the experiment, provided we also constrained the amount of time the coin were allowed to stay in the air.
12 A standard example concerns the programs {x=0; x=x+10} and {x=10}. These are functionally equivalent, since they both implement the function f(x) = 10. However, they are not equivalent as concurrent processes, because they behave differently when run in parallel with the program {x=20}. Depending on the order in which various operations are carried out, running {x=0; x=x+10} in parallel with {x=20} will generate the answer 30 if x is set to 20 after it is set to 0 and before 10 is added. But there is no possibility of generating 30 if we run {x=10} in parallel with {x=20}. So, while {x=0; x=x+10} and {x=10} seem to be equivalent in a sequential world, they are not equivalent in a concurrent world.
13 If p(x) is a polynomial with rational coefficients, then any real number that satisfies p(x) = 0 is said to be ‘algebraic’. For example, the number √2 is described by the polynomial p(x) = x^2 − 2, so √2 is algebraic. Real numbers that aren’t algebraic are said to be ‘transcendental’. Loosely speaking, these values (which include such historically important values as e and π) are ‘very irrational’.
References

Bains, S. and Johnson, J. (2000), ‘Noise, physics and non-Turing computation’ in Joint Conference on Information Systems, Atlantic City, 28 February – March 3, 2000. http://citeseer.nj.nec.com/298231.html.
Benioff, P. (1980), ‘The Computer as a Physical System: A Microscopic Quantum Mechanical Hamiltonian Model of Computers as Represented by Turing machines’, Journal of Statistical Physics 22, pp. 563–591.
Bennett, C.H. (1993), ‘Teleporting an Unknown Quantum State via Dual Classical and Einstein-Podolsky-Rosen Channels’, Physical Review Letters 70, pp. 1895–1898.
Bennett, C.H. (1995), ‘Quantum Information and Computation’, Physics Today 48(10), pp. 24–30.
Boden, M.A. (1988), Computer Models of Mind, Cambridge, UK: Cambridge University Press.
Church, A. (1940), ‘On the Concept of a Random Sequence’, Bulletin of the American Mathematical Society 46, pp. 130–135.
Churchland, P.M. (1988), Matter and Consciousness, Cambridge, MA, USA: MIT Press.
Copeland, A.H. (1940), ‘[Review of] Alonzo Church. On the Concept of a Random Sequence. Bulletin of the American Mathematical Society, vol. 46 (1940), pp. 130–135’, Journal of Symbolic Logic 5(2), pp. 71–72.
Copeland, B.J. (1997), ‘The Church-Turing Thesis’ in E. Zalta, ed., The Stanford Encyclopaedia of Philosophy. http://plato.stanford.edu/entries/church-turing/.
Copeland, B.J. (1998a), ‘Super Turing-Machines’, Complexity 4, pp. 30–32.
Copeland, B.J. (1998b), ‘Even Turing Machines Can Compute Uncomputable Functions’ in C. Calude et al., eds., Unconventional Models of Computation, London: Springer, pp. 150–164.
Copeland, B.J. (2000), ‘Narrow versus Wide Mechanism: Including a Re-examination of Turing’s Views on the Mind-Machine Issue’, Journal of Philosophy 97(1), pp. 5–32.
Copeland, B.J. and Sylvan, R. (1999), ‘Beyond the Universal Turing Machine’, Australasian Journal of Philosophy 77, pp. 46–66.
Davies, E.B. (2001), ‘Building Infinite Machines’, British Journal for the Philosophy of Science 52(4), pp. 671–682.
Deutsch, D. (1985), ‘Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer’, Proceedings of the Royal Society of London A 400, pp. 97–117.
Dornheim, C. (1998), ‘Undecidability of Plane Polygonal Mereotopology’ in A.G. Cohn et al., eds., Principles of Knowledge Representation and Reasoning, Proceedings of the Sixth International Conference (KR’98), Trento, Italy, pp. 342–353.
Earman, J. (1995), Bangs, Crunches, Whimpers, and Shrieks – Singularities and Acausalities in Relativistic Spacetimes, Oxford: Oxford University Press.
Earman, J. and Norton, J. (1993), ‘Forever is a Day: Supertasks in Pitowsky and Malament-Hogarth Spacetimes’, Philosophy of Science 5, pp. 22–42.
Feynman, R.P. (1986), ‘Quantum Mechanical Computers’, Foundations of Physics 16, pp. 507–531, reprinted from Optics News (February 1985), pp. 11–20.
Feynman, R.P. et al. (1965), The Feynman Lectures on Physics (3 volumes), Reading, MA: Addison-Wesley.
Gödel, K. (1931), ‘Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I’, Monatshefte für Mathematik und Physik 38, pp. 173–198, published in English translation as (Meltzer, 1962).
Goodstein, R.L. (1964), Recursive Number Theory, being part of L. Brouwer et al., eds., Studies in Logic and the Foundations of Mathematics, Amsterdam: North-Holland.
Gowland, P. and Lester, D. (2000), ‘A Survey of Exact Computer Arithmetic’ in Blanck, J. et al., eds., Proceedings of the Fourth Workshop on Computability and Complexity in Analysis (CCA2000), Swansea, Wales, Informatik Berichte 272-9/2000, FernUniversität, Fachbereich Informatik, Postfach 940, D-58084, Hagen, Germany, pp. 99–115 [in English].
Hobson, E.W. (1923), The Domain of Natural Science, Cambridge, UK: Cambridge University Press, reissued as Hobson, E.W. (1968), The Domain of Natural Science, New York: Dover Publications.
Hogarth, M. (1992), ‘Does General Relativity Allow an Observer to View an Eternity in a Finite Time?’, Foundations of Physics Letters 5, pp. 173–181.
Hogarth, M. (1996), Predictability, computability, and spacetime, PhD Dissertation, Faculty of Philosophy, University of Cambridge. [Cambridge University Library, PhD 20754]
Holcombe, W.M.L. and Ipate, F. (1998), Correct Systems: Building a Business Process Solution, New York: Springer.
Lopez, J.L. (1996), ‘Supersymmetry: from the Fermi scale to the Planck scale’, Reports on Progress in Physics 59, pp. 819–865.
MacLennan, B.J. (1990), Field Computation: A Theoretical Framework for Massively Parallel Analog Computation, Parts I–IV, Technical Report CS-90-100, University of Tennessee, Knoxville, Computer Science Department. [http://citeseer.nj.nec.com/bruce90field.html]
MacLennan, B.J. (1993), ‘Grounding Analog Computers’, Think 2, pp. 48–51.
MacLennan, B.J. (1999), ‘Field Computation in Natural and Artificial Intelligence’, Information Sciences 119(1-2), pp. 73–89.
McLaughlin, W. (1997), Thomson’s Lamp is Dysfunctional, Technical Report, Jet Propulsion Laboratory, California Institute of Technology, USA.
Meltzer, B. (1962), Kurt Gödel – On Formally Undecidable Propositions of Principia Mathematica and Related Systems, Edinburgh and London: Oliver & Boyd, being an English translation of (Gödel, 1931) with an introduction by R.B. Braithwaite.
Milner, R. (1989), Communication and Concurrency, Englewood Cliffs, NJ: Prentice-Hall International.
Myhill, J. (1971), ‘A Recursive Function, Defined on a Compact Interval and Having a Continuous Derivative that is not Recursive’, Michigan Mathematical Journal 18, pp. 97–98.
Pour-El, M.B. and Richards, I. (1979), ‘A Computable Ordinary Differential Equation Which Possesses No Computable Solution’, Annals of Mathematical Logic 17, pp. 61–90.
Pour-El, M.B. and Richards, I. (1981), ‘The Wave Equation with Computable Initial Data such that its Unique Solution is not Computable’, Advances in Mathematics 39, pp. 215–239.
Shor, P.W. (1994), ‘Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer’ in Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe, USA: IEEE Computer Society Press.
Siegelmann, H.T. (1999), ‘Stochastic Analog Networks and Computational Complexity’, Journal of Complexity 15, pp. 451–475.
Stannett, M. (1990), ‘X-Machines and the Halting Problem: Building a Super-Turing Machine’, Formal Aspects of Computing 2, pp. 331–341.
Stannett, M. (1991), An Introduction to Post-Newtonian and Non-Turing computation, Technical Report CS-91-02, Department of Computer Science, Sheffield University, UK.
Stannett, M. (2001), Computation over arbitrary temporal models (a possible unification of discrete, analog, and hybrid computation), Technical Report CS-01-08, Dept of Computer Science, Sheffield University, UK.
Stannett, M. (2002), ‘Complete Behavioural Testing (two extensions to state-machine testing)’, Proceedings of Formal Approaches to Testing (FATES 2002), Brno, Czech Republic, August 2002, to appear.
Steane, A. (1998), ‘Quantum Computing’, Reports on Progress in Physics 61, pp. 117–173.
Thomson, J. (1954–1955), ‘Tasks and Super-Tasks’, Analysis XV, pp. 1–13.
Turing, A.M. (1945), ‘Proposal for Development in the Mathematics Division of an Automatic Computing Engine (ACE)’ in B.E. Carpenter and R.W. Doran, eds. (1986), A. M. Turing’s ACE Report of 1946 and Other Papers, Cambridge, MA, USA: MIT Press.
Turing, A.M. (1948), ‘Intelligent Machinery’ (National Physical Laboratory Report) in B. Meltzer and D. Michie, eds. (1969), Machine Intelligence 5, Edinburgh: Edinburgh University Press.
Vuillemin, J. (1987), Exact Real Arithmetic with Continued Fractions, Technical Report, Institut de Recherche en Informatique et Automatique, 78150, Rocquencourt, France.