The Singularity
Why is a rock easier to predict than a block of GPUs computing? Because the block of GPUs is optimized so that its end-state depends on a lot of computation.
[Maybe by some metric of “good prediction” it wouldn’t be much harder, because “only a few bits change”, but we can easily make it the case that those bits get augmented to affect whatever metric we want.]
Since prediction is basically “replicating / approximating in my head the computation made by physics”, it’s to be expected that if there’s more computation that needs to be finely predicted, the task is more difficult.
In reality, there is (at the low level of quantum physics) just as much total computation going on, but most of it (those lower levels) is screened off enough from macro behavior (in some circumstances) that we can use very accurate heuristics to ignore it, and go “the rock will not move”. This is purposefully subverted in the GPU case: to cram a lot of useful computation into a small amount of space and resources, the micro computations (at the level of circuitry) are carefully preserved and amplified, instead of getting screened off by chaos.
Say we define the Singularity as “when the amount of computation per gram of matter (say, on Earth) exceeds a certain threshold”. What’s so special about this? Well, for exactly the same reason as above, an increase in this amount makes the whole setup harder to predict. Some time before the threshold, maybe we can confidently predict some macro properties of Earth for the next 2 months. Some time after it, maybe we can barely predict that for 1 minute.
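To make that definition concrete, here is a minimal Python sketch. The threshold, the figures, and the names are placeholders I’m making up for illustration; nothing here is a claim about where a real threshold would lie.

```python
# Toy sketch of the definition above: the "Singularity" as the moment
# computation per gram of matter crosses some threshold.
# All figures are illustrative placeholders, not estimates.

EARTH_MASS_GRAMS = 5.97e27           # mass of Earth in grams (rounded)
THRESHOLD_FLOP_PER_S_PER_GRAM = 1.0  # hypothetical threshold (arbitrary)

def computation_density(total_flop_per_s: float) -> float:
    """FLOP/s per gram of matter: the quantity the definition tracks."""
    return total_flop_per_s / EARTH_MASS_GRAMS

def past_singularity(total_flop_per_s: float) -> bool:
    """True once the density exceeds the chosen threshold."""
    return computation_density(total_flop_per_s) > THRESHOLD_FLOP_PER_S_PER_GRAM

# As total computation grows, at some point the predicate flips.
for flop_per_s in (1e20, 1e25, 1e30):
    print(flop_per_s, past_singularity(flop_per_s))
```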
But why would we care about this change in speed? After all, for now (against the backdrop of real clock time in physics) it doesn’t really matter whether a change in human history takes 1 year or 1 minute to happen.
[In the future maybe it does start mattering because we want to cram in more utopia before heat death, or because of some other weird quirk of physics.]
What really matters is how far we can predict “in terms of changes”, not “in terms of absolute time”. Both before and after the Singularity, I might be able to predict what happens to humanity for the next X FLOP (of total cognitive labor employed by all humanity, including non-humans). And that’s really what I care about, if I want to steer the future. The Singularity just makes it so these FLOP happen faster. So why be worried? If I wasn’t worried before about not knowing what happens after X+1 FLOP, and I was content with doing my best at steering given that limited knowledge, why should that change now?
[Of course, an option is that you were already worried about X FLOP not being enough, even if the Singularity doesn’t worsen it.]
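A minimal numerical sketch of that point about horizons, with made-up quantities (the horizon X and the FLOP/s figures are arbitrary placeholders): the horizon measured in FLOP stays fixed, while the same horizon read off in clock time shrinks as those FLOP get spent faster.

```python
# Toy illustration: if my prediction horizon is fixed at X FLOP of humanity's
# cognitive labor, speeding up how fast those FLOP are spent shrinks the
# horizon in clock time but leaves it unchanged in FLOP.
# Numbers are arbitrary placeholders.

HORIZON_FLOP = 1e24  # "X": how far ahead I can predict, measured in FLOP

def horizon_in_seconds(humanity_flop_per_s: float) -> float:
    """The same horizon, expressed in clock time."""
    return HORIZON_FLOP / humanity_flop_per_s

pre_singularity  = 1e18  # humanity's cognitive FLOP/s before (made up)
post_singularity = 1e22  # humanity's cognitive FLOP/s after (made up)

print(horizon_in_seconds(pre_singularity))   # ~1e6 s: roughly 12 days
print(horizon_in_seconds(post_singularity))  # ~1e2 s: roughly 2 minutes
# In FLOP, both horizons are the same X; only the clock-time reading changes.
```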
The obvious reason is changes in differential speed. If I am still a biological human, then it will indeed be a problem that all these FLOP happen faster relative to clock time, since they are also happening faster relative to me, and I will have much less of my own FLOP to predict and control each batch of X FLOP made by humanity-as-a-whole.
In a scenario with uploads, my FLOP will also speed up. But the rest of humanity/machines won’t only speed up; they will also build way more thinking machines. So unless I speed up even more, or my own cognitive machinery also grows at that rate (via tools, copies of me, or enlarging my brain), the ratio of my FLOP to humanity’s FLOP will still decrease.
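A toy sketch of this ratio argument, with invented numbers: even if my upload runs faster, total machine cognition grows faster still, so my share of humanity’s FLOP drops.

```python
# Toy sketch: even if I get sped up by the same factor as everyone else,
# total machine cognition also *grows*, so my share of humanity's FLOP
# still shrinks unless my machinery grows at that rate too.
# All numbers are illustrative placeholders.

def my_share(my_flop_per_s: float, humanity_flop_per_s: float) -> float:
    return my_flop_per_s / humanity_flop_per_s

# Pre-Singularity (made-up units)
me_before, humanity_before = 1.0, 1e10

# Post-Singularity: everyone runs 100x faster, but humanity also builds
# 1000x more thinking machines; I only get the 100x speedup.
me_after       = me_before * 100
humanity_after = humanity_before * 100 * 1000

print(my_share(me_before, humanity_before))  # 1e-10
print(my_share(me_after, humanity_after))    # 1e-13: my share fell 1000x
```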
But there’s conceivable reasons for worry, even if this ratio is held constant:
Maybe prediction becomes differentially harder with scale. That is, maybe using A FLOP (my cognitive machinery pre-Singularity) to predict X FLOP (that of humanity pre-Singularity) is easier than using 10A FLOP (my cognitive machinery post-Singularity) to predict 10X FLOP (that of humanity post-Singularity). But why? Can’t I just split the 10X into 10 bins, and use an A to predict each of them as satisfactorily as before? Maybe not, due to the newly complex interconnections between these bins. Of course, such complex interconnections are also a plus for my own cognitive machinery. But maybe the benefit for prediction from having those interconnections in my machinery is lower than the penalty from having them in the predicted computation. (A toy cost model of this is sketched below.)
[A priori this seems false if we extrapolate from past data, but who knows if this new situation has some important difference.]
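Here is that toy cost model. The functional form (a quadratic interconnection term) and the constant are invented purely to illustrate how bin-splitting can fail; they are not derived from anything.

```python
# Toy cost model for the bin-splitting argument. Suppose predicting a
# computation of size S costs c(S) = S + k * S**2, where the quadratic term
# stands in for cross-bin interconnections (form and k are made up).

def prediction_cost(size: float, k: float = 1e-3) -> float:
    return size + k * size**2

X = 100.0  # size of humanity's computation pre-Singularity (arbitrary units)

# Pre-Singularity: predict X in one piece.
cost_before = prediction_cost(X)

# Post-Singularity: predict 10X. If the 10 bins were independent, I could pay
# ten times the old per-bin cost; interconnections make the whole costlier.
cost_if_independent = 10 * prediction_cost(X)
cost_actual         = prediction_cost(10 * X)

print(cost_before)          # 110.0
print(cost_if_independent)  # 1100.0
print(cost_actual)          # 2000.0 -> 10x the computation, but more than
                            #           10x the prediction cost
```

Under this made-up cost curve, my 10A of machinery buys 10x the prediction budget, but the 10X computation costs roughly 18x as much to predict, so the ratio worsens.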
Maybe some other properties of the situation (like the higher computation density in the physical substrate requiring the computations to take on a slightly different, more optimal shape [this seems unlikely]) lead to the predicted computation having some new properties that make it harder to predict. Such properties need not even be something absolute that “literally makes prediction harder for everyone” (even for intelligences with the right tools/heuristics). It could just be “if I had the right heuristics I might be able to predict this just as well as before (or better), but all my heuristics have been selected for the pre-Singularity computation (which didn’t have this property), and now I don’t know how to proceed”. [I can rerun a selection for heuristics (for example, by running a copy of me growing up again), but that takes a lot more FLOP.]
Another way to think of this is not speed, but granularity: the amount of variation in a given 4D bounding box (a volume and a timeframe). A rock uses essentially no power and is pretty uniform in its information content, and is therefore easy to predict. A microchip is turning electricity into heat and into MANY TINY changes of state, which is obviously far more detail than a rock exhibits.
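A crude order-of-magnitude sketch of that contrast (the rock and chip figures are rough placeholders, not measurements):

```python
# Toy sketch of "granularity": roughly, how many macro-relevant state changes
# happen inside a given volume over a given timeframe.
# The figures below are crude order-of-magnitude placeholders.

def state_changes(elements: float, switches_per_element_per_s: float,
                  seconds: float) -> float:
    """Total discrete state changes in the bounding box over the timeframe."""
    return elements * switches_per_element_per_s * seconds

one_second = 1.0

# A rock: essentially no macro-relevant state changes; the thermal jiggling
# is screened off from its coarse-grained behavior.
rock = state_changes(elements=1.0, switches_per_element_per_s=0.0,
                     seconds=one_second)

# A modern chip: billions of transistors switching around a billion times
# per second (very rough).
chip = state_changes(elements=1e10, switches_per_element_per_s=1e9,
                     seconds=one_second)

print(rock)  # 0.0
print(chip)  # 1e19 state changes in the same 4D box
```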