OK, I see. In other words, the AGI can't write an arbitrary function in the base programming language and call it; it has a fixed code base and has to simulate that function using its existing code. However, I think the AGI can still win a race against a straightforward “predict accurately” algorithm, because it can do two things: 1) include the most important inner loops of the “predict accurately” algorithm as functions in its own code to minimize the relative slowdown (this is not a decision by the AGI but just a matter of which AGI ends up having the highest posterior), and 2) keep finding improvements to its own prediction algorithm so that it eventually overtakes any fixed prediction algorithm in accuracy, which hopefully more than “pays for” the remaining slowdown.
Let the AGI’s “predict accurately” algorithm be fixed.
What you call a sequence of improvements to the prediction algorithm, let’s just call that the prediction algorithm. Imagine this to have as much or as little overhead as you like compared to what was previously conceptualized as “predict accurately.” I think this reconceptualization eliminates 2) as a concern, and if I’m understanding correctly, 1) is only able to mitigate slowdown, not overpower it.
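To make the reconceptualization concrete, here is a minimal Python sketch. Everything in it (the two toy rules, the names `CANDIDATES`, `errors`, `improving_predictor`) is hypothetical and only for illustration. It shows a predictor that "keeps improving" by switching to whichever rule has the best track record, and yet is itself one fixed function of the history.

```python
from typing import Callable, Sequence

Predictor = Callable[[Sequence[int]], int]

def predict_last(history: Sequence[int]) -> int:
    """Toy rule: repeat the last bit (0 on an empty history)."""
    return history[-1] if history else 0

def predict_alternate(history: Sequence[int]) -> int:
    """Toy rule: flip the last bit (0 on an empty history)."""
    return (1 - history[-1]) if history else 0

# Hypothetical pool of rules the "improving" predictor can switch between.
CANDIDATES = (predict_last, predict_alternate)

def errors(rule: Predictor, history: Sequence[int]) -> int:
    """Count how often `rule` would have mispredicted the next bit so far."""
    return sum(rule(history[:t]) != history[t] for t in range(len(history)))

def improving_predictor(history: Sequence[int]) -> int:
    """Use the rule with the fewest past errors; the switching logic itself never changes."""
    best = min(CANDIDATES, key=lambda rule: errors(rule, history))
    return best(history)
```

On the history `[0, 1, 0, 1]` this settles on the alternating rule and predicts `0`. The point is only that "predictor plus its sequence of improvements" can be written down as a single program, so it can be compared against other fixed programs.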
Also I think 1) doesn’t work—maybe you came to this conclusion as well?
Suppose M is the C programming language, but in C there is no way to say “interpret this string as a C program and run it as fast as a native C program”.
But maybe you’re saying that doesn’t apply because:
(this is not a decision by the AGI but just a matter of which AGI ends up having the highest posterior)
I think this line of argument undercuts the contention that this AGI will have a short description length. One can imagine a sliding scale here. Short description, lots of overhead: a simple universe evolves life, and aliens decide to run “predict accurately” + “treacherous turn”. Longer description, less overhead: an AGI that runs “predict accurately” + “treacherous turn”. Even longer description, even less overhead: an AGI with some of the subroutines involved already (conveniently) baked into its architecture. Once all the subroutines are baked into its architecture, you just have the algorithm “predict accurately” + “treacherous turn”, and in this form it has a longer description than “predict accurately” alone.
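A toy way to do the bookkeeping for this sliding scale, as a sketch only. It assumes a penalty of the form (description bits) + `time_weight` × log2(running time), which is borrowed from Levin-style complexity and may not match the prior under discussion. The function name, the `time_weight` parameter, and all the numbers in the usage lines are invented placeholders.

```python
import math

def extra_cost(extra_bits: float, slowdown: float, time_weight: float = 1.0) -> float:
    """Penalty of a candidate relative to plain "predict accurately".

    extra_bits:  description length minus that of the plain predictor (may be negative).
    slowdown:    running time divided by that of the plain predictor (>= 1).
    time_weight: how many bits one doubling of running time costs (an assumption).
    A positive result means the candidate is penalized more than the plain predictor.
    """
    if slowdown < 1.0:
        raise ValueError("slowdown is relative to the plain predictor and should be >= 1")
    return extra_bits + time_weight * math.log2(slowdown)

# Placeholder numbers, chosen only to show the two endpoints of the scale.
# Fully baked-in endpoint: no slowdown, but the treacherous turn adds bits.
baked_in = extra_cost(extra_bits=50, slowdown=1.0)        # 50.0
# Short-description endpoint: fewer bits, but an enormous slowdown.
evolved = extra_cost(extra_bits=-200, slowdown=2.0**300)  # 100.0
```

Under this assumed form, the fully baked-in endpoint is penalized for any positive `extra_bits`, whatever `time_weight` is, because its slowdown term is zero. Where the other points land depends entirely on the numbers and on how heavily running time is charged.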
I’ve made a case that the two endpoints in the trade-off are not problematic. I’ve argued (roughly) that one reduces computational overhead by doing things that make it less natural to describe “predict accurately” and “treacherous turn” all at once. This goes back to the general principle I proposed above: “The more general a system is, the less well it can do any particular task.” The only thing I feel like I can still do is argue against particular points in the trade-off that you think are likely to cause trouble. Can you point me to an exact inner loop that can be native to an AGI and that would cause it to fall outside of this trend? To frame this case, the Turing machine description must specify [AGI + a routine that it can call], sort of like a brain-computer interface, where the AGI is the brain and the fast routine is the computer.
You only have to bake in the innermost part of one loop in order to get almost all the computational savings.
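The claim about baking in one inner loop has the shape of Amdahl's law, and the arithmetic can be sketched as follows. This is an illustration under stated assumptions, not a model of any particular AGI: `inner_fraction` and `interp_slowdown` are hypothetical parameters, and the sketch assumes the baked-in loop runs at native speed while everything else is simulated at a uniform slowdown.

```python
def remaining_slowdown(inner_fraction: float, interp_slowdown: float) -> float:
    """Running time relative to a fully native program when only the inner loop is native.

    inner_fraction:  share of the native running time spent in the inner loop (0..1).
    interp_slowdown: factor by which all other code is slowed by being simulated (>= 1).
    """
    if not 0.0 <= inner_fraction <= 1.0:
        raise ValueError("inner_fraction must lie in [0, 1]")
    if interp_slowdown < 1.0:
        raise ValueError("interp_slowdown must be at least 1")
    # The inner loop costs its native time; the rest is multiplied by the slowdown.
    return inner_fraction + (1.0 - inner_fraction) * interp_slowdown
```

With `inner_fraction = 0.99` and `interp_slowdown = 100`, the result is about 1.99, so most of the hundredfold slowdown is recovered. The result only reaches 1 when `inner_fraction` is exactly 1, and how close a real inner loop gets to that is the open question.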
(I actually have a more basic confusion, and started a new thread for it.)