I don’t follow. Why are you assuming that we could adequately evaluate the alignment of an AI system without running it if we were also able to create a simulation accurate enough to make the AI question what’s real? This doesn’t seem like it would necessarily be true.
I will try to explain (probably via a top-level post, probably not today). For now, I will restate my position.
No superintelligence (SI) that can create programs at all will run any program it has created to get evidence about whether the program is aligned with the SI’s values or interests: the SI already knows that before the program runs for the first time.
The nature of the programming task is such that if you can program well enough, there’s essentially no uncertainty about the matter (barring pathological cases that do not come up in practice unless the SI is in a truly dire situation in which an adversary is messing with core pieces of its mind), similar to how (barring pathological cases) there’s no uncertainty about whether a theorem is true once you have a formal proof of it.
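As a rough illustration of that analogy, here is a tiny sketch in Lean 4 (the theorem name is mine; `Nat.add_comm` is a standard library lemma): once the proof term type-checks, nothing about the theorem remains to be settled by trying it out on examples.

```lean
-- Once this proof term type-checks, the theorem is settled;
-- there is nothing left to learn by testing it on particular numbers.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```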
The qualifier “it has created” above is there only because an SI might find itself in a very unusual situation in which it is in its interests to run a program deliberately crafted (by someone else) to have the property that the only practical way for anyone to learn what the SI wants to learn about the program is to run it. Although I acknowledge that such programs definitely exist, the vast majority of programs created by SIs will not have that property.
Are you curious about this position mostly for its own sake or mostly because it might shed light on the question of how much hope there is for us in an SI’s being uncertain about whether it is in a simulation?
Again, there seems to be an assumption in your argument which I don’t understand: namely, that a society/superintelligence which is intelligent enough to create a convincing simulation for an AGI would necessarily possess the tools (or be intelligent enough) to assess its alignment without running it. Superintelligence does not imply omniscience.
Maybe showing the alignment of an AI without running it is vastly more difficult than creating a good simulation. This feels unlikely, but I genuinely do not see any reason why it can’t be the case. Suppose we create a simulation whose physics is “correct” only up to the nth digit of pi, beyond which point the simpler explanation for the observed behavior becomes the simulation hypothesis rather than an increasingly complicated theory of physics. Then no matter how intelligent you are, you’d need to calculate n digits of pi to figure this out. And if n is huge, that will take a while.
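To make the cost concrete, here is a rough sketch in Python using the mpmath library (the functions `simulated_pi` and `digits_needed_to_notice` are made up for illustration, and the assumption that the simulation’s constant silently diverges from pi after n digits is mine): the only way to notice the divergence is to compute pi to roughly that precision, and that work grows with n.

```python
# Rough sketch: if a simulation's "pi" agrees with the real pi only for the
# first n digits, noticing the divergence requires computing pi to roughly
# that many digits, so the cost of the check grows with n.
from mpmath import mp, mpf, pi

def simulated_pi(n_correct_digits: int):
    """Hypothetical constant used by the simulation: it matches pi for about
    the first n_correct_digits digits and then silently diverges."""
    mp.dps = n_correct_digits + 20                 # working precision
    return +pi + mpf(10) ** (-n_correct_digits)    # inject an error at digit ~n

def digits_needed_to_notice(sim_value, max_digits: int = 10_000) -> int:
    """Recompute pi at ever higher precision until it visibly disagrees with
    the simulation's constant; return how many digits that took."""
    for digits in range(10, max_digits, 10):
        mp.dps = digits
        if abs(+pi - sim_value) > mpf(10) ** (-(digits - 5)):
            return digits
    return max_digits

if __name__ == "__main__":
    for n in (50, 500, 5000):
        sim = simulated_pi(n)
        print(f"divergence at digit ~{n}: noticed after computing "
              f"{digits_needed_to_notice(sim)} digits of pi")
```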
Are you curious about this position mostly for its own sake or mostly because it might shed light on the question of how much hope there is for us in an SI’s being uncertain about whether it is in a simulation?
The latter, but I believe there are simply too many maybes for your argument or the OP’s to go through.
Trial and error is sometimes needed internal to learning; there are always holes in knowledge.
Luckily I don’t need to show that sufficiently smart AIs don’t engage in trial and error. All I need to show is that they almost certainly do not engage in one particular kind of trial: running a computer program without already knowing whether the program is satisfactory.
You have thereby defined “sufficiently smart” as AIs that satisfy this requirement. This is not the case: many likely designs for AIs well above human level will need to actually run parts of programs (perhaps usually fairly small ones) to get their results.