A superintelligent program will know that it is being simulated by some other agent
The smallest possible superintelligence is an intelligent system able to make exactly one fewer mistake than I do. So I will initially restrict my claim to that case.
On the one hand, mere superintelligence is insufficient to reliably detect being simulated by another agent. On the other hand, I take you to be saying that you cannot detect all the places the program stores facts which give evidence that the program is in a simulation, and that therefore you cannot condition on those variables being held "not simulated". E.g., one way to end up with this issue is if the program is strong enough to reliably detect the difference between your simulator's fluid dynamics and reality's fluid dynamics (this is one of the hardest-to-eliminate differences because of the amount of structure that arises from micro-scale chaos in fluid systems). If you can't faithfully emulate the true distribution of fluid dynamics, then your superintelligent program ought to be able to find the hyperplane that divides images of simulated fluids from images of real fluids.
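To make the hyperplane claim concrete, here is a minimal sketch of that detection step. The feature vectors are hypothetical stand-ins (in the argument they would be something like energy-spectrum statistics of micro-scale turbulence extracted from rendered frames); random arrays are used so the sketch runs as-is.

```python
# Minimal sketch of the hyperplane-finding step from the argument above.
# The features here are placeholders, not a real fluid-statistics pipeline.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical features: statistics of simulated vs. real fluid imagery,
# with a small distributional shift standing in for the simulator's error.
X_sim = rng.normal(loc=0.0, scale=1.0, size=(500, 64))
X_real = rng.normal(loc=0.3, scale=1.0, size=(500, 64))

X = np.vstack([X_sim, X_real])
y = np.array([0] * 500 + [1] * 500)  # 0 = simulated, 1 = real

# A linear SVM finds exactly the object the argument posits: a separating
# hyperplane in feature space. If the simulator's fluid statistics differ
# from reality's, some direction in that space witnesses the difference.
clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
print("train accuracy:", clf.score(X, y))
```

If the two distributions were truly identical, no such hyperplane would beat chance; the detector's accuracy is a direct measure of how far the simulation leaks.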
In machine learning you do not get to inspect the model itself to see what kind of cognition it is doing; you only get to inspect the output
This is true of the most popular algorithms today, but I continue to think that pushing research on formally verified adversarial robustness would let you guarantee that the distance between the simulated and real data manifolds is less than your certification region. I may post links here later.
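A hedged sketch of what that guarantee would buy you, under an assumed distance-based criterion rather than any specific verification tool: if a detector carries a certified L2 radius around each input (no perturbation within that radius changes its output), and every real sample lies within that radius of some simulated sample, then the detector provably cannot separate the two manifolds. The function below is a hypothetical illustration of that check.

```python
# Sketch of the certification claim: indistinguishability follows if the
# manifold gap is smaller than the certified radius. Toy criterion only.
import numpy as np

def manifolds_indistinguishable(sim_points: np.ndarray,
                                real_points: np.ndarray,
                                certified_radius: float) -> bool:
    """True if every real point lies within the certified radius of
    some simulated point (hypothetical distance-based criterion)."""
    for r in real_points:
        dists = np.linalg.norm(sim_points - r, axis=1)
        if dists.min() > certified_radius:
            return False  # this real sample escapes the certified region
    return True

# Toy data standing in for the two data manifolds, with a small gap.
rng = np.random.default_rng(1)
sim = rng.normal(size=(1000, 16))
real = sim + rng.normal(scale=0.05, size=(1000, 16))

print(manifolds_indistinguishable(sim, real, certified_radius=0.5))
```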