I may be misreading, but I don’t see how Eliezer’s eval is broken. It can choose between a faithful evaluation and one in which inner calls to eval are replaced with a given value. That’s more powerful than a standard eval.
If you build your own eval and it returns a different result than the built-in eval, you would know you’re being simulated.
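To make the detection idea concrete, here is a rough Python sketch. Everything in it is my own assumption for illustration, not Eliezer’s actual construction: the toy expression language, the names `builtin_eval`, `my_eval` and `looks_simulated`, and the `stub` parameter standing in for "the given value" that replaces inner eval calls.

```python
# Toy expression language (hypothetical, for illustration only):
#   int literal       -> itself
#   ("add", a, b)     -> sum of the two sub-results
#   ("eval", e)       -> an inner call to eval on e

_FAITHFUL = object()  # sentinel: do not replace inner eval calls


def builtin_eval(expr, stub=_FAITHFUL):
    """Host evaluator. Faithful by default; if `stub` is given, every
    inner ("eval", ...) call returns `stub` instead of being evaluated."""
    if isinstance(expr, int):
        return expr
    op = expr[0]
    if op == "add":
        return builtin_eval(expr[1], stub) + builtin_eval(expr[2], stub)
    if op == "eval":
        if stub is not _FAITHFUL:
            return stub  # inner call replaced with the given value
        return builtin_eval(expr[1], stub)
    raise ValueError(f"unknown op: {op!r}")


def my_eval(expr):
    """Hand-rolled evaluator that always evaluates inner calls for real."""
    if isinstance(expr, int):
        return expr
    op = expr[0]
    if op == "add":
        return my_eval(expr[1]) + my_eval(expr[2])
    if op == "eval":
        return my_eval(expr[1])
    raise ValueError(f"unknown op: {op!r}")


def looks_simulated(expr, host_eval):
    """True if the eval we were handed disagrees with our own evaluator."""
    return host_eval(expr) != my_eval(expr)


probe = ("add", 1, ("eval", 2))
print(looks_simulated(probe, builtin_eval))                            # faithful host
print(looks_simulated(probe, lambda e: builtin_eval(e, stub=0)))       # stubbed host
```

Note the check only fires when the substituted value differs from the true one: with `stub=2` on this probe, both evaluators agree, so a mismatch proves you’re being simulated but a match doesn’t prove you aren’t.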