Suppose that I convinced you "if you didn't know much chemistry, you would expect this AI to yield good outcomes." I think you should be pretty happy. It may be that the AI would predictably cause a chemistry-related disaster in a way that would be obvious to you if you knew chemistry, but overall I don't think you should expect a safety problem.
This feels like an artifact of a deficient definition: I should never end up with a lemma like "if you didn't know much chemistry, you'd expect this AI to yield good outcomes" rather than being able to directly say what we want to say.
That said, I do see some appeal in proving things like “I expect running this AI to be good,” and if we are ever going to prove such statements they are probably going to need to be from some impoverished perspective (since it’s too hard to bring all of the facts about our actual epistemic state into such a proof), so I don’t think it’s totally insane.
If we had a system that was ascription universal from some impoverished perspective, you might or might not be OK. I'm not really worried about it; I expect this definition to change before I literally end up with a system that is ascription universal from some impoverished perspective, and the current definition seems good enough to guide the next research steps.