Danaher’s argument seems flawed. I think he implicitly assumes that each safety test provides independent information on whether the AI is safe or not. In fact, a test only checks that the AI answers the safety test questions correctly (which could include inspection of internal state). This could occur either if the AI is genuinely safe, or if the AI is unsafe but is mimicking safety. While safety test data should increase your confidence in an AI’s safety, that confidence will be bounded above by 1 - [probability that the AI is not safe but is mimicking safety].
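To make the bound concrete, here is a minimal Bayesian sketch under assumptions of my own, not Danaher's or Bostrom's: there are three hypotheses (safe, unsafe-but-mimicking, unsafe-and-not-mimicking), the first two pass every test, and a non-mimicking unsafe AI passes each test independently with some probability. The function name `posterior_safe` and all prior values are hypothetical.

```python
def posterior_safe(p_safe, p_mimic, p_pass_naive, n_tests):
    """Posterior probability that the AI is safe after passing n_tests tests.

    Hypothetical model: a safe AI and an unsafe mimicking AI both pass every
    test; an unsafe non-mimicking AI passes each test independently with
    probability p_pass_naive.
    """
    p_naive = 1.0 - p_safe - p_mimic
    # Probability of observing n_tests passes, summed over the hypotheses
    evidence = p_safe + p_mimic + p_naive * p_pass_naive ** n_tests
    return p_safe / evidence


p_safe, p_mimic = 0.5, 0.2  # hypothetical prior probabilities
for n in (0, 1, 10, 100):
    print(n, round(posterior_safe(p_safe, p_mimic, 0.5, n), 4))

# More tests only eliminate the non-mimicking hypothesis, so the posterior
# approaches p_safe / (p_safe + p_mimic), which never exceeds 1 - p_mimic.
print(round(p_safe / (p_safe + p_mimic), 4), 1 - p_mimic)
```

In this model, passing more tests drives the non-mimicking term to zero but leaves the ratio between "safe" and "mimicking" untouched, so the limit is p_safe / (p_safe + p_mimic). Because p_safe + p_mimic ≤ 1, that limit is at most 1 - p_mimic, with equality only when every unsafe AI is a mimic.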
It seems like Bostrom’s position is that the mimicry probability is high. If so, you need other kinds of evidence regarding AI safety, such as proofs about the AI architecture that are independent of any results you get from running the AI.
I’d consider adding to the hypothesis space that Bostrom declares the mimicry probability to be high in order to shift opinions in the safer direction, not necessarily because it matches his subjective epistemic state.
I don’t understand the negative point. Could it be explained, please?
Is it because his position is what he declares and not what he really thinks?
Or because Bostrom’s mimicry (i.e., for some reason preferring a negative, catastrophic, and alarmist position) is not to be supposed?