OK thanks, I guess I missed him differentiating between 'solve alignment first, then trust' versus 'trust first, given enough intelligence'. Although I think one issue with having a proof is that we (or a million monkeys, to paraphrase him) still won't understand the decisions of the AGI...? i.e. we'll be asked to trust the prior proof instead of understanding the logic behind each future decision/step the AGI takes? That also bothers me, because what are the tokens that comprise a "step"? Does it stop 1,000 times to check with us that we're comfortable with, or understand, its next move?
However, since it seems we can't explain many of the decisions of our current ANI, how do we expect to understand future ones? He mentions that we may be able to, but only by becoming trans-human.
:)
Exactly what I’m thinking too.