Not an actual objection to this proposal, but an important note: we don’t know the upper limits of human cognitive capabilities. For example, humans were bad at arithmetic before the invention of positional numeral systems, and afterwards they became surprisingly good at it. We know that somewhere within human abilities is the capability to persuade you to let them out of the box, and probably various other forms of “talk-control”. If we could look into the mind of Einstein while having no clue about classical mechanics, we would understand nothing. I agree that a blow-up is not certain, but I would like to invent corrigibility measures for CoEms before implementing them.
Relatedly, CoEms could be run at potentially high speed-ups, and many copies or variations could be run together. So we could end up in the classic scenario of a smarter-than-average “civilization”, with “thousands of years” to plan, that wants to break out of the box.
This still seems less existentially risky, though, if we end up in a world where the CoEms retain something approximating human values. They might want to break out of the box, but probably wouldn’t want to commit species-cide on humans.
As far as I understand, the point of this proposal is that “human-like cognitive architecture ≈ cognitive containability ≈ sort of safety”, not “human-like cognitive architecture ≈ human values”. I just want to say that even a human can be cognitively uncontainable relative to another human, because they can learn mental tricks that look like Magic to the other human.