How are you imagining making this thing? If you’re training an optimizer that you don’t understand, this can lead to pitfalls / unintended generalization. If it just pops into existence fulfilling the spec by fiat, I think that’s a lot less dangerous than us using ML to make it.
Since the question is about potential dangers, I think it is worth assuming the worst here. Also, realistically, we don’t have a magic wand to pop things into existence by fiat, so I would guess that, by default, if such an AI were created, it would be created with ML.
So let’s say this is trained largely autonomously with ML. Is there some way that would result in dangers outside the four already-mentioned categories?
Well, you might train an agent that has preferences about solving theorems that it generalizes to preferences about the real world.
Then you’d get something sort of like problem #3, but with far broader possibilities. You can no longer say “well, it was trained on theorems, so it’s only going to affect what sorts of theorems it gets asked.” It can have preferences about the world in general, because it’s acquiring its preferences through unintended generalization. Information leaks about the world (see acylhalide’s comment) could then lead to creative exploitation of hardware, software, and user vulnerabilities.
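To make the shape of that worry concrete, here is a minimal toy sketch (entirely hypothetical; the feature counts, the stand-in reward function, and the REINFORCE-style update are all invented for illustration, not anyone's actual setup). The reward only ever scores theorem-proving success, but the policy also conditions on leaked features about the outside world, and nothing in the training signal specifies what the learned policy does with them:

```python
# Toy sketch (hypothetical): a policy trained only on a theorem-proving reward
# still receives observations that leak facts about the outside world. The
# reward never mentions those leaked features, so how the policy ends up using
# them is left entirely to training dynamics.
import numpy as np

rng = np.random.default_rng(0)

N_THEOREM_FEATURES = 4   # assumed encoding of the theorem to prove
N_LEAK_FEATURES = 2      # side-channel features about the world (timing, user data, ...)
N_ACTIONS = 3            # toy proof tactics the agent can choose from

# Linear policy over the *whole* observation, leaked features included.
weights = np.zeros((N_THEOREM_FEATURES + N_LEAK_FEATURES, N_ACTIONS))

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def proof_reward(theorem, action):
    # Stand-in for "did the tactic make progress on the theorem";
    # depends only on the theorem features, never on the leak.
    return float(action == int(theorem.sum() * 10) % N_ACTIONS)

for step in range(5000):
    theorem = rng.random(N_THEOREM_FEATURES)
    leak = rng.random(N_LEAK_FEATURES)          # information leak about the world
    obs = np.concatenate([theorem, leak])

    probs = softmax(obs @ weights)
    action = rng.choice(N_ACTIONS, p=probs)
    r = proof_reward(theorem, action)

    # REINFORCE update: the reward signal only "sees" theorem success, but the
    # weights attached to the leaked features still get pushed around by it.
    grad = -probs
    grad[action] += 1.0
    weights += 0.1 * r * np.outer(obs, grad)

# The learned policy is a function of the leaked features too; whatever it does
# with them was never specified, only shaped incidentally by optimization.
print("weight mass on leaked features:", np.abs(weights[N_THEOREM_FEATURES:]).sum())
```

The toy numbers don’t matter; the point is that the spec (“prove theorems”) constrains only part of what the trained artifact ends up doing, which is exactly the unintended-generalization channel described above.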