I said I would comment on the rest of the post here, but I’m finding that difficult to do.
The Penalty Functions section is easiest to comment on: the first two paragraphs are a reasonable suggestion (this looks a lot like my suggestion of a cost function, and so I’m predisposed to like it), but I’m stumped by the third paragraph. Are you penalizing the AI for the predictable consequences of it existing, rather than just the actions it takes?
My overall sense is that by trying to describe the universe from the top down you’re running into insurmountable challenges, because the universe is too much data. I would first worry about a system that reliably makes one paperclip and whose sensors cover only a single room, and then use insights from that solution to attack the global problem.
I’m also not sure the reduced impact intuitions hold for any narrow AIs whose task is to somehow combat existential risk. (Imagine handing over control of some satellites and a few gravitational tethers to a disciple AI to minimize the risk of an asteroid or comet hitting the Earth.) In that case, what we want is for the future Earth-related uncertainty to have the same bumps as current uncertainty, but with different magnitudes. Will our metric treat that differently from a future uncertainty which has slightly different bumps?
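To make the question concrete, here’s a toy comparison. The distributions, the numbers, and the choice of total variation as the metric are all just my illustrative assumptions, not anything from the post; the point is only that a simple divergence scores “same bumps, different magnitudes” and “slightly different bumps” in the same currency, so it doesn’t obviously privilege the former.

```python
# Toy sketch: does a simple divergence metric distinguish "same bumps,
# different magnitudes" from "slightly different bumps"?  Everything here
# (numbers, metric choice) is an illustrative assumption.
import numpy as np

x = np.linspace(-10, 10, 2001)

def bumps(centres, weights, width=1.0):
    """Mixture of Gaussian 'bumps' over future outcomes, normalised to sum to 1."""
    p = sum(w * np.exp(-0.5 * ((x - c) / width) ** 2) for c, w in zip(centres, weights))
    return p / p.sum()

baseline  = bumps(centres=[-3.0, 2.0], weights=[0.7, 0.3])  # current uncertainty
rescaled  = bumps(centres=[-3.0, 2.0], weights=[0.3, 0.7])  # same bumps, new magnitudes
displaced = bumps(centres=[-2.5, 2.5], weights=[0.7, 0.3])  # slightly different bumps

def total_variation(p, q):
    return 0.5 * np.abs(p - q).sum()

print("TV(baseline, rescaled)  =", total_variation(baseline, rescaled))
print("TV(baseline, displaced) =", total_variation(baseline, displaced))
# Both kinds of change get a nonzero penalty, so a metric like this does not
# by itself treat the "rescaled bumps" future as specially acceptable.
```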
I’m just saying: here’s a major problem with this approach; let’s put it aside for now.
Are you penalizing the AI for the predictable consequences of it existing, rather than just the actions it takes?
We are penalising the master AI for the predictable consequences of the existence of the particular disciple AI it chooses to make.
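As a rough sketch of what I mean (the function names and numbers below are mine, invented for illustration, not part of any formalism): the penalty is computed from the difference the disciple’s existence predictably makes to the world, not just from whichever actions the disciple happens to take.

```python
# Minimal sketch of the reading above, with made-up names: the master AI is
# penalised according to how distinguishable its forecast of the world is
# with versus without the disciple it creates.
import numpy as np

def predicted_world(disciple_exists: bool) -> np.ndarray:
    """Stand-in for the master AI's forecast: a distribution over a few
    coarse world-states (three illustrative outcomes here)."""
    if disciple_exists:
        return np.array([0.70, 0.25, 0.05])
    return np.array([0.80, 0.15, 0.05])

def existence_penalty(scale: float = 100.0) -> float:
    """Penalty proportional to the total-variation gap between the forecasts."""
    with_d, without_d = predicted_world(True), predicted_world(False)
    return scale * 0.5 * np.abs(with_d - without_d).sum()

print(existence_penalty())  # grows with any predictable consequence of the disciple existing
```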
I’m also not sure the reduced impact intuitions hold for any narrow AIs whose task is to somehow combat existential risk.
No, it doesn’t hold. We could hook it up to something like utility indifference or whatever, but most likely reduced impact AI would be an interim stage on the way to friendly AI.
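For concreteness, here is one toy way utility indifference could be wired up; this is my own illustrative framing, not a worked-out proposal: add a compensating constant to the utility in worlds where some event E happens, chosen so the agent’s expected utility is the same whether or not E occurs, so it has no incentive to influence E.

```python
# Toy sketch of utility indifference (illustrative only): pay a compensation
# term in E-worlds so that E[u' | E] equals E[u | not E], leaving the agent
# with no reason to push E's probability around.
def indifferent_utility(u, outcomes):
    """u(outcome, event_happened) -> float.
    outcomes: list of (outcome, prob_given_E, prob_given_not_E)."""
    eu_given_e     = sum(p * u(o, True)  for o, p, _ in outcomes)
    eu_given_not_e = sum(q * u(o, False) for o, _, q in outcomes)
    compensation = eu_given_not_e - eu_given_e          # paid only in E-worlds
    def u_prime(outcome, event_happened):
        return u(outcome, event_happened) + (compensation if event_happened else 0.0)
    return u_prime

# Example: with this base utility, E[u'|E] = E[u|not E] = 9, so whether E
# happens is a matter of indifference to the agent.
u  = lambda outcome, _event: {"good": 10.0, "bad": 0.0}[outcome]
u2 = indifferent_utility(u, [("good", 0.4, 0.9), ("bad", 0.6, 0.1)])
```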