But having strong, precise impacts in open-ended environments is closely related to consequentialism.
But consequentialism only means achieving some kind of goal: it doesn’t have to be a goal you are motivated by. If you are motivated to fulfil goals that are given to you, you can still “use consequentialism”.
Sure, and this point is closely related to a setting I often think about for alignment: what if we had an ASI that modularly allowed us to specify any kind of goal we want? Can we come up with any nontrivial goals that it wouldn’t be a catastrophe to give it?
As a side-note, this is somewhat complicated by the fact that it matters massively how we define “goal”. Some notions of goals seem to near-provably lead to problems (e.g. an AIXI-type situation where the AI is maximizing reward and we have a box outside the AI which presses the reward button in some circumstances; this would almost certainly lead to wireheading no matter what we do), while other notions of goals seem to be trivial (e.g. we could express a goal as a function over the AI’s actions, but such a goal would have to contain almost all the intelligence of the AI in order to produce anything useful).
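To make that contrast concrete, here is a minimal sketch of the two notions of “goal” as toy utility functions. Everything in it is hypothetical illustration (the function names, the "button_pressed" event, the "precomputed_good_action" string), not a reference to any real system or library.

```python
# Toy sketch, assuming a goal is just a scoring function the agent maximizes.
from typing import List

def reward_button_utility(history: List[str]) -> float:
    """AIXI-style goal: maximize the signal from an external reward button.
    An agent optimizing this is best served by taking control of the button
    itself, which is exactly the wireheading failure mode."""
    return float(history.count("button_pressed"))

def action_function_utility(action: str) -> float:
    """Goal expressed as a function over the agent's own actions.
    For this score to track anything useful, the function must already know
    which actions are good, i.e. it has to contain most of the intelligence."""
    return 1.0 if action == "precomputed_good_action" else 0.0

if __name__ == "__main__":
    # The reward-button goal rates a wireheaded trajectory highest.
    print(reward_button_utility(["button_pressed"] * 10))      # 10.0
    # The action-function goal only helps if we already knew the answer.
    print(action_function_utility("precomputed_good_action"))  # 1.0
```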
We already have some systems with goals. They seem to mostly fail in the direction of wireheading, which is not catastrophic.
Yes, but I was talking about artificial superintelligences, not just any system with goals.
Superintelligences don’t necessarily have goals, and could arrive gradually. A jump to agentive, goal-driven ASI is the worst-case scenario, but it’s also conjunctive.
It’s not meant as a projection of what is likely to happen; it’s meant as a toy model that makes it easier to think about what sorts of goals we would like to give our AI.
Well, I already answered that question.
Maybe, but then I don’t see your answer.