Does the one-shot AI necessarily aim to maximize some function (like the probability of saving the world, or the expected “savedness” of the world or whatever), or can we also imagine a satisficing version of the one-shot AI which “just tries to save the world” with a decent probability, and doesn’t aim to do any more, i.e., does not try to maximize that probability or the quality of that saved world etc.?
I’m asking this for two reasons:
First, I suspect that we might otherwise still make a mistake in specifying the optimization target, and so incentivize the one-shot AI to do something that “optimally” saves the world in some way we did not foresee and don’t like.
Second, I am trying to figure out whether your plan would be hindered by switching from an optimization paradigm to a satisficing paradigm right now, in order to buy time for your plan to be put into practice :-)