When you say “optimization target,” it seems like you mean a single point in path-space that the planner aims for, where this point consists of several fixed landmarks along the path which don’t adjust to changing circumstances. Such an optimization target could still have some wiggle room (i.e., consist of an entire distribution of possible sub-paths) between these landmarks, correct? So some level of uncertainty must be built into the plan regardless of whether you call it a prediction or an optimization target.
It seems to me that what you’re advocating for is equivalent to generating an entire ensemble of optimization targets, each based on a different predictive model of how things will go. Then you break those targets up into their constituent landmarks and look for clusters of landmarks in goal-space from across the entire ensemble of paths. Would your “robust bottlenecks” then refer to the densest of these clusters?
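To make the clustering step concrete, here is a minimal sketch. It assumes goal-space is ordinary Euclidean space and that each landmark is a point in it. A simple greedy radius-based ("leader") clustering stands in for whatever clustering method would actually be used. `cluster_landmarks`, `radius`, and the model ids are hypothetical names for illustration only.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]  # assumption: a landmark is a point in Euclidean goal-space


def cluster_landmarks(plans: Dict[str, List[Point]], radius: float) -> List[dict]:
    """Greedy 'leader' clustering of landmarks pooled from every plan.

    `plans` maps a (hypothetical) model id to the landmarks of the plan that
    model produced. A landmark joins the first cluster whose running centroid
    lies within `radius`; otherwise it starts a new cluster.
    """
    clusters: List[dict] = []
    for model_id, landmarks in plans.items():
        for p in landmarks:
            for c in clusters:
                if math.dist(p, c["centroid"]) <= radius:
                    c["members"].append((model_id, p))
                    n = len(c["members"])
                    # incremental update of the centroid's running mean
                    c["centroid"] = tuple(
                        ci + (pi - ci) / n for ci, pi in zip(c["centroid"], p)
                    )
                    break
            else:
                # no existing cluster is close enough, so start a new one
                clusters.append({"centroid": tuple(p), "members": [(model_id, p)]})
    # densest clusters first: the candidate "robust bottlenecks"
    clusters.sort(key=lambda c: len(c["members"]), reverse=True)
    return clusters


if __name__ == "__main__":
    plans = {
        "model_a": [(0.0, 0.0), (5.0, 5.0)],
        "model_b": [(0.2, 0.1), (9.0, 0.0)],
        "model_c": [(0.1, -0.1), (5.1, 4.9)],
    }
    for c in cluster_landmarks(plans, radius=1.0):
        print(len(c["members"]), c["centroid"])
```

In this toy ensemble, all three models' plans pass near the origin, so that cluster comes out densest. A landmark that only one model predicts (the one at `(9.0, 0.0)`) ends up in a cluster of its own. Leader clustering depends on input order; a density-based method would avoid that, at the cost of more code.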
Come to think of it, couldn’t this be applied to model corrigibility itself?
Have an AI that’s constantly coming up with predictive models of human preferences, generating an ensemble of plans for satisfying human preferences according to each model. Then break those plans into landmarks and look for clusters in goal-space.
Each cluster could then form a candidate basin of attraction of goals for the AI to pursue. The center of each basin would represent a “robust bottleneck” that would be helpful across predictive models; the breadth of each basin would account for the variance in landmark features; and the depth/attractiveness of each basin would be proportional to the number of predictive models that have landmarks in that cluster.
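The three basin properties above can be sketched for a single cluster. The sketch assumes the cluster is given as a list of `(model_id, landmark)` pairs in Euclidean goal-space. It also assumes breadth is measured as the root-mean-square distance from the center, and depth as the fraction of the ensemble's models with at least one landmark in the cluster. `basin_stats` and `n_models` are hypothetical names.

```python
import math
from statistics import fmean
from typing import List, Tuple

Point = Tuple[float, ...]  # assumption: landmarks are points in Euclidean goal-space


def basin_stats(members: List[Tuple[str, Point]], n_models: int) -> dict:
    """Summarize one cluster of (model_id, landmark) pairs as a basin.

    center:  mean landmark (the candidate "robust bottleneck")
    breadth: root-mean-square distance of the landmarks from the center
    depth:   fraction of the ensemble's n_models that have a landmark here
    """
    points = [p for _, p in members]
    center = tuple(fmean(coords) for coords in zip(*points))
    breadth = math.sqrt(fmean(math.dist(p, center) ** 2 for p in points))
    # count distinct models, so one model with many landmarks here counts once
    depth = len({model_id for model_id, _ in members}) / n_models
    return {"center": center, "breadth": breadth, "depth": depth}
```

Counting distinct models, rather than landmarks, keeps depth tied to agreement *across* predictive models. A single model that happens to place many landmarks in one region would not make that basin look deeper.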
Ideally, the distribution of these basins would update continuously as each model in the ensemble becomes more predictive of human preferences (both stated and revealed) due to what the AGI learns as it interacts with humans in the real world. Plans should always be open to change in light of new information, and that includes an AGI's plans, so the landmarks and their clusters would necessarily shift around as well.
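One way to read "becomes more predictive" is as a Bayesian reweighting of the ensemble. Each preference model is scored by the probability it assigned to what humans actually chose. Basin depth then becomes the total weight of the models present, rather than a raw count. This is an assumption layered on top of the proposal, not part of it. `update_weights`, `weighted_depth`, and the likelihood inputs are hypothetical.

```python
from typing import Dict, Iterable


def update_weights(weights: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """One Bayesian step: reweight each preference model by the probability it
    assigned to the latest observed human choice, then renormalize to sum to 1."""
    unnorm = {m: w * likelihoods[m] for m, w in weights.items()}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("every model assigned zero probability to the observation")
    return {m: v / total for m, v in unnorm.items()}


def weighted_depth(weights: Dict[str, float], models_in_basin: Iterable[str]) -> float:
    """Basin depth as the total posterior weight of the distinct models that
    have a landmark in the basin (instead of a raw model count)."""
    return sum(weights[m] for m in set(models_in_basin))


if __name__ == "__main__":
    w = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
    # hypothetical probabilities each model gave to what the human actually did
    w = update_weights(w, {"a": 0.9, "b": 0.5, "c": 0.1})
    print(w, weighted_depth(w, ["a", "c"]))
```

Repeating this update after every observation shifts depth toward basins backed by the models that keep predicting people well. The basins themselves would also move whenever the plans are regenerated.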
Assuming this is the right approach, the questions that remain would be how to structure those models of human preferences, how to measure their predictive performance, how to update the models on new information, how to use those models to generate plans, how to represent landmarks along plan paths in goal-space, how to convert a vector in goal-space into actionable behavior for the AI to pursue, etc., etc., etc. Okay, yeah, there would still be a lot of work left to do.
When you say “optimization target,” it seems like you mean a single point in path-space that the planner aims for, where this point consists of several fixed landmarks along the path which don’t adjust to changing circumstances. Such an optimization target could still have some wiggle room (i.e., consist of an entire distribution of possible sub-paths) between these landmarks, correct? So some level of uncertainty must be built into the plan regardless of whether you call it a prediction or an optimization target.
It seems to me that what you’re advocating for is equivalent to generating an entire ensemble of optimization targets, each based on a different predictive model of how things will go. Then you break those targets up into their constituent landmarks and look for clusters of landmarks in goal-space from across the entire ensemble of paths. Would your “robust bottlenecks” then refer to the densest of these clusters?
Yup, that’s right.