Thanks as always to everyone involved in the newsletter!
The model of the first paper sounds great for studying what happens after we’re able to implement corrigibility and impact measures!
You might then reasonably ask what we should be doing instead. I see the goal of AI alignment as figuring out how, given a fuzzy but relatively well-specified task, to build an AI system that is reliably pursuing that task, in the way that we intended it to, but at a capability level beyond that of humans. This does not give you the ability to leave the future in the AI’s hands, but it would defuse the central (to me) argument for AI risk: that an AI system might be adversarially optimizing against you. (Though to be clear, there are still other risks (AN #50) to consider.)
To be more explicit, are the other risks to consider mostly about governance/who gets AGI/regulations? Because it seems that you’re focusing on the technical problem of alignment, which is about doing what we want in a rather narrow sense.
On the model that AI risk is caused by utility maximizers pursuing the wrong reward function, I agree that non-obstruction is a useful goal to aim for, and the resulting approaches (mild optimization, low impact, corrigibility as defined here) make sense to pursue. I do not like this model much (AN #44), but that’s (probably?) a minority view.
It’s weird, my take on your sequence was more that you want to push alternatives to goal-directedness/utility maximization, because maximizing the wrong utility function (or following the wrong goal) is a big AI risk. Maybe what you mean in the quote above is that your approach focuses on not building goal-directed systems, in which case the non-obstruction problem makes less sense?
To be more explicit, are the other risks to consider mostly about governance/who gets AGI/regulations?
Yes.
It’s weird, my take on your sequence was more that you want to push alternatives to goal-directedness/utility maximization, because maximizing the wrong utility function (or following the wrong goal) is a big AI risk.
Yeah, I don’t think that sequence actually supports my point all that well—I should write more about this in the future. Here I’m claiming that using EU maximization in the real world as the model for “default” AI systems is not a great choice.