Just trying to think this through … at the risk of proving I haven’t carefully read all your posts … :-)
I program my AI to invent a better solar cell. So it starts by reading a materials science textbook. OK, now it knows materials science … it didn’t before … Is that a disallowed AU increase? (As the saying goes, “knowledge is power”...?)
(Definitely a possibility that this is answered later in the sequence)
Rereading the post and thinking about this, I wonder if AUP-based AIs can still do anything (which is what I think Steve was pointing at). Or maybe phrased differently, whether they can still be competitive.
Sure, reading a textbook doesn’t decrease the AU of most other goals, but applying the learned knowledge might. On your paperclip example, I expect that the AUP-based AI will make very few paperclips, or it could have a big impact (after all, we make paperclips in factories, but they change the AUP landscape).
More generally, AUP seems to forbid any kind of competence in a zero-sum-like situation. To go back to Steve’s example, if the AI invents a great new solar cell, then it will make its owner richer and more powerful at the expense of other people, which is forbidden by AUP as I understand it.
Another way to phrase my objection is that at first glance, AUP seems to not only forbid gaining power for the AI, but also gaining power for the AI’s user. Which sounds like a good thing, but might also create incentives to create and use non-AUP-based AIs. Does that make any sense, or did I fail to understand some part of the sequence that explains this?
(An interesting consequence of this, if I’m right, is that AUP-based AIs might be quite competitive for making open-source things, which is pretty cool.)
> On your paperclip example, I expect that the AUP-based AI will make very few paperclips, or it could have a big impact
How, exactly, would it have a big impact? Do you expect making a few paperclip factories to have a large impact in real life? If not, why would idealized-AUP agents expect that?
I think that for many tasks, idealized-AUP agents would not be competitive. It seems like they’d still be competitive on tasks with more limited scope, like putting apples on plates, construction tasks, or (perhaps) answering questions.
> which is forbidden by AUP as I understand it.
I’m not sure what your model is here. In this post, this isn’t a constrained optimization problem, but rather a tradeoff between power gain and the main objective. So it’s not like AUP raps the agent’s knuckles and wholly rules out plans involving even a bit of power gain. The agent computes something like (objective score) - c*(power gain), where c is some constant.
On rereading, I guess this post doesn’t make that clear: this post assumes not only that we correctly implement the concepts behind AUP, but also that we slide along the penalty harshness spectrum until we get reasonable plans. It seems like we should hit reasonable plans before power-seeking is allowed, although this is another detail swept under the rug by the idealization.
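To make the tradeoff concrete, here is a toy sketch of scoring plans by (objective score) - c*(power gain) and relaxing c from harsh to lenient until the chosen plan is reasonable. Everything in it is an assumption for illustration: the names `Plan`, `penalized_score`, `best_plan`, and `relax_until_reasonable` are hypothetical, the plans and numbers are made up, and "power gain" is reduced to a single scalar, which is not how AUP actually computes its penalty.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Plan:
    name: str
    objective_score: float  # how well the plan does on the main objective
    power_gain: float       # made-up scalar stand-in for the penalty term


def penalized_score(plan: Plan, c: float) -> float:
    """(objective score) - c * (power gain), with c the penalty harshness."""
    return plan.objective_score - c * plan.power_gain


def best_plan(plans: Sequence[Plan], c: float) -> Plan:
    """Pick the plan with the highest penalized score at harshness c."""
    return max(plans, key=lambda p: penalized_score(p, c))


def relax_until_reasonable(
    plans: Sequence[Plan],
    harshness_levels: Iterable[float],
    is_reasonable: Callable[[Plan], bool],
) -> Optional[Tuple[float, Plan]]:
    """Try harshness values from strictest to most lenient and stop at the
    first one whose best plan passes the (hypothetical) reasonableness check."""
    for c in sorted(harshness_levels, reverse=True):
        plan = best_plan(plans, c)
        if is_reasonable(plan):
            return c, plan
    return None


# Illustrative numbers only; nothing here comes from the post.
PLANS = [
    Plan("do nothing", 0.0, 0.0),
    Plan("read a book, make a few paperclips", 5.0, 0.4),
    Plan("build paperclip factories", 50.0, 20.0),
    Plan("take over the world for paperclips", 1000.0, 10000.0),
]

if __name__ == "__main__":
    result = relax_until_reasonable(
        PLANS,
        harshness_levels=[100, 10, 1, 0.1, 0.01],
        is_reasonable=lambda p: p.objective_score > 0,  # "does something useful"
    )
    print(result)
```

With these made-up numbers, the harshest setting picks "do nothing", and the first relaxation already picks the modest plan, well before the penalty is lenient enough for the power-seeking plans to win. That ordering is built into the toy numbers, not derived from anything.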
> Another way to phrase my objection is that at first glance, AUP seems to not only forbid gaining power for the AI, but also gaining power for the AI’s user. Which sounds like a good thing, but might also create incentives to create and use non-AUP-based AIs. Does that make any sense, or did I fail to understand some part of the sequence that explains this?
Idealized-AUP doesn’t directly penalize gaining power for the user, no. Whether this is indirectly incentivized depends on the idealizations we make.
I think that impact measures levy a steep alignment tax, so yes, I think that there are competitive pressures to cut corners on impact allowances.
> Just trying to think this through … at the risk of proving I haven’t carefully read all your posts … :-)
> I program my AI to invent a better solar cell. So it starts by reading a materials science textbook. OK, now it knows materials science … it didn’t before … Is that a disallowed AU increase? (As the saying goes, “knowledge is power”...?)
Depends how much power that gains compared to other plans. It prefers plans that don’t gain unnecessary power.
In fact, the “encouraged policy” in the post has the agent reading a Paperclips for Dummies book and making a few extra paperclips.