Elegant. Here’s my summary:
Optimization power is the source of the danger, not agency. Agents merely wield optimality to achieve their goals.
Agency is orthogonal to optimization power.
Here “agency” is defined as the ability to optimize for an objective, given some internal or external optimization power, and “optimality” (of a system) is defined as having an immense amount of optimization power, either during its creation (as with a nuclear bomb) or during its runtime (as with Solomonoff induction).
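To make the orthogonality concrete, here is a toy sketch (the random-search `optimize` and the `Agent` wrapper are my own illustrative stand-ins, not a claim about how real systems are built):

```python
import random

def optimize(objective, candidates, budget):
    """Raw optimization power: evaluate `budget` random candidates
    and return the best one found according to `objective`."""
    best = None
    for _ in range(budget):
        x = random.choice(candidates)
        if best is None or objective(x) > objective(best):
            best = x
    return best

class Agent:
    """An 'agent' in the above sense: it merely wields some amount of
    optimization power (its budget) in pursuit of its goal."""
    def __init__(self, goal, power):
        self.goal = goal
        self.power = power

    def act(self, candidates):
        return optimize(self.goal, candidates, self.power)

# The same Agent code is harmless with power=10 and scary with
# power=10**9; what changes is the optimization power it wields,
# not the agency.
```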
This framing hints at the notion that there’s a minimum Kolmogorov complexity (a.k.a. algorithmic description length) that an AI’s objective needs to meet for the AI to be considered safe, assuming we want the AI to be safe in the worst-case scenario where it has access to extreme optimization power.
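Spelling that out a bit (my loose formalization, not a precise claim): writing $K(U)$ for the Kolmogorov complexity of an objective $U$,

$$K(U) \;=\; \min\{\, \lvert p \rvert : p \text{ is a program that outputs a specification of } U \,\},$$

the worst-case-safety claim would read: there is some threshold $c$ (roughly, the description length of what we actually care about) such that an objective with $K(U) < c$, e.g. “maximize paperclips,” is presumed unsafe when pursued with extreme optimization power.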
I’d love to know if I’m missing something.
That seems a reasonable takeaway to me.
I would not generally put the Kolmogorov section the way you did, but I suspect that’s more a disagreement on what Kolmogorov complexity is like than what agents are like. (I think the statement is still literally true.)
Thank you 🙏 @mesaoptimizer for the summary!
@All: It seems we agree that optimality, when pursued blindly, is about extreme optimization that can lead to dangerous outcomes.
Could it be that we are overlooking the potential for a (superintelligent) system to prioritize what matters more—the effectiveness of a decision—rather than simply optimizing for a single goal? 🤔
For example, optimizing too much for a single goal (getting the most paperclips) might overlook ethical or long-term considerations which may contribute to the greater good for all Beings.
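A toy way to picture what I mean (the numbers and the single “harm” term are invented purely to show how the ranking flips):

```python
def paperclips(plan):
    """Single-goal score: count paperclips, ignore everything else."""
    return plan["paperclips"]

def broader_value(plan, weight=10.0):
    """Toy 'effectiveness' score: paperclips still count, but harms to
    the other considerations (lumped into one invented 'harm' number)
    count against the plan."""
    return plan["paperclips"] - weight * plan["harm"]

plans = [
    {"name": "modest factory",   "paperclips": 1_000,  "harm": 0},
    {"name": "strip-mine Earth", "paperclips": 10**12, "harm": 10**12},
]

print(max(plans, key=paperclips)["name"])     # strip-mine Earth
print(max(plans, key=broader_value)["name"])  # modest factory
```

Of course, folding everything into one “harm” number is doing all the work here; the hard part is where that term would come from.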
Final question:
Under what circumstances might you prefer a (superintelligent) system to reject the paperclip request and suggest alternative solutions, or seek to understand the requester’s underlying needs and motivations?
I would love to hear additional comments or feedback on when to prioritize effectiveness, as I am still trying to understand decision-making better 🤗
Fundamentally, the story was about the failure cases of trying to make capable systems that don’t share your values safe by preventing specific means by which their problem-solving capabilities express themselves in scary ways. That is different from what you are getting at here, which is having those systems actually, operationally, share your values. A well-aligned system, in the traditional ‘Friendly AI’ sense of alignment, simply won’t make the choices that the one in the story did.