I don’t think you really understand what slack is. You’re articulating something closer to the idea that the AI needs to be low-impact, but that’s completely different from being unconstrained. People with lots of “slack” in the sociological sense of the term that Zvi describes can still be extremely ambitious and resourceful. They tend to have more room to pursue prosocial ends, but the problem with an AI is that its goals might not be prosocial.
I understand the low-impact idea, and it’s a great heuristic, but that’s not quite what I am getting at. The impact may be high, but the space of acceptable outcomes should be broad enough that there is no temptation for the AGI to hide and deceive. A tool becoming an agent and destroying the world because it strives to perform the requested operation is more of a “keep it low-impact” domain; but to avoid tension with the optimization goal, the binding optimization constraints should not be tight, which is what slack is. I guess it hints at the issues raised in https://en.wikipedia.org/wiki/Human_Compatible, just not the approach advocated there, where “the AI’s true objective remain[s] uncertain, with the AI only approaching certainty about it as it gains more information about humans and the world”.
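To pin down the optimization-theory sense of “slack” being appealed to here, a minimal sketch (my formulation, not from either comment; the names f, g, b are illustrative assumptions):

    \max_{x} \; f(x) \quad \text{s.t.} \quad g(x) \le b,
    \qquad s(x) := b - g(x) \ge 0

Here s(x) is the slack of the constraint at the point x. If s(x*) = 0 at the optimum, the constraint is binding (tight): the optimizer is pressed up against it, and satisfying it trades directly against the objective, which is the tension described above. If s(x*) > 0, the constraint is non-binding, and there is no optimization pressure to game or route around it.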