On the left-hand side there are a large number of human components. This is where I was expecting the slowdown. I’m guessing that defeating evil AGI wouldn’t be a narrow task that could be delegated to a unitary agent.
What about something like “safely use nanotechnology to reverse aging”? There aren’t enough humans to oversee every nano-machine, but it seems dangerous to hand it over to a unitary agent that is either dumb or unaligned. Even just the research stage could be dangerous. 🤔 And what about unforeseeable economic niches? Do we have a reduction argument that “anything an AGI could contribute, the open agency can as well”? We can’t reduce directly by saying “the open agency can use AI agents, including the AGI”, because they can only use narrow AI agents.
I’m not talking about alignment of any individual agent (I’m taking for granted that they are all mostly aligned (including the humans) and so wouldn’t subtly lie); I’m talking about alignment of the overall system.
No response, but I need a number 4 for formatting reasons, lol
Right, true. So I guess the question is: what’s the value added by the proposal’s details over “give an organization access to AI tools”? The subhuman part was unimportant.
See how “evaluation” and “implementation” show direct arrows to them, not going through the humans in the picture? The evaluation is automatable: it can be some AI model determining whether the proposed plans and projected outcomes meet the preferences of the human users. See also how “implementation” shows many boxes? The final plan doesn’t get performed by the AI that generated it (so it doesn’t get to hijack the effort to make progress towards its own unaligned goals) but is carried out by a different AI trained on, say, robotics tasks. That agent has been selected from many candidates because it accomplishes the robotics tasks best, given the constraints.
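A minimal sketch of that separation (every name and interface here is my own, purely illustrative):

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical plan format: the point is that it is an interpretable
# description other agents (and humans) can read, not opaque weights.
@dataclass
class Plan:
    steps: list[str]
    projected_outcome: str

def run_pipeline(
    planner: Callable[[str], Plan],
    evaluator: Callable[[Plan], bool],
    implementer: Callable[[Plan], None],
    task: str,
) -> None:
    """Generation, evaluation, and implementation are separate agents:
    the planner never acts on the world, the implementer never chooses
    the goal, and the evaluator checks the plan against human preferences
    before anything is executed."""
    plan = planner(task)
    if not evaluator(plan):
        raise ValueError("plan rejected: projected outcome violates user preferences")
    implementer(plan)

# Toy stand-ins, just to show the shape of the interaction:
run_pipeline(
    planner=lambda task: Plan(steps=[f"do {task} carefully"], projected_outcome="task done"),
    evaluator=lambda plan: "harm" not in plan.projected_outcome,
    implementer=lambda plan: print("executing:", plan.steps),
    task="tighten bolt",
)
```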
See the above. The agent planning the nanosurgery, the one carrying it out, and the one manufacturing the machinery are all different agents, and many of them may actually be invisible “clusters” of agents where one is in charge but we are silently querying 10-20 others and making sure those other AIs are not proposing a significantly different plan. (Due to one of the tenets of rationality, that two rational agents cannot agree to disagree, each agent should produce the same answer to within a certain amount of floating-point error.)
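A toy version of that cross-check (the plan representation, tolerance, and distance measure are all made up for illustration):

```python
def cluster_agrees(lead_plan: list[float],
                   shadow_plans: list[list[float]],
                   tolerance: float = 1e-3) -> bool:
    """Accept the lead agent's plan only if every silently queried shadow
    agent produced essentially the same answer, within floating-point error."""
    def distance(a: list[float], b: list[float]) -> float:
        return max(abs(x - y) for x, y in zip(a, b))
    return all(distance(lead_plan, p) <= tolerance for p in shadow_plans)

# A single deviating shadow agent flags the plan for review:
print(cluster_agrees([1.0, 2.0], [[1.0, 2.0], [1.0, 2.7]]))  # False
```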
It has to be like this; as you say, there are likely terabytes of machine instructions involved. Humans cannot “check” any of this for correctness; they need to have other AIs do it.
See above. What makes system alignment possible is that using these clusters of agents prevents the agents from colluding against us, and checking the plans against our human preferences ensures that the plans are not completely “out of bounds”. Also, we do not give the system a “blank check” to do whatever it wants; its future plans are visible to us, as they must be in an interpretable data format so it can describe to another AI what needs to be carried out in the real world.
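By “interpretable data format” I just mean something like structured, human-readable data; the field names and contents below are purely hypothetical:

```python
import json

# Hypothetical example of a plan kept in an interpretable format:
# every field is human-readable, so independent checker AIs (and humans)
# can see what the system intends to do before anything happens.
proposed_plan = {
    "objective": "replace worn bearing in pump 7",
    "steps": [
        "power down pump 7 and lock out the breaker",
        "remove housing bolts",
        "swap bearing and restore power",
    ],
    "projected_outcome": "pump 7 back online with vibration below threshold",
}
print(json.dumps(proposed_plan, indent=2))
```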
same
Right. These agents can easily be better at their assignments than humans.
Ah, I completely misunderstood! I thought it was meant that actual humans in the loop would be queried with each decision, not just that they were modelling human preferences. Nvm then.