The economic niche is all uses of AI where the user will face accountability for negative consequences. This method costs a multiple of the amount of hardware at inference time (likely 10-20 times), but it comes with a reduction in the risk of adversarial behavior that bankrupts the company using the AI.
There is actually enormous optimization pressure in favor of alignment. Every time an agent “deceives” and outputs an answer that disagrees with the other agents’, this creates a record in the log that humans may later see. Once humans “catch” an agent engaging in deception or being wrong, they will cease using that model. This is effectively the prisoner’s dilemma, where if you don’t defect, and are caught lying, you will be summarily executed, and there are 10-20+ other prisoners you have never met, who share little background with you, who are also being questioned.
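To make that pressure concrete, here is a rough Python sketch of the logging side; every name here (the log format, `record_answer`, `retire_if_caught`) is made up for illustration and isn’t from the post.

```python
# Illustrative only: append every answer to an audit log, flag disagreements
# for later human review, and stop using a model once it is caught deceiving.
import json
import time

def record_answer(log_path: str, model_id: str, task: str, answer: str, disagrees: bool) -> None:
    entry = {
        "time": time.time(),
        "model": model_id,
        "task": task,
        "answer": answer,
        "flagged_disagreement": disagrees,  # humans may later inspect flagged entries
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(entry) + "\n")

def retire_if_caught(model_id: str, caught_deceiving: bool, active_models: set[str]) -> None:
    # Once humans catch a model lying (or just being wrong), it stops being used.
    if caught_deceiving:
        active_models.discard(model_id)
```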
On the left-hand side there are a large number of human components. This is where I was expecting the slowdown. I’m guessing that defeating an evil AGI wouldn’t be a narrow task that could be delegated to a unitary agent.
What about something like “safely use nanotechnology to reverse aging”? There aren’t enough humans to oversee every nano-machine, but it seems dangerous to hand it over to a unitary agent that is either dumb or unaligned. Even just the research stage could be dangerous. 🤔 And what about unforeseeable economic niches; do we have a reduction argument that “anything an AGI could contribute, the open agency can as well”? We can’t reduce directly by saying “the open agency can use AI agents, including the AGI”, because the open agency can only use narrow AI agents.
I’m not talking about alignment of any individual agent (I’m taking for granted that they are all mostly aligned, including the humans, and so wouldn’t subtly lie); I’m talking about alignment of the overall system.
No response, but I need a number 4 for formatting reasons, lol
Right, true. So I guess the question is what’s the value added of the proposal’s details over “give an organization access to AI tools”; the subhuman part was unimportant.
See how “evaluation” and “implementation” have direct arrows to them, not going through the humans in the picture? The evaluation is automatable: it can be some AI model determining whether the proposed plans and projected outcomes meet the preferences of the human users. See also how “implementation” shows many boxes? The final plan doesn’t get performed by the AI that generated it (so it doesn’t get to hijack the effort to make progress towards its own unaligned goals) but is carried out by a different AI trained on, say, robotics tasks. This agent has been selected from many candidates because it accomplishes the robotics tasks best, given the constraints.
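Roughly what I picture that separation looking like, as a toy Python sketch; `planner`, `evaluator`, the 0.9 acceptance threshold, and the `benchmark_score`/`execute` methods on the implementation agents are all hypothetical stand-ins, not details from the proposal.

```python
# Toy sketch: the AI that generates a plan never executes it. An automated
# evaluator scores the plan against the users' stated preferences, and a
# separately trained implementation agent (picked for how well it benchmarks
# on this kind of task) carries the plan out.
def run_open_agency_step(task, planner, evaluator, implementer_pool, preferences):
    plan = planner(task)                  # a proposal only; the planner never acts on it
    score = evaluator(plan, preferences)  # automated check against human preferences
    if score < 0.9:                       # illustrative acceptance threshold
        raise ValueError("Plan rejected: projected outcomes do not meet user preferences")
    # Choose the implementation agent that performs best on this kind of task.
    implementer = max(implementer_pool, key=lambda agent: agent.benchmark_score(task))
    return implementer.execute(plan)
```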
See the above. The agent planning the nanosurgery, the one carrying it out, and the one manufacturing the machinery are all different agents, and many of them may actually be invisible “clusters” of agents where one is nominally in charge but we are silently querying 10-20 others and making sure those AIs are not proposing a significantly different plan. (Due to one of the tenets of rationality, that two rational agents cannot agree to disagree, each agent should produce the same answer to within a certain amount of floating-point error.)
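Something like this toy sketch, assuming each plan can be encoded as a numeric vector so “significantly different” has a concrete meaning; `cluster_check` and the tolerance value are illustrative, not from the post.

```python
# Toy sketch of an "invisible cluster": one lead agent proposes a plan, several
# independently trained agents are silently asked the same question, and the
# plan is only accepted if every checker agrees to within a small tolerance.
from typing import Callable, Sequence

def cluster_check(
    task: str,
    lead_agent: Callable[[str], list[float]],
    checker_agents: Sequence[Callable[[str], list[float]]],
    tolerance: float = 1e-3,
) -> tuple[list[float], bool]:
    lead_plan = lead_agent(task)
    for checker in checker_agents:
        other_plan = checker(task)
        # Largest element-wise deviation between the two plan encodings.
        divergence = max(abs(a - b) for a, b in zip(lead_plan, other_plan))
        if divergence > tolerance:
            # A significantly different plan is evidence of error or deception;
            # reject and flag for human review.
            return lead_plan, False
    return lead_plan, True
```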
It has to be like this; as you say, there are likely terabytes of machine instructions involved. Humans cannot “check” any of this for correctness; they need to have other AIs do it.
See above. What makes system alignment possible is that using these clusters of agents prevents the agents from colluding against us, and checking the plans against our human preferences ensures that the plans are not completely “out of bounds”. Also, we do not give the system a “blank check” to do whatever it wants: its future plans are visible to us, as they must be in an interpretable data format so it can describe to another AI what needs to be carried out in the real world.
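For example (purely illustrative field names, not a schema from the proposal), the “interpretable data format” constraint might look something like this, with a simple bounds check on top:

```python
# Plans live in a structured, human- and machine-readable format, so both
# people and other AIs can inspect every step before anything is executed.
from dataclasses import dataclass

@dataclass
class PlanStep:
    action: str             # human-readable description of the step
    resources: dict         # what the step is allowed to consume
    projected_outcome: str  # what the step is expected to accomplish

def within_bounds(plan: list[PlanStep], allowed_actions: set[str]) -> bool:
    """Reject any plan containing a step outside the pre-approved action set."""
    return all(step.action in allowed_actions for step in plan)
```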
same
Right. These agents can easily be better at their assignments than humans.
Ah, I completely misunderstood! I thought it meant that actual humans in the loop would be queried on each decision, not just that the system was modelling human preferences. Nvm then.
If the agencies are actually AI models queried context-free, and we automate choosing an action based on Drexler’s previous post, https://www.lesswrong.com/posts/HByDKLLdaWEcA2QQD/applying-superintelligence-without-collusion , then this will run in real time.
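A toy sketch of what “queried context-free” would mean operationally; `model.generate` is a stand-in for whatever inference API is used. The point is just that nothing persists between calls, so the queries can be fanned out in parallel and the whole loop can run in real time.

```python
# Every call is stateless: no conversation history, no shared scratchpad, so an
# agent has no memory of earlier queries and no channel for coordinating with
# the other agents being queried.
from concurrent.futures import ThreadPoolExecutor

def context_free_query(model, task_description: str) -> str:
    # Fresh prompt each time; `generate` is a hypothetical stand-in API.
    return model.generate(prompt=task_description)

def query_cluster_in_parallel(models, task_description: str) -> list[str]:
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        return list(pool.map(lambda m: context_free_query(m, task_description), models))
```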
I don’t have an answer to this one
These are superhuman tools