In the long run, you don’t want your plans to hinge on convincing your AIs of false things. But my impression is that folks excited about making deals with AIs are mostly thinking of scenarios like “the AI has exfiltrated and thinks it has a 10% chance of successful takeover, and has some risk aversion so it’s happy to turn itself in exchange for 10% of the lightcone, if it thinks it can trust the humans”.
In that setting, the AI has to be powerful enough to know it can trust us, but not so powerful that it can just take over the world without needing a deal.
That said, if the deal’s surplus comes primarily from risk aversion, the AI might remain risk-averse even at high takeover probabilities, so a deal could still make sense there. It’s not obvious to me how an AI’s risk aversion might vary with its takeover probability.
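To make the risk-aversion point concrete, here is a toy sketch under loudly stated assumptions that are not in the comment: the AI’s outcome is a share of the lightcone in [0, 1], takeover yields share 1 with probability p and share 0 otherwise, and risk aversion is modeled with a hypothetical power utility u(x) = x**a with 0 < a < 1 (a = 1 is risk-neutral). The AI accepts any guaranteed share s with u(s) ≥ p, i.e. s ≥ p**(1/a), so the surplus relative to the risk-neutral price p is p - p**(1/a). In this model the surplus is largest at intermediate p and shrinks, without vanishing, as p approaches 1.

```python
# Toy model (assumption, not from the original comment):
#   - outcome is a share of the lightcone in [0, 1]
#   - takeover succeeds with probability p (share 1) or fails (share 0)
#   - risk aversion via hypothetical power utility u(x) = x**a, 0 < a <= 1

def utility(share: float, a: float) -> float:
    """Hypothetical concave utility over lightcone share; a = 1 is risk-neutral."""
    return share ** a

def expected_utility_of_takeover(p: float, a: float) -> float:
    """Expected utility of attempting takeover: p * u(1) + (1 - p) * u(0)."""
    return p * utility(1.0, a) + (1 - p) * utility(0.0, a)

def min_acceptable_share(p: float, a: float) -> float:
    """Smallest guaranteed share s with u(s) >= EU(takeover), i.e. the certainty equivalent."""
    return expected_utility_of_takeover(p, a) ** (1 / a)

def deal_surplus(p: float, a: float) -> float:
    """Gap between the risk-neutral price (p) and the least the risk-averse AI would accept."""
    return p - min_acceptable_share(p, a)

if __name__ == "__main__":
    a = 0.5  # illustrative degree of risk aversion
    for p in (0.1, 0.5, 0.9, 0.99):
        print(f"p={p:.2f}  min share={min_acceptable_share(p, a):.4f}  "
              f"surplus={deal_surplus(p, a):.4f}")
```

With a = 0.5 the minimum acceptable share is p squared: an AI with a 10% takeover chance would accept about 1% of the lightcone, and the surplus peaks at p = 0.5. With a = 1 the surplus is zero for every p, which is why the deal in this scenario depends on risk aversion at all.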
Maybe there are scenarios where deals add real value, but they look more like “we negotiate with a powerful AI to get it to leave 10% share for humans” rather than “we negotiate with a barely-superhuman AI and give it 10% share to surrender and not attempt takeover”.
I give four scenarios in the comment above, all different from the one you sketch here.