There’s still the problem of successor agents and self-modifying agents: you need to set up incentives so that agents create successors with the same utility functions and don’t strategically self-modify. I think a solution to that would probably also work as a solution to normal dishonesty.
I do expect that in a case where agents can also see each other’s histories, we can make bargaining go well with the bargaining theory we know (provided the agents try to bargain well; there are of course possible agents that don’t try to cooperate well).
In the cases I’m thinking about, you don’t just read their minds now; you read their entire history, including predecessor agents. All is transparent. (Fictional but illustrative example: the French AGI and the Russian AGI are as smart as Sherlock Holmes, so they can deduce in great detail pretty much everything that happened leading up to and during each other’s creation. They are also still running on human hardware at human institutions, and thanks to constant leaking and an offense/defense balance that favors offense, they can see logs of what each other is and was thinking the entire time, including through various rounds of modification into successor agents.)