As usual with these things, I don’t really understand the initial assumptions.
Are we assuming that the two AIs will not just engage in war? If agent A managed to hack agent B and replace it with a dumber version, that would help A win all the debates.
Are we assuming that the AIs will not just search for the most efficient way to brainwash the judge? Whether with drugs or with words alone, the latter of which the post seems to treat as a serious possibility.
Are we assuming that the AIs will not try to gather more computational resources in order to outsmart the other agent, or exhibit other instrumentally convergent behaviors?
I’m not saying those assumptions are bad. But I don’t understand when we should and shouldn’t make them.