I’m worried about this bot (I’ll call it EsotericBot, EB, since it only Cooperates if you Cooperate, but not obviously). If EB cooperates with FB_T (FairBot using theory T), then T must prove that W is consistent: EB(FB_T) implies FB_T(EB), which (by FairBot’s definition) implies T proves EB(FB_T), which implies T proves that W does not prove FB_T(EB), which implies T proves that W is consistent (spelled out below). This might just be a detail of your definition that could be fixed with something like what’s done in PrudentBot; it doesn’t look like it at a glance, but I’m not sure. Or you might want to bite that bullet and lose cooperation with these “weak” FairBots, but that seems surprising, and it seems like we’ve just gone too far with defecting against people who obviously cooperate.
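Spelled out, writing □_W for provability in W, and assuming EB(X) abbreviates “X(EB) and not □_W X(EB)” (my reading of “Cooperates if you Cooperate but not obviously”):

\begin{align*}
EB(FB_T) &\;\Rightarrow\; FB_T(EB) && \text{(EB only cooperates with cooperators)}\\
FB_T(EB) &\;\Rightarrow\; T \vdash EB(FB_T) && \text{(definition of } FB_T\text{)}\\
T \vdash EB(FB_T) &\;\Rightarrow\; T \vdash \neg\Box_W\, FB_T(EB) && \text{(unfolding EB's definition inside } T\text{)}\\
T \vdash \neg\Box_W\, FB_T(EB) &\;\Rightarrow\; T \vdash \mathrm{Con}(W) && \text{(an inconsistent } W \text{ would prove everything)}
\end{align*}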
>The hypothetical opponent may notice this apparent anomaly and decide investigating it is far more important than the prisoners dilemma.
This doesn’t show up in the formal model. Do you mean, if we’re thinking about AIs in general, as they might look if they reason in the same spirit as these modal bots? I’m not sure why this is a constraint you want; it seems very constraining. One thing we could try is to construct plausible counterfactuals by conditioning, rather than causally counterfacting, on our opponent playing against some other bot. In the causal counterfactual the AI would see “please welcome into the Ring… ComplexityBot, fighting FairBot!”, and then FairBot goes in, but instead of ComplexityBot it sees just a DefectBot, and is like “um, what?”. If instead we condition on “FairBot fights DB”, that updates the ring announcer and lots of other stuff, hopefully so that FB doesn’t notice the difference from really fighting DB. This seems vaguely analogous to just asking whether FB(DB); asking FB(DB) doesn’t seem to function in modal bots by in any sense counterfacting the current situation on “I’m actually DB”, it just asks what FB does in that other situation.
>Although I suspect this can be weakened somewhat.
If we’re looking at, say, Boolean combinations of modal statements, this translates to Δ2 I think, so we can just ask for Δ2-soundness.
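(A sketch of why, assuming □ here means provability in a fixed recursively axiomatized theory: each boxed statement is a Σ1 sentence, its negation is Π1, and any Boolean combination of Σ1 and Π1 sentences is both Σ2 and Π2.)

\[
\Box\varphi \;\equiv\; \exists p\,\mathrm{Prf}(p,\ulcorner\varphi\urcorner) \in \Sigma_1, \qquad \neg\Box\varphi \in \Pi_1, \qquad \text{Boolean combinations of these lie in } \Sigma_2 \cap \Pi_2 = \Delta_2.
\]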
>proof searches in the same language (say PA) but with S having a much larger max proof length
Interesting.
This paper is relevant:
Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents by Andrew Critch https://arxiv.org/abs/1602.04184