This algorithm asymmetrically “privileges” cooperation over defection
Now that I disagree with! If the algorithm finds a proof that defection is better, it will defect, because the proof system is consistent, so the algorithm can’t in this case find a proof that cooperation is better. (It’s true that it cannot prove that it will defect, of course.) In fact I’d say the algorithm privileges defection, because it’ll defect unless it finds a reason to cooperate.
I don’t see any straight-forward generalization to arbitrary games, but a natural generalization to games where every player has only two choices springs to mind. Take as parameters the payoff matrix and the other players’ source code; then play minimax unless you can prove that playing the other strategy will result in a higher payoff. [Edit: Er, unless the utility you can prove you get under the assumption that you play the other strategy is higher than etc etc… you know what I mean.] Haven’t yet looked into whether this actually does give sensible answers to non-PD games, though. *goes off to do just that*
Now that I disagree with! If the algorithm finds a proof that defection is better, it will defect
What on Earth do you mean by this counterfactual? :-)
The obvious extension of your proposed algorithm to games with many players and many strategies: assume any commonly-accepted method of finding a “fair strategy profile”, attempt Löbian cooperation to reach it, otherwise play minimax. This doesn’t solve fairness/bargaining, and we all are quitefamiliar with this sort of trick already.
Essentially, what you’re doing is “porting” old ideas from one enforcement method (quining) to another (Löbian cooperation), hoping that fairness will magically pop out this time. No. Fairness and enforcement are orthogonal issues. I’m really frustrated that people stubbornly fail to see fairness as the serious obstacle to “solving” game theory. There’s no cheap accidental way around it.
ETA: on rereading, this comment sounded harsh. Please don’t take it personally—I just feel that at this point in the discussion, it’s time to forcefully state some of my background intuitions so other people can break them down more easily.
What on Earth do you mean by this counterfactual? :-)
It’s not a counterfactual, it’s a logical implication, so it’s true! :-) No, seriously, if it wasn’t clear, I mean that against every opponent program for which it can prove a higher utility for defection, it will indeed defect; e.g. against the always-cooperating program and the defection rock.
My intuition is that the algorithm above precisely does not encode a notion of “fair strategy profile,” it just tries to maximize its own utility; I suppose that’s a fuzzy notion, though. I wasn’t arguing that it solves bargaining. An extension to many strategies should cooperate in the finitely iterated PD without wiring in any more notion about “fairness” than in the algorithm above, and without special-casing IPD—if that’s indeed simple I’m curious how it works, it’s not obvious to me.
I mean that against every opponent program for which it can prove a higher utility for defection, it will indeed defect; e.g. against the always-cooperating program and the defection rock.
Ohhh, I managed to miss that. Thank you. This is really good.
About bargaining, I reason like this: if you have a program that’s good enough at maximizing utility that it can cooperate with itself in the PD, and make this program play different games against itself, this will give us a canonical way to calculate a “fair” outcome for every game. But this is very unlikely, so a utility-maximizer that plays arbitrary games well is by implication also very unlikely.
How do your programs play the iterated PD? Do they have memory? I can’t understand what you mean yet.
It seems to me that bargaining will be resolved simultaneously with the problem of deciding under uncertainty (when you can’t hope to find a proof of utility being precisely U).
On the other hand, making games “fuzzy” to solve bargaining has been tried, and it’s not enough.
On the third hand, I feel that some games might be genuinely indeterminate because they abstract too much, they don’t include enough information from the real-world situation—information that in practice ends up determining the outcome. For example, (instantaneous) bargaining in the Rubinstein model depends on the players’ (temporal) discount rates, and if you forgot to look at them, the instantaneous game seems pretty damn indeterminate.
My proposal can’t handle the iterated PD, only games where each player has two choices. I was replying to you saying there was an obvious generalization to more than two strategies—if there is, then we can pass in the payoff matrix of the normal form of a finitely iterated PD (and if it’s a sensible generalization, it should cooperate with itself on every round).
Your argument about bargaining makes sense, though alas I don’t know enough about bargaining to have a really informed opinion. It may be that the idea only works for sufficiently PD-like games, but if we can handle a class of them without special-casing that doesn’t seem so bad. It does at least handle Chicken [Edit: and Stag Hunt] correctly, AFAICS, so it’s not as if it’s PD-only.
Um, the obvious generalization to many strategies must “privilege” one of the strategies apriori, the same way as your algorithm “privileges” cooperation. Otherwise, what single statement would the proof checker be trying to prove? I don’t see a way around that.
Ah, sorry, now I understand what’s going on. You are saying “there’s an obvious generalization, but then you’d have to pick a ‘fair’ strategy profile that it would privilege.” I’m saying “there’s no obvious generalization which preserves what’s interesting about the two-strategy case.” So we’re in agreement already.
(I’m not entirely without hope; I have a vague idea that we could order the possible somehow, and if we can prove a higher utility for strategy X than for any strategy that is below X in the ordering, then the agent can prove it will definitely choose X or a strategy that is above it in the ordering. Or something like that. But need to look at the details much more closely.)
Essentially, what you’re doing is “porting” old ideas from one enforcement method (quining) to another (Löbian cooperation), hoping that fairness will magically pop out this time.
I’m not sure when exactly you added this, is this still what you think I’m doing? What I’m trying to do is to give an algorithm that correctly handles games “where the other agents’ decisions can be viewed as functions of one argument, that argument being your own choice in that particular case” (same Eliezer quote as upthread), without special-casing for the details of particular games like you did with your solution of the PD. I.e., I’m trying to do what I at first thought your original post did, but what you explained to me it doesn’t. AFAICS, if Eliezer’s informal arguments are correct, these games have a clear correct solution without solving bargaining.
I can describe a class of such games and its solution, but I’d hope that if a good decision theoretical agent exists that solves this class, it might also solve some wider class of problems in an intuitively correct way. -- That said, the class is symmetric games, and the solution is the strategy profile that yields the highest utility, among those strategy profiles in which all players choose the same strategy. [More precisely, a good algorithm should choose this algorithm when playing against itself or a similar algorithm, should maximize its utility when playing against an opponent that unconditionally plays some given strategy, and should “do something sensible” in other cases.]
The quining-cooperator algorithm in the Prisoner’s Dilemma forms a “Nash equilibrium in algorithms” against itself. This is a very desirable property to have, which roughly corresponds to “reflective consistency” in single-player decision theory. However, the algorithm you’re trying to develop (which includes maximizing utility against an opponent that plays a strategy unconditionally) will not form a “Nash equilibrium in algorithms” against itself, even in symmetric games.
To see this, consider a simple bargaining game, a variant of “dividing the dollar”. Each player sees the other’s source code and outputs an integer. If the sum of the two integers is less than or equal to 10, both players get the amount in dollars that they asked for. If the sum is greater than 10, both get nothing. Your algorithm will output 5 when playing against itself. But this means it’s not a best reply to itself, because if your opponent is replaced with the simple program “return 9″, you will maximize utility and get only 1.
The quining-cooperator algorithm in the Prisoner’s Dilemma forms a “Nash equilibrium in algorithms” against itself. This is a very desirable property to have, which roughly corresponds to “reflective consistency” in single-player decision theory. However, the algorithm you’re trying to develop (which includes maximizing utility against an opponent that plays a strategy unconditionally) will not form a “Nash equilibrium in algorithms” against itself, even in symmetric games.
On reflection: I agree that “Nash equilibrium in algorithms” would be desirable, but you seem to find it a more knock-out argument than I do. If we’re discussing what kinds of algorithms to enter into something like Axelrod’s tournament, then clearly, Nash equilibrium in algorithms would be a very compelling property. But if world() really is (a utility function over) the whole universe, then it’s less clear to me that “take 1″ is not the right thing to do when encountering a “take 9” rock.
Intuitively, the reason you wouldn’t want to “take 1” in this case is that you would not want someone else—Clippy, say—a motive to leave “take 9″ rocks around for you to find. But there’s the counter-intuition that:
(a) If Clippy does this as a result of reasoning about how you behave, then that’s in fact a different situation (the assumption being that you’re reasoning about the source of the whole world, not just about the source of the “take 9” rock in front of you). What you do in this situation influences whether Clippy will have left behind a “take 9″ rock, so your decision algorithm should not be able to conclude that you maximize utility by “taking 1.” [I’m assuming an UDT-like agent which one can think of as taking the source of the world and returning the actions to perform in different contingencies, so it does not make sense to say that we cannot influence what Clippy does because our sensory input tells us that Clippy has “already” left behind a “take 9” rock.]
(b) If Clippy does not do this as a result of actually reasoning about how you behave, then if you “take 5” (say), Clippy will still have left a “take 9″ rock behind—by assumption, there’s no logical causality from what you do to what Clippy does.
I can’t formalize this argument, and I’m not sure that (b) especially is the right way of thinking about this problem, but this counter-intuition seems at least as compelling to me as the original intuition. Am I missing something?
-- All this said, I no longer think that the idea behind my proposed algorithm is useful. The problem is as far as I can see, if agent 1 runs “do X unless I can prove it’s better to do Y” and agent 2 runs “do A unless I can prove that it’s better to do B,” then the agents won’t play (X,B) or (Y,A) even if this would be the most desirable outcome for both of them—or at least I can’t prove that these outcomes are possible.
“Play minimax unless I can prove the other strategy is better” doesn’t exactly run into this problem in symmetric games as long as the two available strategies have different minimax values, but what I’m really after is a decision theory that would act correctly if it finds such a game embedded in the world, and this does seem to run into the problem described above.
User Perplexed pointed me to Nash’s 1953 paper Two-person cooperative games that seems to give a unique “fair” solution for all two-person competitive games in our setting. I’ve been thinking how to spin the algorithm in the paper into a flavor of utility maximization, but failing so far.
The key is to understand that when two players form a coalition they create a new entity distinct from its members which has its own revealed preferences and its own utility to maximize. That is, you need to step up a level and imagine each member of a coalition as making a deal with the coalition itself, rather than simply dealing with the other members. In joining a coalition, a player gives up some or all of his decision-making power in exchange for a promise of security.
Define coalescence utility for player A as the amount of extra A utility A gets because the pair plays the cooperation game rather than the threat game. If you wish, think of this as a compensation paid to A by the collective in exchange for his cooperation in the objectives of the collective. Define the utility of cooperation to the ‘collective’ (i.e. to the union of A and B considered as an entity at a different level than A and B themselves) as the sum of the logarithms of the coalescence utilities of A and B. Then, the collective maximizes its own utility by maximizing the product of the coalescence utilities (which means the same as maximizing the sum of the logs).
This approach of taking the logarithm of individual utilities to get components of corporate or collective utility extends nicely to coalitions with more than 2 members. For some analogies in other (biological) kinds of optimization processes where taking the logarithm is the trick that makes it all work, see my favorite paper ever or this classic paper from Dick Lewontin:
R. C. Lewontin and D. Cohen.
On population growth in a randomly varying environment.
Proceedings of the National Academy of the Sciences, 62:1056–1060, 1969.
Now that I disagree with! If the algorithm finds a proof that defection is better, it will defect, because the proof system is consistent, so the algorithm can’t in this case find a proof that cooperation is better.
I don’t see what theory do you refer to here, in saying “proof theory is consistent”. We have an algorithm that proves some theorems in PA. The only theory under discussion is PA. If the algorithm finds a proof that defection is better, it defects, period.
If PA is the system the agent uses, then by “the proof system,” I mean PA. I was speaking generically, in case you’d prefer to use ZFC or PA + Con(PA) or whatever.
If the algorithm finds a proof that defection is better, it defects, period.
Er, are you saying that as soon as the algorithm finds such a proof, it stops searching and returns “defect”? That will not work, because then the Löbian reasoning won’t go through, for the reason cousin_it gave. [Because PA cannot prove Con(PA), it also cannot prove that “there’s a short proof that cooperate is better” implies “the agent cooperates”—for all PA knows, PA might be inconsistent, and there might be an even shorter proof that defect is better.]
Or do you mean that the algorithm looks through the whole list of proofs, and if it never finds a proof that cooperate is better, it defects? Then we’re in agreement; I was simply pointing out that in this case, we know that if the algorithm proves “defect is better,” then it will not cooperate, because we know that PA is consistent, so if the algorithm proves “defect is better” then it will not prove “cooperate is better.”
Because PA cannot prove Con(PA), it also cannot prove that “there’s a short proof that cooperate is better” implies “the agent cooperates”
It can enumerate all shorter proofs, and see if any of them imply a different action of the agent. (Note that PA is reflexive, that is can prove that any given finite set of its statements is logically consistent.)
Can you give some link or citation about “reflexive”? I can’t figure out what this means. It seems to me that if PA can prove every finite subset of its axioms consistent, then it can prove itself consistent: “If PA is inconsistent, then there would be a statement A and a proof of A and a proof of NOT A; but each proof only uses a finite set of axioms; the union of two finite sets is finite; so there is a finite set of axioms of PA that proves both A and NOT A; but every finite subset of PA is consistent, contradiction. Ergo, PA is consistent.” Clearly, I’m misunderstanding something.
(PA can’t talk about sets in general, but finite sets are fine, and “finite subset of PA” just means “finite set of propositions, all of which are in the enumeration that is PA.”)
It can’t prove that every finite set of its statements is consistent, but for every finite set of its statements it can prove its consistency. (See this chapter, Corollary 8.)
I don’t understand. The relevant step in the argument, as far as I understand it, requires an inference from “there exists a proof in n steps that cooperate is better” to “the agent cooperates”. It seems to me that “enumerating all shorter proofs” would requiring knowing the precise proof (or at least its length), not just the statement “there exists such a proof”. But I’m probably not following what, exactly, you have in mind; could you expand on your argument?
[I should say I was wrong to say “it … cannot prove that” above—all I can say is that one particular technique for showing that it can prove it doesn’t go through.]
The setting is sufficiently confusing at this point that I give up (I’m not sure what we are talking about). I’ll try to work on a proof (not the proof) of something along the lines of this conjecture.
for all PA knows, PA might be inconsistent, and there might be an even shorter proof that defect is better.
If the agent checked all proofs of length less than X, and none of them prove statement T, there are no proofs of T shorter than X, even if the theory is inconsistent and T is provable (therefore, by longer proofs).
Now that I disagree with! If the algorithm finds a proof that defection is better, it will defect, because the proof system is consistent, so the algorithm can’t in this case find a proof that cooperation is better. (It’s true that it cannot prove that it will defect, of course.) In fact I’d say the algorithm privileges defection, because it’ll defect unless it finds a reason to cooperate.
I don’t see any straight-forward generalization to arbitrary games, but a natural generalization to games where every player has only two choices springs to mind. Take as parameters the payoff matrix and the other players’ source code; then play minimax unless you can prove that playing the other strategy will result in a higher payoff. [Edit: Er, unless the utility you can prove you get under the assumption that you play the other strategy is higher than etc etc… you know what I mean.] Haven’t yet looked into whether this actually does give sensible answers to non-PD games, though. *goes off to do just that*
What on Earth do you mean by this counterfactual? :-)
The obvious extension of your proposed algorithm to games with many players and many strategies: assume any commonly-accepted method of finding a “fair strategy profile”, attempt Löbian cooperation to reach it, otherwise play minimax. This doesn’t solve fairness/bargaining, and we all are quite familiar with this sort of trick already.
Essentially, what you’re doing is “porting” old ideas from one enforcement method (quining) to another (Löbian cooperation), hoping that fairness will magically pop out this time. No. Fairness and enforcement are orthogonal issues. I’m really frustrated that people stubbornly fail to see fairness as the serious obstacle to “solving” game theory. There’s no cheap accidental way around it.
ETA: on rereading, this comment sounded harsh. Please don’t take it personally—I just feel that at this point in the discussion, it’s time to forcefully state some of my background intuitions so other people can break them down more easily.
It’s not a counterfactual, it’s a logical implication, so it’s true! :-) No, seriously, if it wasn’t clear, I mean that against every opponent program for which it can prove a higher utility for defection, it will indeed defect; e.g. against the always-cooperating program and the defection rock.
My intuition is that the algorithm above precisely does not encode a notion of “fair strategy profile,” it just tries to maximize its own utility; I suppose that’s a fuzzy notion, though. I wasn’t arguing that it solves bargaining. An extension to many strategies should cooperate in the finitely iterated PD without wiring in any more notion about “fairness” than in the algorithm above, and without special-casing IPD—if that’s indeed simple I’m curious how it works, it’s not obvious to me.
Ohhh, I managed to miss that. Thank you. This is really good.
About bargaining, I reason like this: if you have a program that’s good enough at maximizing utility that it can cooperate with itself in the PD, and make this program play different games against itself, this will give us a canonical way to calculate a “fair” outcome for every game. But this is very unlikely, so a utility-maximizer that plays arbitrary games well is by implication also very unlikely.
How do your programs play the iterated PD? Do they have memory? I can’t understand what you mean yet.
It seems to me that bargaining will be resolved simultaneously with the problem of deciding under uncertainty (when you can’t hope to find a proof of utility being precisely U).
On one hand, this sounds reasonable apriori.
On the other hand, making games “fuzzy” to solve bargaining has been tried, and it’s not enough.
On the third hand, I feel that some games might be genuinely indeterminate because they abstract too much, they don’t include enough information from the real-world situation—information that in practice ends up determining the outcome. For example, (instantaneous) bargaining in the Rubinstein model depends on the players’ (temporal) discount rates, and if you forgot to look at them, the instantaneous game seems pretty damn indeterminate.
My proposal can’t handle the iterated PD, only games where each player has two choices. I was replying to you saying there was an obvious generalization to more than two strategies—if there is, then we can pass in the payoff matrix of the normal form of a finitely iterated PD (and if it’s a sensible generalization, it should cooperate with itself on every round).
Your argument about bargaining makes sense, though alas I don’t know enough about bargaining to have a really informed opinion. It may be that the idea only works for sufficiently PD-like games, but if we can handle a class of them without special-casing that doesn’t seem so bad. It does at least handle Chicken [Edit: and Stag Hunt] correctly, AFAICS, so it’s not as if it’s PD-only.
Um, the obvious generalization to many strategies must “privilege” one of the strategies apriori, the same way as your algorithm “privileges” cooperation. Otherwise, what single statement would the proof checker be trying to prove? I don’t see a way around that.
Ah, sorry, now I understand what’s going on. You are saying “there’s an obvious generalization, but then you’d have to pick a ‘fair’ strategy profile that it would privilege.” I’m saying “there’s no obvious generalization which preserves what’s interesting about the two-strategy case.” So we’re in agreement already.
(I’m not entirely without hope; I have a vague idea that we could order the possible somehow, and if we can prove a higher utility for strategy X than for any strategy that is below X in the ordering, then the agent can prove it will definitely choose X or a strategy that is above it in the ordering. Or something like that. But need to look at the details much more closely.)
I’m not sure when exactly you added this, is this still what you think I’m doing? What I’m trying to do is to give an algorithm that correctly handles games “where the other agents’ decisions can be viewed as functions of one argument, that argument being your own choice in that particular case” (same Eliezer quote as upthread), without special-casing for the details of particular games like you did with your solution of the PD. I.e., I’m trying to do what I at first thought your original post did, but what you explained to me it doesn’t. AFAICS, if Eliezer’s informal arguments are correct, these games have a clear correct solution without solving bargaining.
[The ETA’s duly noted :-)]
Can you describe this class of games precisely? And can you define the solution precisely, without referring to our algorithms?
I can describe a class of such games and its solution, but I’d hope that if a good decision theoretical agent exists that solves this class, it might also solve some wider class of problems in an intuitively correct way. -- That said, the class is symmetric games, and the solution is the strategy profile that yields the highest utility, among those strategy profiles in which all players choose the same strategy. [More precisely, a good algorithm should choose this algorithm when playing against itself or a similar algorithm, should maximize its utility when playing against an opponent that unconditionally plays some given strategy, and should “do something sensible” in other cases.]
The quining-cooperator algorithm in the Prisoner’s Dilemma forms a “Nash equilibrium in algorithms” against itself. This is a very desirable property to have, which roughly corresponds to “reflective consistency” in single-player decision theory. However, the algorithm you’re trying to develop (which includes maximizing utility against an opponent that plays a strategy unconditionally) will not form a “Nash equilibrium in algorithms” against itself, even in symmetric games.
To see this, consider a simple bargaining game, a variant of “dividing the dollar”. Each player sees the other’s source code and outputs an integer. If the sum of the two integers is less than or equal to 10, both players get the amount in dollars that they asked for. If the sum is greater than 10, both get nothing. Your algorithm will output 5 when playing against itself. But this means it’s not a best reply to itself, because if your opponent is replaced with the simple program “return 9″, you will maximize utility and get only 1.
On reflection: I agree that “Nash equilibrium in algorithms” would be desirable, but you seem to find it a more knock-out argument than I do. If we’re discussing what kinds of algorithms to enter into something like Axelrod’s tournament, then clearly, Nash equilibrium in algorithms would be a very compelling property. But if world() really is (a utility function over) the whole universe, then it’s less clear to me that “take 1″ is not the right thing to do when encountering a “take 9” rock.
Intuitively, the reason you wouldn’t want to “take 1” in this case is that you would not want someone else—Clippy, say—a motive to leave “take 9″ rocks around for you to find. But there’s the counter-intuition that:
(a) If Clippy does this as a result of reasoning about how you behave, then that’s in fact a different situation (the assumption being that you’re reasoning about the source of the whole world, not just about the source of the “take 9” rock in front of you). What you do in this situation influences whether Clippy will have left behind a “take 9″ rock, so your decision algorithm should not be able to conclude that you maximize utility by “taking 1.” [I’m assuming an UDT-like agent which one can think of as taking the source of the world and returning the actions to perform in different contingencies, so it does not make sense to say that we cannot influence what Clippy does because our sensory input tells us that Clippy has “already” left behind a “take 9” rock.]
(b) If Clippy does not do this as a result of actually reasoning about how you behave, then if you “take 5” (say), Clippy will still have left a “take 9″ rock behind—by assumption, there’s no logical causality from what you do to what Clippy does.
I can’t formalize this argument, and I’m not sure that (b) especially is the right way of thinking about this problem, but this counter-intuition seems at least as compelling to me as the original intuition. Am I missing something?
-- All this said, I no longer think that the idea behind my proposed algorithm is useful. The problem is as far as I can see, if agent 1 runs “do X unless I can prove it’s better to do Y” and agent 2 runs “do A unless I can prove that it’s better to do B,” then the agents won’t play (X,B) or (Y,A) even if this would be the most desirable outcome for both of them—or at least I can’t prove that these outcomes are possible.
“Play minimax unless I can prove the other strategy is better” doesn’t exactly run into this problem in symmetric games as long as the two available strategies have different minimax values, but what I’m really after is a decision theory that would act correctly if it finds such a game embedded in the world, and this does seem to run into the problem described above.
Hello again.
User Perplexed pointed me to Nash’s 1953 paper Two-person cooperative games that seems to give a unique “fair” solution for all two-person competitive games in our setting. I’ve been thinking how to spin the algorithm in the paper into a flavor of utility maximization, but failing so far.
The key is to understand that when two players form a coalition they create a new entity distinct from its members which has its own revealed preferences and its own utility to maximize. That is, you need to step up a level and imagine each member of a coalition as making a deal with the coalition itself, rather than simply dealing with the other members. In joining a coalition, a player gives up some or all of his decision-making power in exchange for a promise of security.
Define coalescence utility for player A as the amount of extra A utility A gets because the pair plays the cooperation game rather than the threat game. If you wish, think of this as a compensation paid to A by the collective in exchange for his cooperation in the objectives of the collective. Define the utility of cooperation to the ‘collective’ (i.e. to the union of A and B considered as an entity at a different level than A and B themselves) as the sum of the logarithms of the coalescence utilities of A and B. Then, the collective maximizes its own utility by maximizing the product of the coalescence utilities (which means the same as maximizing the sum of the logs).
This approach of taking the logarithm of individual utilities to get components of corporate or collective utility extends nicely to coalitions with more than 2 members. For some analogies in other (biological) kinds of optimization processes where taking the logarithm is the trick that makes it all work, see my favorite paper ever or this classic paper from Dick Lewontin:
R. C. Lewontin and D. Cohen. On population growth in a randomly varying environment. Proceedings of the National Academy of the Sciences, 62:1056–1060, 1969.
True; thanks for pointing that out.
I don’t see what theory do you refer to here, in saying “proof theory is consistent”. We have an algorithm that proves some theorems in PA. The only theory under discussion is PA. If the algorithm finds a proof that defection is better, it defects, period.
If PA is the system the agent uses, then by “the proof system,” I mean PA. I was speaking generically, in case you’d prefer to use ZFC or PA + Con(PA) or whatever.
Er, are you saying that as soon as the algorithm finds such a proof, it stops searching and returns “defect”? That will not work, because then the Löbian reasoning won’t go through, for the reason cousin_it gave. [Because PA cannot prove Con(PA), it also cannot prove that “there’s a short proof that cooperate is better” implies “the agent cooperates”—for all PA knows, PA might be inconsistent, and there might be an even shorter proof that defect is better.]
Or do you mean that the algorithm looks through the whole list of proofs, and if it never finds a proof that cooperate is better, it defects? Then we’re in agreement; I was simply pointing out that in this case, we know that if the algorithm proves “defect is better,” then it will not cooperate, because we know that PA is consistent, so if the algorithm proves “defect is better” then it will not prove “cooperate is better.”
It can enumerate all shorter proofs, and see if any of them imply a different action of the agent. (Note that PA is reflexive, that is can prove that any given finite set of its statements is logically consistent.)
Can you give some link or citation about “reflexive”? I can’t figure out what this means. It seems to me that if PA can prove every finite subset of its axioms consistent, then it can prove itself consistent: “If PA is inconsistent, then there would be a statement A and a proof of A and a proof of NOT A; but each proof only uses a finite set of axioms; the union of two finite sets is finite; so there is a finite set of axioms of PA that proves both A and NOT A; but every finite subset of PA is consistent, contradiction. Ergo, PA is consistent.” Clearly, I’m misunderstanding something.
(PA can’t talk about sets in general, but finite sets are fine, and “finite subset of PA” just means “finite set of propositions, all of which are in the enumeration that is PA.”)
It can’t prove that every finite set of its statements is consistent, but for every finite set of its statements it can prove its consistency. (See this chapter, Corollary 8.)
I don’t understand. The relevant step in the argument, as far as I understand it, requires an inference from “there exists a proof in n steps that cooperate is better” to “the agent cooperates”. It seems to me that “enumerating all shorter proofs” would requiring knowing the precise proof (or at least its length), not just the statement “there exists such a proof”. But I’m probably not following what, exactly, you have in mind; could you expand on your argument?
[I should say I was wrong to say “it … cannot prove that” above—all I can say is that one particular technique for showing that it can prove it doesn’t go through.]
The setting is sufficiently confusing at this point that I give up (I’m not sure what we are talking about). I’ll try to work on a proof (not the proof) of something along the lines of this conjecture.
If the agent checked all proofs of length less than X, and none of them prove statement T, there are no proofs of T shorter than X, even if the theory is inconsistent and T is provable (therefore, by longer proofs).