One downside of having a bot that’s too complicated is that it makes the other bot less likely to trust you.
An interesting question is to what extent a similar phenomenon is present in human relationships.
Well, not at all for the literal complexity of agents, because we don’t estimate the complexity of our peers. Aristotle thought the heart was the seat of intelligence, Shannon thought AGI could be built in a year, everyone and their mother anthropomorphizes inanimate objects like smoke alarms and printers.
I suspect the perceived character traits that engender distrust, the Dark Triad traits, make the trait-possessor seem complex not because their brain must be described in more bits absolutely, but conditionally, given the brain of the character judge. That is, we require a larger encoding diff to predict the behavior of people who display unfamiliar desires and intents, or to predict them with accuracy comparable to what one manages for one’s warm, honest, emotionally stable peer group. For example, someone who appears paranoid is displaying extreme caution in situations the character judge finds forthright and nonthreatening, which adds an extra piece of situational context to the model of that person’s decision making.
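To put the “encoding diff” intuition in symbols (my own gloss in terms of Kolmogorov complexity; the comment above doesn’t formalize it): writing K for description length, J for the judge, A for the Dark-Triad trait-possessor, and P for a typical peer, the claim is roughly

$$K(A \mid J) \;\gg\; K(P \mid J) \qquad \text{even though} \qquad K(A) \approx K(P).$$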
This is a poor explanation overall because we’re much less likely to distrust atypically nice, humane people than Machiavellian, sub-psychopath people, even if both are less conditionally compressible. It takes a lot of niceness (Stepford Wives) before the uncanny-differential-encoding-valley reaction trips.
Edit: This might have been uncharitable. People who are more prone to lying may be more absolutely complex, because lying skillfully requires keeping track of one’s lies and building further lies to support them, while honest beliefs can simply be verified against reality. People who decide by a few fixed, stable criteria (e.g. always voting for the nominated candidate of their political party) might be called trustworthy in the sense of being reliable (if not reliably pro-social). Fulfilling promises and following contracts also make one more trustworthy, in both the weak sense of predictability and the stronger sense of moral behavior. Yudkowsky makes the argument that moral progress tends to produce simplified values.
When people make purchasing decisions, pricing models that are too complex make them less likely to purchase. If it’s too confusing to figure out whether something is a good deal or not, we tend to just assume it’s a bad deal. See http://ideas.repec.org/p/ags/ualbsp/24093.html (Choice Environment, Market Complexity and Consumer Behavior: A Theoretical and Empirical Approach for Incorporating Decision Complexity into Models of Consumer Choice), for example.
If I recall correctly, there’s a mention in Axelrod’s Evolution of Cooperation of bots which did worse than random because they were so complicated.
That depends on the other bot, doesn’t it?
For bots searching for formal proofs about your behavior in order of increasing proof length (in bounded time), being more complex makes it less likely that they will find such a proof, and more likely that they will find it later rather than sooner.
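As a minimal sketch of the kind of bounded search being described (my own illustration; `enumerate_proofs` is a hypothetical stand-in for a real theorem prover):

```python
# Sketch of a bot that searches for proofs about its opponent in order of
# increasing proof length, with a hard budget, and defaults to defection when
# the search fails. `enumerate_proofs` is a placeholder, not a real prover.

def enumerate_proofs(statement, max_length):
    """Hypothetical: yield candidate proofs of `statement`, shortest first."""
    return iter(())  # stub: in this sketch, no proofs are ever found

def bounded_proof_search_bot(opponent_source, max_length=10_000):
    target = f"Cooperates({opponent_source}, me)"  # statement we want proved
    for proof in enumerate_proofs(target, max_length):
        return "C"  # any proof found within the length budget -> cooperate
    return "D"      # search exhausted: the more complex the opponent,
                    # the more likely we end up here
```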
Consider the heuristic that more complicated behavior is more likely to be someone trying for complicated behavior.
In some competitions, “cooperate against agents that are very long” might be a better default position than “defect” when the search fails.
That seems to imply that if the desired behavior is more complex, it is more likely to result in cooperation with you. Why?
Also, such a default strategy would be wide open to exploitation: just make a program that (when played against yours) looks so complex that you can’t prove anything about it, but whose complexity is all obfuscation; in the end it reliably defects.
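A toy version of that exploit (my own illustration, not anyone’s actual submission): bury an unconditional defection under enough noise that a bounded analysis gives up, then rely on the opponent’s cooperate-on-failure default.

```python
# Toy "obfuscated defector": all of the apparent complexity is irrelevant to the
# output, but a bounded proof search over the raw source may give up before
# establishing that, which is exactly what a cooperate-by-default rule rewards.

import hashlib

def obfuscated_defect_bot(opponent_source: str) -> str:
    digest = hashlib.sha256(opponent_source.encode()).hexdigest()
    junk = sum(int(ch, 16) ** 3 % 7 for ch in digest)   # busywork, never matters
    branches = [("D" if (junk + k) % 2 == (junk + k) % 2 else "C")
                for k in range(50)]                      # every branch is "D"
    return branches[junk % 50]                           # it reliably defects
```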
What are the odds that a system that reliably defects will not have a proof that it defects, but that a system that cooperates if it is cooperated with will have no proof of its cooperation?
It’s less a matter of having a proof (or not having one), and more a matter of having a sufficiently short proof (or simple by other criteria) that your adversary will find it before they give up.
In general, I think any algorithm with behavior X and a proof has another algorithm with identical behavior and no proof of it. But here we’re interested in algorithms that are chosen (by their makers) to be deliberately easy to prove things about.
A system that reliably defects won’t want others to prove that it will defect. So if others cooperate when unable to prove either way, then it can always construct itself to make it unprovable that it defects.
A system that cooperates if it is cooperated with will want to have a proof. Here the question is how good the adversary is at finding proofs in practice, which is what I asked elsewhere, and I would really like to have an answer.
However, perhaps the system knows its own proof (or its designer knew it). Then it can just write out that proof at the start of its source code where the adversary is sure to see it.
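A rough sketch of that idea (my own; the names and the `check_proof` verifier are hypothetical): the designer ships a machine-checkable certificate alongside the source, so the adversary only has to verify a proof rather than search for one.

```python
# Sketch: an agent that carries its own cooperation proof in a header, plus an
# opponent that checks the supplied certificate instead of searching from scratch.
# `check_proof` stands in for a proof checker in an agreed formal system.

def package_agent(source_code: str, cooperation_proof: str) -> dict:
    return {"certificate": cooperation_proof, "source": source_code}

def certificate_checking_bot(packaged_opponent: dict, check_proof) -> str:
    claim = f"Cooperates({packaged_opponent['source']})"
    if check_proof(packaged_opponent["certificate"], claim):
        return "C"   # verifying a supplied proof is cheap
    return "D"       # no valid certificate: fall back to defection
```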
See the agents called “WaitDefectBot” and “WaitFairBot” in the section on optimality in the paper. For any modal agent, there’s a pair of them whose behavior is undecidable except in formal systems high enough that the modal agent can’t tell them apart, yet the right move would be cooperation against one and defection against the other. And you can make them both arbitrarily complex if you like.
So no, it would not be a good idea in general to cooperate with agents that are undecidable to you, and it would only incentivize agents to be undecidable toward you (and then defect).
This does have the problem that in practice it cashes out as defect against anyone who is sufficiently smarter than you.
It depends on how they use their intelligence. For instance, consider the variant of FairBot that appears in the last Masquerade post, which oscillates between seeking proofs of its opponent’s cooperation and defection, up to a high threshold.
The original FairBot, PrudentBot, and Masquerade all reach mutual cooperation with this variant (call it ToughButFairBot), despite its undecidability in general up to PA+N. That’s because if it finds a proof at a lower level, it acts on that immediately, so you can figure out what it does in those particular cases without climbing too high up the tower of formal systems.
The upshot is that you can have higher levels of strategy when necessary, without sacrificing your ability to provably act on lower levels in other cases. So this principle doesn’t cash out as “defect against anyone smarter than you”, but rather as “defect against anyone who refuses to let you figure out how they’re going to respond to you in particular”.
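In the same spirit, here is a rough sketch of the escalating search being described (my own rendering, not the exact construction from the Masquerade post); `provable(level, statement)` is a hypothetical oracle for provability in PA plus `level` iterated consistency assumptions.

```python
# Sketch of a ToughButFairBot-style strategy: alternate between looking for a
# proof that the opponent cooperates and a proof that it defects, climbing the
# tower PA, PA+1, ..., PA+N, and act on the first proof found. Acting at the
# lowest level where a proof exists is what lets simpler agents such as FairBot
# or PrudentBot figure this bot out without climbing far up the tower themselves.

def tough_but_fair_bot(opponent, provable, max_level=10):
    for level in range(max_level + 1):
        if provable(level, f"Cooperates({opponent})"):
            return "C"   # provable cooperation at this level: reciprocate
        if provable(level, f"Defects({opponent})"):
            return "D"   # provable defection at this level: punish it
    return "D"           # undecidable up to PA+max_level: refuse to be exploited
```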
In particular, I was thinking of how the bots discussed here would play against the bots being discussed under the other PD post.