I think this is an important direction of work because, despite the many concerns on this forum about the interpretability and explainability of ML, in practice we should expect (in the worlds where we survive, at least) AI agents cooperating within systems (or playing against each other in games, or a mixture of these two modes) to be more transparent to each other than humans are to each other.
People always think in private; they sometimes give away their “real” thoughts and intentions via facial microexpressions and flushing, but less so in the era of remote communication.
AIs, by contrast, learn by receiving and processing data and doing number crunching on it, and we should probably expect that we will build infrastructure for logging this data and for looking for signs of deception in the weights or activations selectively saved for future security processing. Moreover, if a single integrated stream of thought (like the one people have in their heads) proves important for general intelligence, we should expect all these streams of thought to be recorded.
I think it’s also important to transfer insights from mathematical constructs (including UDT, FDT, superrationality, and games with perfect prediction) onto a physical footing. Here, I argued that FDT should be seen as a group (collective) decision theory, i.e., a piece of collective intelligence. In “An active inference model of collective intelligence”, Kaufmann et al. proposed a physics-based explanatory theory of collective intelligence (I don’t have an opinion about this theory; I just point to it as one of the proposals out there).
In such theories and game setups, I think it’s important to consider bounded rationality (see “Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources” and “Information overload for (bounded) rational agents”), communication costs, communication delays, the delays (and costs) of reaching consensus, and the non-zero cost of updating one’s own beliefs. The last point bears on “free will” and the assumption that agents can deviate at any moment, which leads to the idea of pre-commitments in Newcomb’s problem and Parfit’s hitchhiker, without considering that, basically, any belief is already a micro pre-commitment to that belief. Also, in iterated games, we should ensure that we don’t model agents as making ergodicity assumptions when doing so is not “real-world rational”.
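To make the ergodicity point concrete, here is a minimal sketch (my own illustration, not taken from the cited papers) of the standard multiplicative coin-flip game from ergodicity economics: the ensemble-average (expected) wealth grows every round, yet the time-average growth rate for any single agent is negative, so an agent that treats the ensemble expectation as its own long-run prospect keeps accepting a gamble it should refuse.

```python
import random

def simulate_agent(rounds: int, seed: int) -> float:
    """Wealth of one agent after `rounds` multiplicative coin flips."""
    rng = random.Random(seed)
    wealth = 1.0
    for _ in range(rounds):
        # Heads: multiply wealth by 1.5; tails: by 0.6 (equal probability).
        wealth *= 1.5 if rng.random() < 0.5 else 0.6
    return wealth

if __name__ == "__main__":
    rounds, n_agents = 20, 10_000
    final = [simulate_agent(rounds, seed) for seed in range(n_agents)]
    # Ensemble view: expected wealth grows by a factor 1.05 per round.
    ensemble_mean = sum(final) / n_agents
    # Time-average view: the typical (median) agent shrinks by ~0.95x per round.
    median = sorted(final)[n_agents // 2]
    print(f"ensemble mean wealth: {ensemble_mean:.2f}")  # roughly 1.05**20 ≈ 2.7
    print(f"median agent wealth:  {median:.2f}")         # roughly 0.9**10 ≈ 0.35
```

The gap between the two printed numbers is exactly the failure of the ergodicity assumption: maximizing expected value is not “real-world rational” for an agent that lives through a single trajectory.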