Thanks a lot for the kind comment!
To scale this approach, one will want to have “structural regularizers” towards modularity, interoperability and parsimony
I am unsure of the formal architecture or requirements for these structural regularizers you mention. I agree with using shared building blocks to speed up development and verification. I am unsure whether credit assignment would work well for this, maybe in the form of “the more a block is used in code, the more we can trust it”?
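To make that credit-assignment idea a bit more concrete, here is a toy sketch (all names and the scoring rule are made up) of a registry that tracks how often a shared block is reused and how often it passes review, and turns that into a rough trust score:

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Usage/review statistics for one shared building block."""
    uses: int = 0            # how many codebases reuse this block
    reviews_passed: int = 0
    reviews_failed: int = 0


class BlockRegistry:
    """Toy registry: trust grows with reuse and with passed reviews."""

    def __init__(self):
        self.blocks: dict[str, BlockStats] = {}

    def record_use(self, block_id: str):
        self.blocks.setdefault(block_id, BlockStats()).uses += 1

    def record_review(self, block_id: str, passed: bool):
        stats = self.blocks.setdefault(block_id, BlockStats())
        if passed:
            stats.reviews_passed += 1
        else:
            stats.reviews_failed += 1

    def trust(self, block_id: str) -> float:
        """Laplace-smoothed review pass rate, discounted when the block is rarely reused."""
        s = self.blocks.get(block_id, BlockStats())
        pass_rate = (s.reviews_passed + 1) / (s.reviews_passed + s.reviews_failed + 2)
        usage_weight = s.uses / (s.uses + 10)  # arbitrary saturation constant
        return pass_rate * usage_weight


registry = BlockRegistry()
registry.record_use("kalman_filter_block")
registry.record_review("kalman_filter_block", passed=True)
print(registry.trust("kalman_filter_block"))
```

The smoothing and the saturation constant are arbitrary; the only point is that trust should grow with both reuse and verified reviews, rather than with either alone.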
Constraints on the types of admissible model code. We have strongly advocated for probabilistic causal models expressed as probabilistic programs.
What do you mean? Why is this specifically needed? Do you mean that if we want to build a Go player, we should have one portion of the code dedicated to assigning probabilities to what the best move is? Or does it only apply in a different context, such as finding policies?
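To illustrate the reading I have in mind (purely my guess at what is meant, with a placeholder heuristic), here is a toy sketch in which one part of the code does nothing but assign a probability distribution over candidate moves:

```python
import math


def move_distribution(board, candidate_moves, score_fn, temperature=1.0):
    """Assign a probability to each candidate move via a softmax over heuristic scores."""
    scores = [score_fn(board, m) / temperature for m in candidate_moves]
    max_s = max(scores)  # subtract max for numerical stability
    weights = [math.exp(s - max_s) for s in scores]
    total = sum(weights)
    return {m: w / total for m, w in zip(candidate_moves, weights)}


# Hypothetical usage: a dummy board and a dummy heuristic, just to make the sketch runnable.
board = None
moves = ["D4", "Q16", "C3"]
dist = move_distribution(board, moves, score_fn=lambda b, m: len(m))  # placeholder heuristic
print(dist, max(dist, key=dist.get))
```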
Scaling this to multiple (human or LLM) contributors will require a higher-order model economy of some sort
Hmm. Is the argument something like “We want to scale and diversify the agents who will review the code for more robustness (so not just one LLM model, for instance), and that means varying levels of competence that we will need to figure out and sort”? I had not thought of it that way; I was mainly thinking of just using the same model, and I worry that having weaker code reviewers could bring the system down in terms of safety.
Regarding the Gaia Network, the idea seems interesting, though I am unclear about the full details yet. I had thought of extending betting markets to a full Bayesian network to get a better picture of what everyone believes, and maybe this is related to your idea. In any case, I believe that conveying one’s full model of the world through this kind of network (and maybe more) may be doable, and quite important for solving some kind of global coordination/truth-seeking problem.
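As a toy illustration of the kind of thing I have in mind (the variables, numbers, and the pooling rule are all invented for the example), each contributor reports probabilities over a shared two-node network and we read off the pooled marginal:

```python
# Toy belief network: Regulation -> Deployment (both binary).
# Each contributor reports P(Regulation=1) and P(Deployment=1 | Regulation).
contributors = [
    {"p_reg": 0.6, "p_dep_given_reg": {1: 0.2, 0: 0.7}},
    {"p_reg": 0.4, "p_dep_given_reg": {1: 0.3, 0: 0.8}},
]


def pooled(key_fn):
    """Linear opinion pool: simple average of the contributors' probabilities."""
    return sum(key_fn(c) for c in contributors) / len(contributors)


p_reg = pooled(lambda c: c["p_reg"])
p_dep_given_reg1 = pooled(lambda c: c["p_dep_given_reg"][1])
p_dep_given_reg0 = pooled(lambda c: c["p_dep_given_reg"][0])

# Marginal probability of deployment implied by the pooled network.
p_dep = p_reg * p_dep_given_reg1 + (1 - p_reg) * p_dep_given_reg0
print(f"pooled P(Deployment=1) = {p_dep:.3f}")
```

A real version would of course need richer structures and a better aggregation rule than simple averaging, but this is the sense in which I mean “extending betting markets to a full Bayesian network”.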
Overall I agree with your idea of a common library, and I think there could be some very promising iterations on that. I will contact you about collaboration ideas!
So my main reason for worry, personally, is that there might be an ARA deployed with the goal of “just clone yourself as much as possible”, or something similar. In this case, the AI does not really have to outcompete others to survive, as long as it is able to pay for itself and keep spreading, for instance by copying itself onto local computers, etc. This scenario is worrisome in that the AI might just stay dormant and already be hard to kill. If the AI furthermore has some ability to avoid detection and adapt/modify itself, then I really worry that its goals are going to evolve and get selected so that they progressively converge toward a full takeover (though they may also plateau), and that we will be completely oblivious to it for most of this time, as there won’t really be any incentive to detect or fight this thoroughly.
Of course, there is also the scenario of a chaos-GPT-style ARA agent, which I worry could kill many people without ever truly being shut down; and even if we can shut it down, it might take a while.
All in all, I think this is more a question of costs and benefits than of how likely it is. For instance, I think that implementing a Know Your Customer policy for all providers right now could be quite feasible and would slow down the initial steps of an ARA agent a lot.
I feel like the main cruxes of the argument are:
1. Whether an ARA agent plateaus or goes exponential in terms of abilities and takeover goals.
2. How much time it would take an ARA agent, once released, to fully take over.
I am still very unsure about 1: I could imagine many scenarios where the endemic ARA just stagnates and never really transforms into something more.
However, I feel like on 2 you have a model in which it is going to take a long time for such an agent to really take over. I am unsure about that, but even if that were the case, my main concern is that once the seed ARA is released (and it might be only very slightly capable of adaptation at first), it is going to be extremely difficult to shut it down. If AI labs advance significantly toward superintelligence, implementing a pause on AI might not come too late; but if such an agent has already been released, there is not going to be much we can do about it.
I would be very interested to hear more. I didn’t find anything from a quick search of your Twitter; do you have a link or a pointer where I could read more about counterarguments to “natural selection favors AIs over humans”?