@Épiphanie Gédéon this is great, very complementary/related to what we’ve been developing for the Gaia Network. I’m particularly thrilled to see the focus on simplicity and incrementalism, as well as the willingness to roll up one’s sleeves and write code (often sorely lacking in LW). And I’m glad that you are taking the map/territory problem seriously; I wholeheartedly agree with the following: “Most safe-by-design approaches seem to rely heavily on formal proofs. While formal proofs offer hard guarantees, they are often unreliable because their model of reality needs to be extremely close to reality itself and very detailed to provide assurance.”
A few additional thoughts:
To scale this approach, one will want to have “structural regularizers” towards modularity, interoperability and parsimony. Two of those we have strong opinions on are:
- A preference for reusing shared building blocks and building bottom-up. Because the architecture is decentralized, we implement this preference in terms of credit assignment, specifically free energy flow accounting.
- Constraints on the types of admissible model code. We have strongly advocated for probabilistic causal models expressed as probabilistic programs. This enables both a shared statistical notion of model grounding (effectively backing the free energy flow accounting as approximate Bayesian inference of higher-order model structure) and a shared basis for defining and evaluating policy spaces (instantly turning any descriptive model into a usable substrate for model-based RL / active inference).
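To make the second point concrete, here is a minimal sketch (plain Python, with illustrative variables and made-up numbers; not actual Gaia Network code) of how a single causal model written as a probabilistic program can serve both roles at once: it can be scored against an observation (grounding), and it can be queried under interventions to compare policies:

```python
# Toy causal model as a probabilistic program: rainfall -> moisture,
# irrigation -> moisture, moisture -> yield. Everything here is an
# illustrative assumption, not a real model.
import math
import random

random.seed(0)

def model(irrigation: float):
    rainfall = random.gauss(50.0, 10.0)             # latent cause
    moisture = 0.6 * rainfall + 20.0 * irrigation   # deterministic mechanism
    crop_yield = random.gauss(2.0 + 0.05 * moisture, 0.5)  # observed effect
    return {"rainfall": rainfall, "moisture": moisture, "yield": crop_yield}

def grounding_score(observed_yield: float, irrigation: float, n: int = 5000) -> float:
    """Crude Monte Carlo estimate of log p(observed yield | do(irrigation)),
    standing in for a free-energy-style measure of how well the model fits data."""
    def normal_pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    samples = [model(irrigation) for _ in range(n)]
    avg_like = sum(normal_pdf(observed_yield, 2.0 + 0.05 * s["moisture"], 0.5) for s in samples) / n
    return math.log(avg_like)

def expected_yield(irrigation: float, n: int = 5000) -> float:
    """The same program doubles as a policy-evaluation substrate:
    expected yield under the intervention do(irrigation = ...)."""
    return sum(model(irrigation)["yield"] for _ in range(n)) / n

print("grounding score at do(irrigation=0.5):", round(grounding_score(4.1, 0.5), 3))
print("E[yield | do(irrigation=0.0)] =", round(expected_yield(0.0), 2))
print("E[yield | do(irrigation=1.0)] =", round(expected_yield(1.0), 2))
```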
Learning models from data is super powerful as far as it goes, but it’s sometimes necessary—and often orders of magnitude more efficient—to leverage prior knowledge. Two simple and powerful ways to do it, which we have successfully experimented with, are:
- LLM-driven model extraction from scientific literature and other sources of causal knowledge. This is crucial to bootstrap the component library. (See also our friends at system.com.)
- Collaborative modeling by LLM-assisted human expert groups. This fits and enhances the “pull request” framework perfectly.
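As a rough illustration of the first of these two approaches, here is a hypothetical sketch (the `query_llm` stub, the prompt, and the extracted edges are all made up so the example runs on its own) of prompting an LLM to turn a passage of literature into candidate causal edges that could seed a shared component library:

```python
import json

def build_prompt(passage: str) -> str:
    return (
        "Read the passage below and list the causal claims it makes as a JSON "
        'array of objects with keys "cause", "effect", and "evidence".\n\n'
        "Passage:\n" + passage
    )

def query_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; swap in whatever client you actually use.
    return json.dumps([
        {"cause": "nitrogen fertilization", "effect": "crop yield", "evidence": "field trials"},
        {"cause": "soil acidification", "effect": "nutrient uptake", "evidence": "review article"},
    ])

def extract_causal_edges(passage: str) -> list[dict]:
    raw = query_llm(build_prompt(passage))
    edges = json.loads(raw)
    # Keep only well-formed edges; in real use this is where human review and
    # deduplication against the existing component library would happen.
    return [e for e in edges if {"cause", "effect"} <= e.keys()]

# Seed a tiny adjacency-list "component library" from one passage.
library: dict[str, set[str]] = {}
for edge in extract_causal_edges("…a paragraph from an agronomy paper…"):
    library.setdefault(edge["cause"], set()).add(edge["effect"])

print(library)
```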
Scaling this to multiple (human or LLM) contributors will require a higher-order model economy of some sort. While one can get away with an implicit, top-down resource economy in the context of a closed contributor group, opening up will require something like a market economy. The free energy flow accounting described above is a suitable primitive for this.

I’d be keen to find ways to collaborate.

Also @Roman Leventov FYI
Thanks a lot for the kind comment!

> To scale this approach, one will want to have “structural regularizers” towards modularity, interoperability and parsimony
I am unsure of the formal architecture or requirements for these structural regularizers you mention. I agree with using shared building blocks to speed up development and verification. I am unsure credit assignment would work well for this; maybe in the form of “the more a block is reused across codebases, the more we can trust it”?
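To spell out the kind of thing I have in mind (a naive, made-up scoring rule, not something from the post or the Gaia Network): each shared block could accumulate evidence every time a project reusing it passes or fails verification, so that trust grows with reuse but stays low for blocks that have barely been exercised:

```python
from collections import defaultdict

# (passes, failures) of downstream verification runs per shared block
usage_record = defaultdict(lambda: [0, 0])

def record_use(block: str, verified_ok: bool) -> None:
    usage_record[block][0 if verified_ok else 1] += 1

def trust(block: str, prior_passes: float = 1.0, prior_failures: float = 1.0) -> float:
    """Beta-Bernoulli posterior mean: reuse only builds trust to the extent
    that the reusing projects also pass verification."""
    passes, failures = usage_record[block]
    return (passes + prior_passes) / (passes + failures + prior_passes + prior_failures)

for outcome in [True] * 40 + [False] * 2:
    record_use("widely_reused_block", outcome)
record_use("new_block", True)

print(round(trust("widely_reused_block"), 3))  # ~0.93: heavily reused, mostly verified
print(round(trust("new_block"), 3))            # ~0.67: one success, still uncertain
```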
> Constraints on the types of admissible model code. We have strongly advocated for probabilistic causal models expressed as probabilistic programs.
What do you mean? Why is this specifically needed? Do you mean that if we want a Go player, one portion of the code should be dedicated to assigning probabilities to which move is best? Or does it only apply in the different context of finding policies?
> Scaling this to multiple (human or LLM) contributors will require a higher-order model economy of some sort
Hmm. Is the argument something like “we want to scale and diversify the agents who review the code for more robustness (so not just one LLM model, for instance), and that means varying levels of competence that we will need to assess and sort”? I had not thought of it that way; I was mainly thinking of using a single model, and I am not sure that adding weaker code reviewers would not degrade the system’s safety.
Regarding the Gaia Network, the idea seems interesting, though I am not yet clear on the full details. I had thought of extending betting markets to a full Bayesian network to get a better picture of what everyone believes, and maybe this is related to your idea. In any case, I believe that conveying one’s full model of the world through this kind of network may be doable, and quite important for some form of global coordination and truth-seeking?
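To illustrate what I mean by extending betting markets to a Bayesian network (a toy, made-up example with two binary variables and simple averaging, not the Gaia Network design): participants would bet on the conditional tables of the network rather than on isolated events, and the pooled network could then answer joint and inverse queries:

```python
def pool(estimates: list[float]) -> float:
    """Average the participants' probabilities (linear opinion pool,
    just one arbitrary aggregation choice)."""
    return sum(estimates) / len(estimates)

# Each participant reports P(rain), P(flood | rain), P(flood | no rain)
# for a two-node network: Rain -> Flood.
bets = [
    {"p_rain": 0.30, "p_flood_given_rain": 0.60, "p_flood_given_dry": 0.05},
    {"p_rain": 0.20, "p_flood_given_rain": 0.70, "p_flood_given_dry": 0.10},
    {"p_rain": 0.25, "p_flood_given_rain": 0.50, "p_flood_given_dry": 0.05},
]

p_rain = pool([b["p_rain"] for b in bets])
p_flood_given_rain = pool([b["p_flood_given_rain"] for b in bets])
p_flood_given_dry = pool([b["p_flood_given_dry"] for b in bets])

# Marginal and inverse queries fall out of the pooled network.
p_flood = p_rain * p_flood_given_rain + (1 - p_rain) * p_flood_given_dry
p_rain_given_flood = p_rain * p_flood_given_rain / p_flood

print(f"pooled P(rain)          = {p_rain:.3f}")
print(f"implied P(flood)        = {p_flood:.3f}")
print(f"implied P(rain | flood) = {p_rain_given_flood:.3f}")
```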
Overall I agree with your idea of a common library, and I think there could be some very promising iterations on it. I will contact you about collaboration ideas!