human values are less like a single function defined on an internal world model, and more like a ‘grand bargain’
Pretty much.
among many distinct self-preserving mesa-optimizers.
Well… that’s where things get tricky. The details of brain circuit internal computations and coordination are very complex and counterintuitive. The model I’ve sketched out in my comment is a simplification.
Consider that only a small fraction of the brain’s neurons activate when processing any given input. The specific set of activated neurons and their connections with each other change every time. The brain doesn’t so much select specific, distinct circuits from a toolbox of possible circuits that would be appropriate for the given situation. Instead, the brain dynamically constructs a new configuration of internal circuitry for each input it processes.
In other words, the brain is not a collection of circuits like current deep learning models. It’s more like a probability distribution over possible circuits. To the degree that the brain has internal “agents”, they’re closer to dense regions in that probability distribution than to distinct entities. You can see how rigorous analysis of multiagent dynamics can be tricky when the things doing the negotiating are actually different regions of a probability distribution, each of which is “trying” to ensure the brain continues to sample circuits from said region.
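To make the "dense regions" picture concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made up for illustration: the unit counts, the two regions `A` and `B`, their weights, and the helper names `sample_circuit` and `nearest_region`. It is not a model of real neurons. It only shows a sampler where each input gets a freshly constructed sparse circuit, and where the "agents" are the regions of circuit space that samples keep landing in.

```python
import random

N_UNITS = 200   # hypothetical number of units in the toy "brain"
ACTIVE = 10     # only a small fraction of units is active per input

# Two hypothetical dense regions; each prefers its own pool of units.
REGIONS = {
    "A": range(0, 40),
    "B": range(100, 140),
}
# Assumed sampling weights: how much probability mass each region holds.
WEIGHTS = {"A": 0.7, "B": 0.3}


def sample_circuit(rng):
    """Construct one circuit: a set of ACTIVE unit indices."""
    region = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
    pool = list(REGIONS[region])
    # Most units come from the region's pool, a couple from anywhere else.
    core = rng.sample(pool, ACTIVE - 2)
    rest = [u for u in range(N_UNITS) if u not in core]
    extra = rng.sample(rest, 2)
    return frozenset(core + extra)


def nearest_region(circuit):
    """Label a circuit by the region whose pool it overlaps most."""
    return max(REGIONS, key=lambda name: len(circuit & set(REGIONS[name])))


rng = random.Random(0)
circuits = [sample_circuit(rng) for _ in range(1000)]

counts = {name: 0 for name in REGIONS}
for c in circuits:
    counts[nearest_region(c)] += 1

print("distinct circuits:", len(set(circuits)), "of", len(circuits))
print("samples per region:", counts)
```

Almost no two sampled circuits are identical, yet the samples still cluster into the two regions. In this picture, a region "winning" a negotiation would correspond to its entry in `WEIGHTS` going up, which is the sense in which a region "tries" to keep the brain sampling from it.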
Questions about the intelligence or capabilities of a specific circuit are tricky for a similar reason. The default behavior of shallow brain circuits is to connect with other circuits to form deeper / smarter / more capable circuits. A shallow circuit that has to perform complex world modeling in order to decide on an optimal competitive or cooperative strategy can query deeper circuits that implement strategic planning, similar to how a firm might hire consultants for input on the firm’s current strategy.
The comment above, and my eventual post, both aim to develop the dynamics of mesa-optimizing circuits far enough that some of the key insights fall out, while not running afoul of the full complexity of the situation.