Hmmm... interesting. So in this picture, human values are less like a single function defined on an internal world model, and more like a ‘grand bargain’ among many distinct self-preserving mesa-optimizers. I’ve had vaguely similar thoughts in the past, although the devil is in the details with such proposals (e.g., just how agenty are you imagining these circuits to be? Do they actually have the ability to do means-end reasoning about the real world, or have they just stumbled upon heuristics that seem to work well? What kind of learning is applied to them: supervised, unsupervised, reinforcement?) It might be worth trying to make a very simple toy model laying out all the components. I await your future posts with interest.
human values are less like a single function defined on an internal world model, and more like a ‘grand bargain’
Pretty much.
among many distinct self-preserving mesa-optimizers.
Well… that’s where things get tricky. The details of brain circuits’ internal computations and coordination are very complex and counterintuitive. The model I’ve sketched out in my comment is a simplification.
Consider that only a small fraction of the brain’s neurons activate when processing any given input. The specific set of activated neurons and their connections with each other change every time. The brain doesn’t so much select specific, distinct circuits from a toolbox of possible circuits that would be appropriate for the given situation. Instead, the brain dynamically constructs a new configuration of internal circuitry for each input it processes.
In other words, the brain is not a collection of circuits like current deep learning models. It’s more like a probability distribution over possible circuits. To the degree that the brain has internal “agents”, they’re closer to dense regions in that probability distribution than to distinct entities. You can see how rigorous analysis of multiagent dynamics can be tricky when the things doing the negotiating are actually different regions of a probability distribution, each of which is “trying” to ensure the brain continues to sample circuits from said region.
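Since the parent comment asked for a toy model: here is a minimal sketch of the "probability distribution over circuits" picture. Everything in it (the region names, the reinforcement increment) is my own illustrative assumption, not something from the comment above. The idea it captures is just that each input causes one region of circuit-space to be sampled, and being sampled entrenches that region, so dense regions are "self-preserving" in exactly the sense described.

```python
import random

# Toy model: the "brain" is a probability distribution over circuit regions.
# For each input, one region is sampled in proportion to its current weight;
# the sampled region is then reinforced, making it more likely to be sampled
# again. Regions thus "try" to keep being sampled purely via this dynamic.
random.seed(0)
regions = ["region_A", "region_B", "region_C"]  # hypothetical region names
weights = [1.0, 1.0, 1.0]

for step in range(100):
    # sample one circuit region proportionally to its current weight
    chosen = random.choices(range(len(regions)), weights=weights, k=1)[0]
    weights[chosen] += 0.1  # reinforcement: being sampled entrenches the region

print(dict(zip(regions, [round(w, 1) for w in weights])))
```

Even this crude dynamic shows the key feature: there are no distinct "agents," only regions of a distribution whose densities shift as a byproduct of which circuits get constructed.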
Questions about the intelligence or capabilities of a specific circuit are tricky for a similar reason. The default behavior of shallow brain circuits is to connect with other circuits to form deeper / smarter / more capable circuits. A shallow circuit that has to perform complex world modeling in order to decide on an optimal competitive or cooperative strategy can query deeper circuits that implement strategic planning, similar to how a firm might hire consultants for input on the firm’s current strategy.
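The "firm hiring consultants" pattern can also be sketched in a few lines. This is again my own hypothetical illustration, not the author's model: a shallow circuit applies a cheap heuristic when the situation is trivial, and queries a deeper planning circuit when deciding on a strategy actually requires world modeling.

```python
# Hypothetical sketch: a shallow circuit delegates hard cases to a deeper one,
# the way a firm might hire consultants for input on its current strategy.

def deep_planner(situation):
    # stand-in for an expensive strategic-planning circuit:
    # pick the option with the highest modeled payoff
    return max(situation["options"], key=lambda o: o["payoff"])

def shallow_circuit(situation):
    if len(situation["options"]) == 1:
        return situation["options"][0]  # cheap heuristic: no real choice to make
    return deep_planner(situation)      # otherwise, query the deeper circuit

result = shallow_circuit(
    {"options": [{"name": "cooperate", "payoff": 3},
                 {"name": "defect", "payoff": 1}]})
print(result["name"])  # → cooperate
```

The point of the sketch is that a circuit's effective capability isn't fixed by its own depth: the shallow circuit behaves as intelligently as whatever it can query.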
The comment above, and my eventual post, both aim to develop the dynamics of mesa-optimizing circuits far enough that some of the key insights fall out, while not running afoul of the full complexity of the situation.