Linda Linsefors comments on The Human’s Hidden Utility Function (Maybe)

Linda Linsefors 25 Aug 2022 11:11 UTC
6 points
If anyone reads this comment…
Do you know if these claims have held up? Does this post still agree with current neuroscience, or have there been some major updates?
- Gunnar_Zarncke 26 Aug 2022 22:04 UTC
  3 points
  I think the three sub-systems can be loosely mapped to the structure discussed in the [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering, as follows:
  - the model-based system is the Learning System, except that the Learning System doesn’t calculate value but only learns to model better via reward prediction error.
  - the Pavlovian system is the Steering System and is the only system that provides ground truth “value” (this value is low-level reward; abstract concepts of value are formed by the Learning System around this ground truth, but these exist only insofar as they are useful for predicting the ground truth).
  - the model-free system doesn’t exist as a separate system but is in the shallower parts of the Learning System. I don’t think it maps to the Thought Assessor, but I may be wrong.
  In this framework, one could say, as Eliezer suspected, that the value originated outside the model-based system.