Gunnar_Zarncke comments on The Human’s Hidden Utility Function (Maybe)

Gunnar_Zarncke 26 Aug 2022 22:04 UTC
3 points
I think the three sub-systems can be loosely mapped onto the structure discussed in [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering, as follows:
- the model-based system is the Learning System, except that the Learning System doesn’t calculate value but only learns to model better via reward prediction error.
- the Pavlovian system is the Steering System and is the only system that provides ground truth “value” (this value is low-level reward; the Learning System forms abstract concepts of value around this ground truth, but these exist only insofar as they are useful for predicting it).
- the model-free system doesn’t exist as a separate system but sits in the shallower parts of the Learning System. I don’t think it maps to the Thought Assessor, but I may be wrong.
In this framework, one could say, as Eliezer suspected, that the value originated outside the model-based system.
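
As a toy illustration of the mapping above (a sketch under my own assumptions, not code from the linked post): the hypothetical `steering_reward` function stands in for the Steering System and is the only source of reward, while the hypothetical `LearningSystem` class never computes value itself and only adjusts its predictions via reward prediction error. The stimulus names, reward numbers, and learning rate are all made up for the example.

```python
def steering_reward(stimulus: str) -> float:
    """Stand-in for the Steering System: hard-wired, ground-truth reward.

    The stimuli and reward values are illustrative assumptions.
    """
    return {"food": 1.0, "pain": -1.0}.get(stimulus, 0.0)


class LearningSystem:
    """Stand-in for the Learning System: it only learns to predict reward."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate
        self.predicted: dict[str, float] = {}

    def update(self, stimulus: str, reward: float) -> float:
        """Move the prediction toward the observed reward; return the error."""
        prediction = self.predicted.get(stimulus, 0.0)
        error = reward - prediction  # reward prediction error
        self.predicted[stimulus] = prediction + self.learning_rate * error
        return error


def run(steps: int = 200, learning_rate: float = 0.1) -> dict[str, float]:
    """Cycle through three stimuli and return the learned predictions."""
    learner = LearningSystem(learning_rate)
    stimuli = ("food", "pain", "noise")
    for step in range(steps):
        stimulus = stimuli[step % len(stimuli)]
        # Reward always comes from outside the learner.
        learner.update(stimulus, steering_reward(stimulus))
    return learner.predicted


if __name__ == "__main__":
    print(run())
```

The learned predictions drift toward whatever `steering_reward` hands out, which is the sense in which the "value" originates outside the learner in this sketch.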