Is it really “wrong”? It’s a normative assumption … we get to decide what values we want, right? As “I” am a character, I don’t particularly care what the player wants :-P
Well, to make up a silly example, let’s suppose that you have a conscious belief that you want there to be as much cheesecake as possible. This is because you are feeling generally unsafe, and a part of your brain has associated cheesecakes with a feeling of safety, so it has formed the unconscious prediction that if only there was enough cheesecake, then you would finally feel good and safe.
So you program the AI to extract your character-level values, it correctly notices that you want to have lots of cheesecake, and goes on to fill the world with cheesecake… only for you to realize that now that you have your world full of cheesecake, you still don’t feel as happy as you were on some level expecting to feel, and all of your elaborate rational theories of how cheesecake is the optimal use of atoms start feeling somehow hollow.
There is a mismatch in saying cortex = character and subcortex = player.
If I understand the player-character model right, then unconscious coping strategies would be player-level tactics. But these are learned behaviours, and would therefore be part of the cortex.
In Kaj’s example, the idea that cheesecake will make the bad go away exists in the cortex’s world model.
According to Steven’s model of how the brain works (which I think is probably true), the subcortex is part of the game the player is playing. Specifically, the subcortex provides the reward signal, and some other important game stats (stamina level, hit points, etc.). The subcortex is also sort of like a tutorial, drawing your attention to things that the game creator (evolution) thinks might be useful, and like occasional cutscenes (acting out pre-programmed behaviour).
ML comparison (a toy sketch follows below):
* The character is the pre-trained neural net
* The player is the backprop
* The cortex is the neural net and the backprop
* The subcortex is the reward signal and sometimes a supervisory signal.
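To make the mapping concrete, here is a minimal sketch (assuming a toy linear policy; the reward function and all names are made up for illustration, not anything from Steven’s model):

```python
import numpy as np

# Toy sketch of the analogy. The pre-trained weights are the "character",
# the gradient step is the "player", and the hard-coded reward function
# stands in for the "subcortex". All details are made up for illustration.

rng = np.random.default_rng(0)
weights = rng.normal(size=3)  # the "character": a small pre-trained (here, random) linear policy


def act(observation):
    """The character's behaviour: a deterministic function of the current weights."""
    return weights @ observation


def reward(observation, action):
    """The 'subcortex': a fixed, built-in signal; here it 'wants' action close to sum(inputs)."""
    return -(action - observation.sum()) ** 2


def player_update(observation, lr=0.01):
    """The 'player': a backprop-style step that reshapes the character to earn more reward."""
    global weights
    error = act(observation) - observation.sum()
    grad = 2 * error * observation  # gradient of the squared error w.r.t. the weights
    weights -= lr * grad


for _ in range(1000):
    player_update(rng.normal(size=3))

obs = rng.normal(size=3)
print(act(obs), obs.sum(), reward(obs, act(obs)))  # the trained character now tracks the target
```

The point of the sketch is just the division of labour: the player (the gradient step) never acts in the world directly; it only reshapes the character, steered by a reward signal it did not choose.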
Also, I don’t like the player-character model much. Like all models it is at best a simplification, and it does capture some of what is going on, but I think it is more wrong than right, and I think something like a multi-agent model is much better. I.e. there are coping mechanisms and other less conscious strategies living in your brain side by side with who you think you are. But I don’t think these are completely invisible the way the player is invisible to the character. They are predictive models (e.g. “cheesecake will make me safe”), and it is possible to query them for predictions. And almost all of these models are in the cortex.
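As a toy illustration of that last point, and under the assumption (made only for illustration) that such sub-agents can be represented as functions from a scenario to a predicted feeling, querying them side by side might look like:

```python
# Toy illustration: sub-agents as queryable predictive models living
# side by side in the cortex. Names and numbers are made up.
predictive_models = {
    "explicit self-model": lambda scenario: 0.2,  # predicts cheesecake won't help much
    "coping mechanism": lambda scenario: 0.9 if "cheesecake" in scenario else 0.1,
}

scenario = "a world full of cheesecake"
for name, model in predictive_models.items():
    print(f"{name}: predicted safety = {model(scenario):.1f} given {scenario!r}")
```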
Great comment, thanks!