This one kinda confuses me. I’m of the opinion that the human brain is “constructed with a model explicitly, so that identifying the model is as simple as saying ‘the model is in this sub-module, the one labelled “model”’.” Of course the contents of the model are learned, but I think the question of whether any particular plastic synapse is or is not part of the information content of the model will have a straightforward yes-or-no answer. If that’s right, then “it’s hard to find the model (if any) in a trained model-free RL agent” is a disanalogy to “AIs learning human values”. It would be more analogous to train a MuZero clone, which has a labeled “model” component, instead of training a model-free RL agent.
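To make that concrete, here’s a rough sketch of the contrast I have in mind, in PyTorch; the class and attribute names are made up for illustration, not taken from any real MuZero codebase:

```python
import torch.nn as nn

class MuZeroStyleAgent(nn.Module):
    """The learned world-model is an explicitly labeled sub-module,
    so 'finding the model' is just reading off self.model."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.representation = nn.Linear(obs_dim, hidden)  # obs -> latent state
        self.model = nn.Sequential(                       # <-- the model lives here
            nn.Linear(hidden + act_dim, hidden),          # latent dynamics
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.policy_head = nn.Linear(hidden, act_dim)
        self.value_head = nn.Linear(hidden, 1)

class ModelFreeAgent(nn.Module):
    """Whatever predictive 'model' training induces is smeared across
    these weights; there's no labeled place to point to."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.q_net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, act_dim),                   # Q-values per action
        )
```

On this analogy, the brain’s “model” would be like `self.model` above: its contents are learned, but its location is known in advance.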
And then looking at weights and activations would also be disanalogous to “AIs learning human values”, since we probably won’t have those kinds of real-time brain-scanning technologies, right? Sorry if I’m misunderstanding.
> I think the question of whether any particular plastic synapse is or is not part of the information content of the model will have a straightforward yes-or-no answer.
I don’t think it has an easy yes-or-no answer (at least not without some thought as to what constitutes a model within the mess of human reasoning), and I’m sure that even if it does, finding it won’t be straightforward.
> since we probably won’t have those kinds of real-time brain-scanning technologies, right?
One hope would be that, by the time we have those technologies, we’d know what to look for.
I was writing a kinda long reply but maybe I should first clarify: what do you mean by “model”? Can you give examples of ways that I could learn something (or otherwise change my synapses within a lifetime) that you wouldn’t characterize as “changes to my mental model”? For example, which of the following would be “changes to my mental model”?
1. I learn that Brussels is the capital of Belgium
2. I learn that it’s cold outside right now
3. I taste a new brand of soup and find that I really like it
4. I learn to ride a bicycle, including:
   - maintaining balance via fast, hard-to-describe responses where I shift my body in certain ways in response to different sensations and perceptions
   - being able to predict how the bicycle and I would move if I swung my arm around
5. I didn’t sleep well so now I’m grumpy
FWIW, my inclination is to say that 1–4 are all “changes to my mental model”. And 5 involves both a change to my mental model (knowing that I’m grumpy) and a change to the inputs to my mental model (I feel different “feelings” than I otherwise would; I think of those as inputs going into the model, just like visual inputs go into the model). Is there anything wrong / missing / suboptimal about that definition?
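In toy code (a rough sketch; every name here is made up for illustration), the definition I have in mind looks something like:

```python
class MentalModel:
    """Cases 1-4 edit the model's contents; case 5 mostly changes its inputs."""
    def __init__(self):
        self.contents = {}              # learned facts, skills, tastes

    def learn(self, key, value):
        self.contents[key] = value      # a change to the model itself

    def interpret(self, visual_input, feelings):
        # Feelings arrive as inputs alongside vision. Being grumpy changes
        # `feelings`, not `self.contents` (except insofar as the model
        # notices the mood, which is itself a content change):
        if feelings == "grumpy":
            self.learn("my current mood", "grumpy")
        return {"seen": visual_input, "felt": feelings}

m = MentalModel()
m.learn("capital of Belgium", "Brussels")   # case 1
m.learn("outside temperature", "cold")      # case 2
m.learn("soup brand X", "tastes great")     # case 3
m.interpret(visual_input="frost on window", feelings="grumpy")  # case 5
```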
Vertigo, lust, pain reactions, some fear responses, and so on, don’t involve a model. Some versions of “learning that it’s cold outside” don’t involve a model, just looking out and shivering; the model aspect comes in when you start reasoning about what to do about it. People often drive to work without consciously modelling anything on the way.
Think model-based learning versus Q-learning. Anything that’s more like Q-learning is not model-based.
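To illustrate the distinction with a toy example (my own construction, nothing standard): on a trivial two-state environment, Q-learning folds experience straight into cached values, while a model-based learner records an explicit transition model and plans over it:

```python
import random

# Toy deterministic two-state, two-action environment.
def step(s, a):
    s2 = (s + a) % 2
    return s2, (1.0 if s2 == 1 else 0.0)   # reward for landing in state 1

GAMMA, ALPHA = 0.9, 0.1
Q = {(s, a): 0.0 for s in range(2) for a in range(2)}  # model-free: just values
model = {}                                  # model-based: explicit (s,a) -> (s',r)

for _ in range(5000):
    s, a = random.randint(0, 1), random.randint(0, 1)
    s2, r = step(s, a)
    # Q-learning: fold the experience straight into cached action-values;
    # no component here deserves the name "model".
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in range(2)) - Q[(s, a)])
    # Model-based: store the transition itself (deterministic env, so one
    # sample pins it down), and plan over it later.
    model[(s, a)] = (s2, r)

def plan(s, depth=3):
    """Value of state s by unrolling the learned model."""
    if depth == 0:
        return 0.0
    return max(r + GAMMA * plan(s2, depth - 1)
               for (s2, r) in (model[(s, a)] for a in range(2)))

print("Q-learning values:", Q)
print("planned value of state 0:", plan(0))
```

The point is structural: in the second learner you can point at `model`; in the first, any world-knowledge is implicit in `Q`.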