This one kinda confuses me. I’m of the opinion that the human brain is “constructed with a model explicitly, so that identifying the model is as simple as saying ‘the model is in this sub-module, the one labelled “model”’.” Of course the contents of the model are learned, but I think the question of whether any particular plastic synapse is or is not part of the information content of the model will have a straightforward yes-or-no answer. If that’s right, then “it’s hard to find the model (if any) in a trained model-free RL agent” is a disanalogy to “AIs learning human values”. It would be more analogous to train a MuZero clone, which has a labeled “model” component, instead of training a model-free RL agent.
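To make that concrete, here’s a rough sketch of the contrast I have in mind, in PyTorch; the class and attribute names are made up for illustration, not taken from any real MuZero codebase:

```python
import torch.nn as nn

class MuZeroStyleAgent(nn.Module):
    """The learned world-model is an explicitly labeled sub-module,
    so 'finding the model' is just reading off self.model."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.representation = nn.Linear(obs_dim, hidden)  # obs -> latent state
        self.model = nn.Sequential(                       # <-- the model lives here
            nn.Linear(hidden + act_dim, hidden),          # latent dynamics
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.policy_head = nn.Linear(hidden, act_dim)
        self.value_head = nn.Linear(hidden, 1)

class ModelFreeAgent(nn.Module):
    """Whatever predictive 'model' training induces is smeared across
    these weights; there's no labeled place to point to."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.q_net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, act_dim),                   # Q-values per action
        )
```

On this analogy, the brain’s “model” would be like `self.model` above: its contents are learned, but its location is known in advance.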
And then looking at weights and activations would also be disanalogous to “AIs learning human values”, since we probably won’t have those kinds of real-time brain-scanning technologies, right? Sorry if I’m misunderstanding.
> I think the question of whether any particular plastic synapse is or is not part of the information content of the model will have a straightforward yes-or-no answer.
I don’t think it has an easy yes-or-no answer (at least not without some thought as to what constitutes a model within the mess of human reasoning), and I’m sure that even if it does, finding it won’t be straightforward.
> since we probably won’t have those kinds of real-time brain-scanning technologies, right?
One hope would be that, by the time we have those technologies, we’d know what to look for.
I was writing a kinda long reply but maybe I should first clarify: what do you mean by “model”? Can you give examples of ways that I could learn something (or otherwise change my synapses within a lifetime) that you wouldn’t characterize as “changes to my mental model”? For example, which of the following would be “changes to my mental model”?
1. I learn that Brussels is the capital of Belgium
2. I learn that it’s cold outside right now
3. I taste a new brand of soup and find that I really like it
4. I learn to ride a bicycle, including:
   - maintaining balance via fast, hard-to-describe responses where I shift my body in certain ways in response to different sensations and perceptions
   - being able to predict how the bicycle and I would move if I swung my arm around
5. I didn’t sleep well so now I’m grumpy
FWIW, my inclination is to say that 1–4 are all “changes to my mental model”. And 5 involves both a change to my mental model (knowing that I’m grumpy) and a change to the inputs to my mental model (I feel different “feelings” than I otherwise would; I think of those as inputs going into the model, just like visual inputs go into the model). Is there anything wrong / missing / suboptimal about that definition?
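In toy code (a rough sketch; every name here is made up for illustration), the definition I have in mind looks something like:

```python
class MentalModel:
    """Cases 1-4 edit the model's contents; case 5 mostly changes its inputs."""
    def __init__(self):
        self.contents = {}              # learned facts, skills, tastes

    def learn(self, key, value):
        self.contents[key] = value      # a change to the model itself

    def interpret(self, visual_input, feelings):
        # Feelings arrive as inputs alongside vision. Being grumpy changes
        # `feelings`, not `self.contents` (except insofar as the model
        # notices the mood, which is itself a content change):
        if feelings == "grumpy":
            self.learn("my current mood", "grumpy")
        return {"seen": visual_input, "felt": feelings}

m = MentalModel()
m.learn("capital of Belgium", "Brussels")   # case 1
m.learn("outside temperature", "cold")      # case 2
m.learn("soup brand X", "tastes great")     # case 3
m.interpret(visual_input="frost on window", feelings="grumpy")  # case 5
```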
Vertigo, lust, pain reactions, some fear responses, and so on, don’t involve a model. Some versions of “learning that it’s cold outside” don’t involve a model, just looking out and shivering; the model aspect comes in when you start reasoning about what to do about it. People often drive to work without consciously modelling anything on the way.
Think model-based learning versus Q-learning. Anything that’s more like Q-learning is not model-based.
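To illustrate the distinction with a toy example (my own construction, nothing standard): on a trivial two-state environment, Q-learning folds experience straight into cached values, while a model-based learner records an explicit transition model and plans over it:

```python
import random

# Toy deterministic two-state, two-action environment.
def step(s, a):
    s2 = (s + a) % 2
    return s2, (1.0 if s2 == 1 else 0.0)   # reward for landing in state 1

GAMMA, ALPHA = 0.9, 0.1
Q = {(s, a): 0.0 for s in range(2) for a in range(2)}  # model-free: just values
model = {}                                  # model-based: explicit (s,a) -> (s',r)

for _ in range(5000):
    s, a = random.randint(0, 1), random.randint(0, 1)
    s2, r = step(s, a)
    # Q-learning: fold the experience straight into cached action-values;
    # no component here deserves the name "model".
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in range(2)) - Q[(s, a)])
    # Model-based: store the transition itself (deterministic env, so one
    # sample pins it down), and plan over it later.
    model[(s, a)] = (s2, r)

def plan(s, depth=3):
    """Value of state s by unrolling the learned model."""
    if depth == 0:
        return 0.0
    return max(r + GAMMA * plan(s2, depth - 1)
               for (s2, r) in (model[(s, a)] for a in range(2)))

print("Q-learning values:", Q)
print("planned value of state 0:", plan(0))
```

The point is structural: in the second learner you can point at `model`; in the first, any world-knowledge is implicit in `Q`.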