I reread the post and have some more questions:
Where are “human values” in this model? If we gave this model to an AI that wants to learn human values and has full access to a human brain, where should it look for them?
If the cortical algorithm were replaced with GPT-N in some model of the human mind, would the whole system still work?
Well, all the models in the frontal lobe get, let’s call them, reward-prediction points (see my comment here), which feel like positive vibes or something.
If the generative model “I eat a cookie” has lots of reward-prediction points (counting both the model itself and the downstream models it activates in turn), we describe that as “I want to eat a cookie”.
Likewise, if the generative model “Michael Jackson” has lots of reward-prediction points, we describe that as “I like Michael Jackson. He’s a great guy.” (cf. the “halo effect”)
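To make that picture concrete, here is a toy sketch of the idea. It is my own illustration, not anything from the post: the class and names like GenerativeModel and reward_points are made up, and the scalar points are a stand-in for whatever the real quantity is. The only thing it tries to capture is that a model’s felt “pull” counts the points on the downstream models it activates in turn.

```python
# Toy sketch only (my illustration, not from the post): generative models as
# nodes carrying "reward-prediction points"; the felt valence of entertaining
# a model also counts the downstream models it activates in turn.
from dataclasses import dataclass, field

@dataclass
class GenerativeModel:
    name: str
    reward_points: float = 0.0              # hypothetical scalar "positive vibes"
    downstream: list["GenerativeModel"] = field(default_factory=list)

    def total_valence(self, visited=None) -> float:
        """Points on this model plus points on the models it activates in turn."""
        visited = visited if visited is not None else set()
        if self.name in visited:
            return 0.0
        visited.add(self.name)
        return self.reward_points + sum(m.total_valence(visited) for m in self.downstream)

# "I eat a cookie" activates "sweet taste", which is where most of the points live.
sweet = GenerativeModel("sweet taste", reward_points=5.0)
cookie = GenerativeModel("I eat a cookie", reward_points=1.0, downstream=[sweet])
print(cookie.total_valence())   # 6.0 -- described from the inside as "I want to eat a cookie"
```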
If somebody says that justice is one of their values, I think it’s at least partly (and maybe primarily) up a level in meta-cognition. It’s not just that there’s a generative model “justice” and it has lots of reward-prediction points (“justice is good”), but there’s also a generative model of yourself valuing justice, and that has lots of reward-prediction points too. That feels like “When I think of myself as the kind of person who values justice, it’s a pleasing thought”, and “When I imagine other people saying that I’m a person who values justice, it’s a pleasing thought”.
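On the same toy picture (reusing the hypothetical GenerativeModel class from the sketch above), the meta-cognitive version would be that the self-model “I am the kind of person who values justice” carries points of its own, on top of whatever “justice” itself carries:

```python
# Continuing the toy sketch above: the meta-level self-model carries its own
# reward-prediction points, in addition to the points on "justice" itself.
justice = GenerativeModel("justice", reward_points=3.0)
self_model = GenerativeModel(
    "I am the kind of person who values justice",
    reward_points=4.0,
    downstream=[justice],   # entertaining the self-model also brings "justice" to mind
)
print(self_model.total_valence())   # 7.0 -- "when I think of myself that way, it's a pleasing thought"
```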
This isn’t really answering your question of what human values are or should be—this is me saying a little bit about what happens behind the scenes when you ask someone “What are your values?”. Maybe they’re related, or maybe not. This is a philosophy question. I don’t know.
If the cortical algorithm were replaced with GPT-N in some model of the human mind, would the whole system still work?

My belief (see post here) is that GPT-N is running a different kind of algorithm, but learning to imitate some steps of the brain algorithm in a deep but limited way (including the neocortex, the subcortex, the models that result from a lifetime of experience, and even hormones, the body, etc.; after all, the next-token-prediction task covers the whole input-output profile, not just the neocortex). I can’t think of a way to do what you suggest, but who knows.