This is very basic/fundamental compared to many questions in this thread, but I am taking ‘all dumb questions allowed’ hyper-literally, lol. I have little technical background and though I’ve absorbed some stuff about AI safety by osmosis, I’ve only recently been trying to dig deeper into it (and there’s lots of basic/fundamental texts I haven’t read).
Writers on AGI often talk about AGI in anthropomorphic terms—they talk about it having ‘goals’, being an ‘agent’, ‘thinking’, ‘wanting’, getting ‘rewards’, etc. As I understand it, most AI researchers don’t think that AIs will have human-style qualia, sentience, or consciousness.
But if AIs don’t have qualia/sentience, how can they ‘want things’, ‘have goals’, ‘be rewarded’, etc.? (In humans, these things seem to depend on our qualia, and specifically on our ability to feel pleasure and pain.)
I first realised that I was confused about this when reading Richard Ngo’s introduction to AI safety, where he talks about reward functions and reinforcement learning. I realised that I don’t understand how reinforcement learning works in machines. I understand how it works in humans and other animals—give the animal something pleasant when it does the desired behaviour and/or something unpleasant when it does the undesired behaviour. But how can you make a machine without qualia “feel” pleasure or pain?
When I talked to some friends about this, I came to the conclusion that this is just a subset of ‘not knowing how computers work’, and it might be addressed by me getting more knowledge about how computers work (on a hardware, or software-communicating-with-hardware, level). But I’m interested in people’s answers here.
Assume you have a very simple reinforcement learning AI that does nothing but choose between two actions, A and B. And it has a goal of “maximizing reward”. “Reward”, in this case, doesn’t correspond to any qualia; rather, “reward” is just a number that results from the AI choosing a particular action. So what “maximize reward” actually means in this context is “choose the action that results in the biggest numbers”.
Say that the AI is programmed to initially just try choosing A ten times in a row and B ten times in a row.
When the AI chooses A, it is shown the following numbers: 1, 2, 2, 1, 2, 2, 1, 1, 1, 2 (total 15).
When the AI chooses B, it is shown the following numbers: 4, 3, 4, 5, 3, 4, 2, 4, 3, 2 (total 34).
After the AI has tried both actions ten times, it is programmed to choose its remaining actions according to the rule “choose the action that has historically had the bigger total”. Since action B has had the bigger total, it then proceeds to always choose B.
To achieve this, we don’t need to build the AI to have qualia, we just need to be able to build a system that implements a rule like “when the total for action A is greater than the total for action B, choose A, and vice versa; if they’re both equal, pick one at random”.
When we say that an AI “is rewarded”, we just mean “the AI is shown bigger numbers, and it has been programmed to act in ways that result in it being shown bigger numbers”.
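To make that concrete, here is a minimal sketch of the rule described above in Python (the particular numbers and function names are invented for illustration, not taken from any real system):

```python
import random

# Toy version of the two-action example: "reward" is just a number the
# environment returns, and the AI's "wanting" is nothing more than a rule
# written over those numbers. All names and numbers here are illustrative.

def reward_for(action):
    # Stand-in environment: B tends to return bigger numbers than A.
    return random.choice([1, 2]) if action == "A" else random.choice([2, 3, 4, 5])

totals = {"A": 0, "B": 0}

# Exploration phase: try each action ten times and record the totals.
for action in ("A", "B"):
    for _ in range(10):
        totals[action] += reward_for(action)

# Exploitation phase: "choose the action that has historically had the bigger
# total"; if the totals are equal, pick one at random.
def choose():
    if totals["A"] == totals["B"]:
        return random.choice(["A", "B"])
    return "A" if totals["A"] > totals["B"] else "B"

print(totals, choose())
```

That single comparison is all that “being rewarded” amounts to in this toy example.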
We talk about the AI having “goals” and “wanting” things by an application of the intentional stance. That’s Daniel Dennett’s term for the idea that, even if a chess-playing AI had a completely different motivational system than humans do (and chess-playing AIs do have that), we could still talk about it having a “goal” of winning at chess, or “wanting” to win. If we assume that the AI “wants” to win at chess, then we can make more accurate predictions of its behavior—for instance, we can predict that it won’t make obviously losing moves if it can avoid them.
What’s actually going on is that the chess AI has been programmed with rules like “check whether a possible move would lead to losing the game and, if so, try to find another move to play instead”. There’s no “wanting” in the human sense going on, but it still acts in the kind of way that a human would act if that human wanted to win a game of chess. So saying that the AI “wants” to win the game is a convenient shorthand for “the AI is programmed to play the kinds of moves that are more likely to lead it to win the game, within the limits of its ability to predict the likely outcomes of those moves”.
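As a toy sketch of that kind of rule (the “moves” and the “lookahead” here are made-up stand-ins, not any real chess engine’s internals):

```python
# The "engine" below has no wants; it just filters out moves that its
# (here, trivially hard-coded) lookahead marks as losing. Everything in this
# snippet is invented for illustration.

LOSING_MOVES = {"hang queen", "walk into mate"}  # stand-in for a real lookahead

def pick_move(candidate_moves):
    # "Check whether a possible move would lead to losing the game and, if so,
    # try to find another move to play instead."
    safe = [m for m in candidate_moves if m not in LOSING_MOVES]
    return safe[0] if safe else candidate_moves[0]

print(pick_move(["hang queen", "develop knight", "castle"]))  # -> develop knight
```

Treating this program as “wanting” to avoid losing predicts its behaviour well, even though underneath it is just a filter over a list.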
Is it intuitive to you why a calculator can sum numbers even though it doesn’t want/feel anything?
If so, and if an AGI still feels confusing, could you help me pinpoint the difference and I’ll continue from there?
(+1 for the question!)
Functionally. You can regard them all as forms of behaviour.
do they depend on qualia, or are they just accompanied by qualia?
This might be a crux, because I’m inclined to think they depend on qualia.
Why does the AI ‘behave’ in that way? How do engineers make it ‘want’ to do things?
At a very high level, the way reinforcement learning works is that the AI attempts to maximise its expected return, which can be summed up as “the sum of all rewards you expect to get in the future”. So, using a bunch of maths, the AI looks at the rewards it has got in the past and the rewards it expects to get in the future, and selects the action that maximises those expected future rewards. The reward itself can be defined within the algorithm, or come from the environment. For instance, if you want to train a four-legged robot to learn to walk, the reward might be the distance travelled in a certain direction. If you want to train it to play an Atari game, the reward is usually the score.
None of this requires any sort of qualia, or for the agent to want things. It’s a mathematical equation. The AI behaves the way it does because the algorithm is attempting to maximise that quantity, and the AI can be said to “want” to maximise its reward function, or to “have the goal of” maximising its reward function, because (if it’s a good enough AI) it reliably takes actions that move it towards this outcome.
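A minimal sketch of that selection step, with invented reward predictions and a hypothetical discount factor (the agent’s whole “goal” is the argmax at the end):

```python
# Choose the action with the largest discounted sum of expected future rewards.
# The reward predictions and the discount factor are made up for illustration;
# e.g. for a walking robot the rewards could be distance travelled per step.

GAMMA = 0.9  # discount factor: future rewards count slightly less than immediate ones

def expected_return(predicted_rewards):
    # Discounted sum of the rewards the agent expects after taking an action.
    return sum(r * GAMMA ** t for t, r in enumerate(predicted_rewards))

predicted = {
    "gait_1": [0.2, 0.2, 0.2, 0.2],  # steady but slow
    "gait_2": [0.5, 0.1, 0.0, 0.0],  # fast at first, then falls over
}

best_action = max(predicted, key=lambda a: expected_return(predicted[a]))
print(best_action)  # -> gait_1, because its discounted total is larger
```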
Reinforcement learning is easy to conceptualize once you notice the key ingredient: we explicitly specify an algorithm that maximizes the reward. This is disanalogous to humans: to train your 5-year-old, you need only give the reward, and the 5-year-old may adapt their behavior because they value the reward; in a reinforcement learning agent, that second step only occurs because we make it occur. You could just as well flip the algorithm to pursue minimal rewards instead (see the snippet below).
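For instance (with purely illustrative numbers), which outcome gets “pursued” is just a matter of which comparison we wrote:

```python
# The maximising step exists only because we wrote it in. Swapping max for min
# gives an agent that, with exactly the same machinery, pursues small rewards.
estimated_reward = {"A": 1.5, "B": 3.4}  # invented reward estimates

chosen  = max(estimated_reward, key=estimated_reward.get)  # "reward-maximising" agent picks B
flipped = min(estimated_reward, key=estimated_reward.get)  # same code, opposite "goal": picks A
print(chosen, flipped)
```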
Thanks!
I think my question is deeper—why do machines ‘want’ or ‘have a goal to’ follow the algorithm to maximize reward? How can machines ‘find stuff rewarding’?
As far as current systems are concerned, the answer is that (as far as anyone knows) they don’t find things rewarding or want things. But they can still run a search to optimize a training signal, and that gives you an agent.