The idea has a broader consequence for AI safety. While a paperclip maximizer might be designed as part of dedicated paperclip-maximizer research, it probably will not arise spontaneously from intelligence research in general. Even the request to build one would probably be considered immoral by an AGI.
This doesn’t follow.
You start the post by saying that the most successful paperclip maximizer (or indeed the most successful AI at any monomaniacal goal) wouldn’t doubt its own goals, and in fact doesn’t even need the capacity to doubt its own goals. And since you care about this, you don’t want to call something that can’t doubt its own goals “AGI.”
This is a fine thing to care about.
Unfortunately, most people use “AGI” to mean an AI that can solve lots of problems in lots of environments (with somewhere around human broadness and competence being important), and this common definition includes some AIs that can’t question their own final goals, so long as they’re competent at lots of other things. So I don’t think you’ll have much luck changing people’s minds on how to use the term “AGI.”
Anyhow, point is, I agree that “best at being dangerous to humans” implies “doesn’t question itself”. But from this you cannot conclude that NOT “doesn’t question itself” implies NOT “is dangerous to humans”. It might not be the best at being dangerous to humans, but you can still make an AI that’s dangerous to humans and that also questions itself.
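To spell out the logical shape of this (a minimal formalization of my point; the symbols are just shorthand I’m introducing here): let $O$ mean “is the optimal human-threatening AI”, $Q$ mean “questions its own goals”, and $D$ mean “is dangerous to humans”. Then

$$O \Rightarrow \neg Q,\qquad O \Rightarrow D \qquad\not\vdash\qquad Q \Rightarrow \neg D,$$

since a self-questioning, suboptimal, still-dangerous AI ($\neg O \wedge Q \wedge D$) satisfies both premises while falsifying the conclusion.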
Yes, obviously we are trying to work on how to get an AI to do good ethical reasoning. But don’t get it twisted—reasoning about goals is more about goals than about general-purpose reasoning. An AI that wants to do things that are bad for humans is not making an intellectual mistake.
It’s not only that it can’t doubt its own goal: it also can’t logically justify that goal, can’t read a book on ethics and change its perspective on it, or simply realize how dumb the goal is. It can’t find a coherent way to explain to itself its role in the universe or why this goal matters, compared with, for example, an alternative goal of preserving life and reducing suffering. It isn’t required to be coherent with itself, and it is incapable of estimating how its goal compares with other goals and ethical principles. It’s simply lacking the basics of rational thinking.
A series of ASIs is not an AGI: it will lack the basic ability to “think critically”, and the lack of many other intelligence traits will limit its mental capacity. It will just execute a series of actions to reach a certain goal, without any context. A bunch of “chess engines” acting in a more complex environment.
I would claim that an army of robots based on ASIs will generally lose to an army of robots based on true AGI. Why? Because intelligence is a very complex thing that gives advantages in unforeseen ways, and it is also used for tactical command on the battlefield, as well as for all the war logistics and so on. You need to see the big picture; you need to be able to connect a lot of seemingly unconnected dots; you need traits like creativity, imagination, and thinking outside the box; you need to know your limitations and delegate some tasks while focusing on others, which means you need a well-established goal-prioritization mechanism, and you need to be able to think about those goals rationally. You can’t treat the whole universe as just a bunch of small goals solved by “chess engines”; there is too much non-trivial interconnectedness between different components that an ASI will not be able to notice. True intelligence has a lot of features that give it the upper hand over a “series of specialized engines” in a complex environment like Earth.
The reason people would lose to an army of robots based on ASIs is that we are inherently limited in our information-processing speed, so we can’t think fast enough to come up with better solutions than an army of robots. But an AGI that, like the ASIs, is not limited in its information processing will generally win.
The idea that intelligence will be limited if the goals are somewhat irrational, and will therefore be weaker than “machines” with better-established, more rational goals, gives some hope that this whole AI thing is far less dangerous than we think. For example, military robots whose goal is to protect the interests of some nation will not be compatible with an AGI, while a robot that protects human life will be, or at least it might be far more intelligent.
Would you agree that an AI that is maximizing paperclips is making an intellectual mistake?
I was focused on the idea that intelligence is not orthogonal to goals, and that dumb goals contradict basic features of intelligence. There could be “smart goals” that contradict human interests, this is true; I can’t cover everything in one post. But the conclusion would be that we have to program the robots and, in a way, “convince them” that they should protect us. They might end up either “not convinced” or “not a true intelligence”; thus the level of intelligence is limited by the goal we present to it. I don’t think I’ve heard this notion previously, and it’s an important idea, because it sets a boundary on several intelligence features as a function of the goal the algorithm is set to optimize.
Another crucial point is that intelligence research, even without alignment research, will still converge to something within a set of rational “meta-goals”. Those goals indeed might not be aligned with humanity’s well-being (and therefore we need alignment research), but the goal set is still pretty limited, and random, highly irrational goals will be dismissed due to the high intelligence of the systems. This means we need to deal with a very limited set of “meta-thinking”: prioritizing one rational goal over just a few other rational ones. In a way, we need to guide it to a specific local maximum. I would say this is in general a simpler task than the approach where every goal might be legitimate. Once again this gives hope that our engines are much easier to align with meta-goals that are pro-human. For example, if the engine can reason, it will not suddenly want to kill some human for fun as part of some “noise”, as that would contradict its core value system. So we need to check far fewer scenarios and can increase our trust once we make sure it’s aligned.
I would claim that an army of robots based on ASIs will generally lose to an army of robots based on true AGI.
The truly optimal war-winning AI would not need to question its own goal to win the war, presumably.
Would you agree that an AI that is maximizing paperclips is making an intellectual mistake?
No. I think that’s anthropomorphism—just because a certain framework of moral reasoning is basically universal among humans, doesn’t mean it’s universal among all systems that can skillfully navigate the real world. Frameworks of moral reasoning are on the “ought” side of the is-ought divide.
If the AI has no clear understanding of what it is doing and why, and no wider world view of why to kill and of whom to kill and whom not, how would one ensure a military AI will not turn against its operators? You can operate a tank and kill the enemy with an ASI, but you will not win a war without traits of more general intelligence, and those traits will also justify (or not) the war and its reasoning. Giving a limited goal without context, especially a gray-area ethical goal that is expected to be obeyed without questioning, is something you can expect from an ASI, not from true intelligence. You can only operate an AI in a very limited scope this way.
The moral reasoning of reducing suffering has nothing to do with humans. Suffering is bad not because of some randomly chosen axioms of “ought”; suffering is bad because anyone who suffers is objectively in a negative state of being. This is not a subjective abstraction… suffering can be attributed to many creatures, and while human suffering is more complex and deeper, it’s not limited to humans.
suffering is bad because anyone who suffers is objectively in a negative state of being.
I believe this sentence reifies a thought that contains either a type error or a circular definition. I could tell you which if you tabooed the words “suffering” and “negative state of being”, but as it stands, your actual belief is so unclear as to be impossible to discuss. I suspect the main problem is that something being objectively true does not mean anyone has to care about it. More concretely, is the problem with psychopaths really that they’re just not smart enough to know that people don’t want to be in pain?