However, the end result is just as general. If evolution can create humans who roughly implement the goal of “be fruitful and multiply”, then we could probably create a ULM that implements the goal of “be fruitful and multiply paperclips”.
But how likely are we to create a dangerous paperclipper whilst aiming for something else? How does your model accommodate single-trackedness, incorrigibility, etc.?
But how likely are we to create a dangerous paperclipper whilst aiming for something else?
Pretty unlikely, because a paperclipper is a relatively complex—and thus hard to specify—value function. It seems easy only when you think of explicitly programmed goals, rather than the more difficult, highly indirect route of encoding a value function into a ULM.
But to generalize your point: yes, there is certainly the possibility that aiming for an externalized version of a human-value-shaped function could still get you something quite dangerous if you don’t get close enough. A better understanding of the neural basis of altruism is probably important.
In particular, super-simple utility functions are easier to implement and thus intrinsically more likely. They also tend to be dangerous.
But to generalize your point: yes, there is certainly the possibility that aiming for an externalized version of a human-value-shaped function could still get you something quite dangerous if you don’t get close enough.
Could you give an example? I have never found that line of argument very convincing. We don’t all have identical value systems, so we are all near misses to each other. I don’t see why a full value system is needed anyway.
A better understanding of the neural basis of altruism is probably important.
Maybe if you are building an agentive AI...
In particular, super-simple utility functions are easier to implement and thus intrinsically more likely. They also tend to be dangerous.
Does an oracle AI have a simple utility function? Is it dangerous?
Could you give an example? I have never found that line of argument very convincing. We don’t all have identical value systems, so we are all near misses to each other. I don’t see why a full value system is needed anyway.
We have some initial ideas for computable versions of curiosity and controlism (there is not a good word in English for the desire/drive to be in control). They both appear to be simple to specify. Human values are complex, but they probably use something like simple curiosity and controlism heuristics as subfeatures.
So a brain-inspired approach could fail if the altruism components don’t work or become de-emphasized later. It could also fail if the AI’s circle of empathy/altruism is too small or is focused on, say, an individual (the creator, for example), and the AI then behaves oddly when that person dies.
At this time I am not aware of a realistic proposal for implementing altruism in an ML-based AGI. Maybe one exists and just isn’t well known; if you’ve come across anything, please send some links.
Maybe if you are building an agentive AI...
Well, yes.
Does an oracle AI have a simple utility function? Is it dangerous?
I do not believe the demand for or potential of oracle AI is remotely comparable to agentive AI. People will want agents to do their bidding, create wealth for them, help them live better, etc.
We have some initial ideas for computable versions of curiosity and controlism (there is not a good word in English for the desire/drive to be in control).
Autonomy? Arguably that’s Greek...
I do not believe the demand for or potential of oracle AI is remotely comparable to agentive AI. People will want agents to do their bidding, create wealth for them, help them live better, etc.
There is clearly a demand for agentive AI, in a sense, because people are already using agents to do their bidding, that is, to achieve specific goals. Those qualifications are important because they distinguish a limited kind of AI that people would want from a more powerful kind that they would not.
The idea of AI as a “benevolent” dictator is not appealing to democratically minded types, who tend to suspect a slippery slope from benevolence to malevolence, and it is not appealing to dictators to have a superhuman rival...so who is motivated to build one?
Yudkowsky seems to think that there is a moral imperative to put an AI in charge of the world, because it would create billions of extra happy human lives, and not creating those lives is the equivalent of mass murder. That is a very unintuitive piece of reasoning, and it therefore cannot stand as a prediction of what AIs will be built, since it does not stand as a prediction about how people will reason morally.
The option of achieving safety by aiming lower (the technique that leads us to have speed limits, rather than struggling to make the fastest possible car safe) is still available.
The God AI concept is related to another favourite MIRI theme: the need to instil the whole of human value into an AI, something MIRI admits would be very difficult.
MIRI makes the methodological proposal that it simplifies the issue of friendliness or morality or safety to deal with the whole of human value, rather than identifying a morally relevant subset. Having done that, it concludes that human morality is extremely complex. In other words, the payoff in terms of methodological simplification never arrives, for all that MIRI relieves itself of the burden of coming up with a theory of morality. Since dealing with human value in total is in absolute terms very complex, the possibility remains open that identifying the morally relevant subset of values is relatively easier (even if still difficult in absolute terms) than designing an AI to be friendly in terms of the totality of value, particularly since philosophy offers a body of work that seeks to identify simple underlying principles of ethics.
Not only are some human values more morally relevant than others, but some human values are also what make humans dangerous to other humans, bordering on existential threat. I would rather not have superintelligent AIs with paranoia, supreme ambition, or tribal loyalty to other AIs in their value system.
So there are good reasons for thinking that installing subsets of human value would be both easier and safer.
Altruism, in particular, is not needed for a limited agentive AI. Such AIs would perform specialised tasks, leaving it to humans to stitch the results into something that fulfils their values. We don’t want a Google car that takes us where it guesses we want to go.
The idea of AI as a “benevolent” dictator is not appealing to democratically minded types, who tend to suspect a slippery slope from benevolence to malevolence, and it is not appealing to dictators to have a superhuman rival...so who is motivated to build one?
From section 5.1.1 of Responses to Catastrophic AGI Risk:
As with a boxed AGI, there are many factors that would tempt the owners of an Oracle AI to transform it to an autonomously acting agent. Such an AGI would be far more effective in furthering its goals, but also far more dangerous.
Current narrow-AI technology includes HFT algorithms, which make trading decisions within fractions of a second, far too fast to keep humans in the loop. HFT seeks to make a very short-term profit, but even traders looking for a longer-term investment benefit from being faster than their competitors. Market prices are also very effective at incorporating various sources of knowledge [135]. As a consequence, a trading algorithm’s performance might be improved both by making it faster and by making it more capable of integrating various sources of knowledge. Most advances toward general AGI will likely be quickly taken advantage of in the financial markets, with little opportunity for a human to vet all the decisions. Oracle AIs are unlikely to remain as pure oracles for long.
Similarly, Wallach [283] discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are ‘on the loop’ rather than ‘in the loop’. In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong.
Human Rights Watch [90] reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems—designed to detect and shoot down incoming missiles and rockets—already being limited to accepting or overriding the computer’s plan of action in a matter of seconds. Although these systems are better described as automatic, carrying out pre-programmed sequences of actions in a structured environment, than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.
In general, any broad domain involving high stakes, adversarial decision making and a need to act rapidly is likely to become increasingly dominated by autonomous systems. The extent to which the systems will need general intelligence will depend on the domain, but domains such as corporate management, fraud detection and warfare could plausibly make use of all the intelligence they can get. If one’s opponents in the domain are also using increasingly autonomous AI/AGI, there will be an arms race where one might have little choice but to give increasing amounts of control to AI/AGI systems.
Miller [189] also points out that if a person was close to death, due to natural causes, being on the losing side of a war, or any other reason, they might turn even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death.
Some AGI designers might also choose to create less constrained and more free-acting AGIs for aesthetic or moral reasons, preferring advanced minds to have more freedom.
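The ‘in the loop’ versus ‘on the loop’ distinction in the passage above comes down to what happens by default. The following minimal Python sketch models it with two hypothetical functions, `in_the_loop` and `on_the_loop`, where a human reviewer is represented as a callback returning True or False; these names and the callback model are illustrative assumptions, not drawn from Wallach or from any real weapons system.

```python
from typing import Callable, List

# Illustrative model only: an "action" is a string, and a human reviewer
# is a callback that answers True or False for each proposed action.
Action = str
Reviewer = Callable[[Action], bool]


def in_the_loop(actions: List[Action], approve: Reviewer) -> List[Action]:
    """Human in the loop: an action runs only if a human explicitly approves it."""
    return [a for a in actions if approve(a)]


def on_the_loop(actions: List[Action], veto: Reviewer) -> List[Action]:
    """Human on the loop: actions run by default unless a human vetoes them."""
    return [a for a in actions if not veto(a)]


def absent_human(_action: Action) -> bool:
    # Hypothetical stand-in for a human who is too slow or inattentive to
    # respond within the seconds allowed: never approves, never vetoes.
    return False


if __name__ == "__main__":
    plan = ["track incoming missile", "fire interceptor"]
    print(in_the_loop(plan, absent_human))  # []
    print(on_the_loop(plan, absent_human))  # ['track incoming missile', 'fire interceptor']
```

When the human fails to respond, the in-the-loop policy does nothing, while the on-the-loop policy carries out the whole plan, which is why moving operators from ‘in’ to ‘on’ the loop is a real transfer of control.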
Similarly, Wallach [283] discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are ‘on the loop’ rather than ‘in the loop’. In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong. Human Rights Watch [90] reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems—designed to detect and shoot down incoming missiles and rockets—already being limited to accepting or overriding the computer’s plan of action in a matter of seconds. Although these systems are better described as automatic, carrying out pre-programmed sequences of actions in a structured environment, than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.
The weaponisation of AI has indeed already begun, so it is not a danger that needs pointing out. It suits the military to give drones, and so forth, greater autonomy, but it also suits the military to retain overall control. They are not going to build a God AI that is also a weapon, since there is no military mileage in building a weapon that might attack you of its own volition. So weaponised AI is limited agentive AI. Since the military want to retain overall control, they will in effect conduct their own safety research, increasing the controllability of their systems in parallel with their increasing autonomy. MIRI’s research is not very relevant to weaponised AI, because MIRI focuses on the hidden dangers of apparently benevolent AI and on God AIs, powerful singletons.
As with a boxed AGI, there are many factors that would tempt the owners of an Oracle AI to transform it to an autonomously acting agent. Such an AGI would be far more effective in furthering its goals, but also far more dangerous.
You may be tacitly assuming that an AI is either passive, like Oracle AI, or dangerously agentive. But we already have agentive AIs that haven’t killed us.
I am making a three-way distinction between:
Non-agentive AI
Limited agentive AI
Maximally agentive AI, or “God” AI.
Non-agentive AI is passive, doing nothing once it has finished processing its current request. It is typified by Oracle AI.
Limited agentive AI performs specific functions and operates under effective overrides and safety protocols.
(For instance, whilst it would destroy the effectiveness of automated trading software to have a human okaying each trade, it nonetheless has kill switches and sanity checks.)
Both are examples of Tool AI. Tool AI can be used to do dangerous things, but the responsibility ultimately falls on the tool user.
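To make limited agentive AI a little more concrete, here is a minimal Python sketch of trading software that acts without a human okaying each trade but still sits behind sanity checks and a kill switch. It is an illustration under stated assumptions, not a description of any real trading system: the class `GuardedTrader`, the `Order` record, the symbol "XYZ", and the limits `max_order_size` and `max_daily_loss` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical names and limits throughout; not modelled on any real trading API.
@dataclass
class Order:
    symbol: str
    quantity: int  # positive = buy, negative = sell
    price: float


@dataclass
class GuardedTrader:
    """A limited agent: trades autonomously, but every order passes sanity
    checks, and a kill switch halts all further trading once tripped."""
    max_order_size: int = 1_000
    max_daily_loss: float = 10_000.0
    realized_pnl: float = 0.0
    halted: bool = False
    log: List[str] = field(default_factory=list)

    def kill(self, reason: str) -> None:
        # Kill switch: once tripped, no further orders are submitted.
        self.halted = True
        self.log.append(f"HALTED: {reason}")

    def sanity_check(self, order: Order) -> bool:
        # Reject orders that are empty, implausibly large, or mispriced.
        return 0 < abs(order.quantity) <= self.max_order_size and order.price > 0

    def record_fill(self, pnl: float) -> None:
        # Trip the kill switch automatically if losses exceed the limit.
        self.realized_pnl += pnl
        if self.realized_pnl < -self.max_daily_loss:
            self.kill("daily loss limit exceeded")

    def submit(self, order: Order) -> bool:
        # No human approves individual trades; the guards run instead.
        if self.halted:
            self.log.append(f"blocked (halted): {order}")
            return False
        if not self.sanity_check(order):
            self.log.append(f"rejected (sanity check): {order}")
            return False
        self.log.append(f"submitted: {order}")
        return True


if __name__ == "__main__":
    trader = GuardedTrader()
    print(trader.submit(Order("XYZ", 100, 10.0)))     # True
    print(trader.submit(Order("XYZ", 50_000, 10.0)))  # False: fails sanity check
    trader.record_fill(-20_000.0)                     # loss limit breached, kill switch trips
    print(trader.submit(Order("XYZ", 100, 10.0)))     # False: halted
```

The point of the sketch is that the agent never waits for a human, yet the conditions under which it refuses or stops are fixed in advance and checked on every order.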
Maximally agentive AI is not passive by default, and has a wide range of capabilities. It may be in charge of other AIs, or have effectors that allow it to take real-world actions directly. Attempts may have been made to add safety features, but their effectiveness would be in doubt. That is just the hard problem of AI friendliness that MIRI writes so much about.
The contrary view is that there is no need to render God AIs safe technologically, because there is no incentive to build them. (Which does not mean the whole field of AI safety is pointless.)
ETA
On the other hand, you may be distinguishing between limited and maximal agency but arguing that there is a slippery slope leading from the one to the other. The political analogy shows that people are capable of putting a barrier across the slope: people are generally happy to give some power to some politicians, but resist moves to give all the power to one person.
On the other hand, people might be tempted to give AIs more power once they have a track record of reliability, but a track record of reliability is itself a kind of empirical safety proof.
There is a further argument to the effect that we are gradually giving more autonomy to agentive AIs (without moving entirely away from oracle AIs like Google), but that gradual increase is being paralleled by an incremental approach to AI safety, for instance in automated trading systems, which have been given both more ability to trade without detailed oversight and more powerful overrides. Hypothetically, increased autonomy without increased safety measures would mean increased danger, but that is not what is happening in reality. I am not arguing against AI danger and safety measures overall; I am arguing against a grandiose, all-or-nothing conception of AI safety and danger.
We have some initial ideas for computable versions of curiosity and controlism (there is not a good word in English for the desire/drive to be in control).
Autonomy? Arguably that’s Greek...
I like it.
I do not believe the demand for or potential of oracle AI is remotely comparable to agentive AI. People will want agents to do their bidding, create wealth for them, help them live better, etc.
(Replying to my own text above.) On consideration this is wrong: Google is more or less an oracle AI, and there is high demand for that. The demand for agenty AI is probably much greater, but there is still a role/demand for oracle AI and a lot of other stuff in between.
So there are good reasons for thinking that installing subsets of human value would be both easier and safer.
Totally. I think this also goes hand in hand with understanding more about human values—how they evolved, how they are encoded, what is learned or not etc.
Altruism, in particular, is not needed for a limited agentive AI. Such AIs would perform specialised tasks, leaving it to humans to stitch the results into something that fulfils their values. We don’t want a Google car that takes us where it guesses we want to go.
Of course. There are many niches for more specialized or limited agentive AI, and these designs probably don’t need altruism. Altruism matters more for the complex general agents, which would control/manage the specialists, narrow AIs, other software, etc.
That seems to be reintroducing God AI. I think people would want to keep humans in the loop. That is both a prediction and a means of AI safety.