I asked a similar question some time ago. The strongest counterargument offered was that a scope-limited AI doesn’t stop rogue unfriendly AIs from arising and destroying the world.
See also: Neutral AI and the maximizer vs satisficer discussion.
The strongest counterargument offered was that a scope-limited AI doesn’t stop rogue unfriendly AIs from arising and destroying the world.
Maybe I misinterpreted the argument. If it means that we need an unbounded friendly AI to deal with unbounded unfriendly AIs, it makes more sense. The question then comes down to how likely it is that, once someone has discovered AGI, others will be able to discover it as well or make use of the discovery, versus the payoff from experimenting with bounded versions of such an AGI design before running an unbounded friendly version. In other words, how much can we increase our confidence that we have solved friendliness by experimenting with bounded versions, versus the risk we take on by not taking over the world as soon as possible to impede unfriendly unbounded versions?
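One toy way to make that trade-off concrete (a hypothetical expected-value sketch; the symbols are invented for illustration): let $p_0$ be the probability that the design really is friendly if the unbounded version is run immediately, $p_1$ the same probability after a round of experiments with bounded versions, and $q$ the probability that someone else launches an unfriendly unbounded AGI during the delay. Experimenting first comes out ahead roughly when

$$(1 - q)\,p_1 > p_0,$$

that is, when the confidence gained from bounded experiments outweighs the added risk of being pre-empted.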
The strongest counterargument offered was that a scope-limited AI doesn’t stop rogue unfriendly AIs from arising and destroying the world.
I don’t quite understand that argument, maybe someone could elaborate.
If there is a rule that says ‘optimize X for Y seconds’, why would an AGI treat ‘optimize X’ differently from ‘for Y seconds’? In other words, why is it assumed that we can succeed in creating a paperclip maximizer that cares strongly enough about the design parameters of paperclips to consume the universe (why would it do that as long as it isn’t told to do so), yet somehow ignores all design parameters that have to do with spatio-temporal scope boundaries or resource limitations?
I see that there is a subset of unfriendly AGI designs that would never halt, or that would destroy humanity while pursuing their goals. But how large is that subset, and how many designs actually do halt, or proceed only very slowly?
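As a toy illustration of the question (a hypothetical Python sketch; the names and numbers are invented), here is what ‘optimize X for Y seconds’ looks like if the time bound is written into the objective itself: utility only counts paperclips produced before the deadline, so a literal-minded maximizer has no more reason to drop ‘for Y seconds’ than to drop ‘paperclips’.

```python
# Toy formalization (hypothetical, illustration only): the time bound is part
# of the objective itself, not an external constraint bolted on afterwards.

from dataclasses import dataclass

@dataclass
class Event:
    time: float        # seconds after start
    paperclips: int    # paperclips produced by this event

def bounded_utility(plan, horizon_seconds):
    """Utility = paperclips produced strictly before the deadline.

    Anything a plan does after `horizon_seconds` is worth exactly zero, so
    'for Y seconds' shapes the agent's choices just as much as 'paperclips'.
    """
    return sum(e.paperclips for e in plan if e.time < horizon_seconds)

# Two candidate plans for the toy agent.
slow_but_huge = [Event(time=120.0, paperclips=10**9)]   # pays off after the deadline
fast_but_small = [Event(time=5.0, paperclips=1000)]     # pays off in time

horizon = 60.0
best = max([slow_but_huge, fast_but_small],
           key=lambda p: bounded_utility(p, horizon))
print(best is fast_but_small)  # True: the deadline genuinely drives the choice
```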
If there is a rule that says ‘optimize X for Y seconds’, why would an AGI treat ‘optimize X’ differently from ‘for Y seconds’?
(I wrote this before seeing timtyler’s post.) It does seem like you misinterpreted the argument, but one possible failure there is if the most effective way to maximize paperclips within the time period is to build paperclip-making Von Neumann machines. If it designs the machines from scratch, it won’t build a time limit into them, because that won’t increase the production of paperclips within the period of time it cares about.
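A toy version of that failure mode (a hypothetical sketch; the function and numbers are invented): if the parent agent scores candidate sub-agent designs only by paperclips produced inside its own horizon, a design with no built-in stop time is never worse than one with a stop time, so the time limit does not propagate into the machines it builds.

```python
# Toy model (hypothetical): a parent agent with a time-bounded objective
# chooses a design for self-replicating paperclip makers. The parent's score
# only counts output inside its own horizon, so a built-in stop time in the
# sub-agents can only lower (never raise) that score.

def output_within(horizon, stop_time, rate=1.0):
    """Paperclips a sub-agent produces inside the parent's horizon.

    stop_time=None means the sub-agent never stops on its own.
    """
    running_until = horizon if stop_time is None else min(horizon, stop_time)
    return rate * max(0.0, running_until)

horizon = 60.0
candidate_designs = [None, 30.0, 60.0, 600.0]   # None = no stop time built in

# Score each design purely by output inside the parent's horizon.
scores = {d: output_within(horizon, d) for d in candidate_designs}
best_design = max(scores, key=scores.get)

print(scores)        # {None: 60.0, 30.0: 30.0, 60.0: 60.0, 600.0: 60.0}
print(best_design)   # None: leaving out the stop time is (weakly) optimal,
                     # so the machines it builds go on making paperclips forever.
```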
If there is a rule that says ‘optimize X for Y seconds’, why would an AGI treat ‘optimize X’ differently from ‘for Y seconds’? In other words, why is it assumed that we can succeed in creating a paperclip maximizer that cares strongly enough about the design parameters of paperclips to consume the universe (why would it do that as long as it isn’t told to do so), yet somehow ignores all design parameters that have to do with spatio-temporal scope boundaries or resource limitations?
I discuss the associated problems here:
The first problem associated with switching such an agent off is specifying exactly what needs to be switched off to count as the agent being in an “off” state. This is the problem of the agent’s identity. Humans have an intuitive sense of their own identity, and the concept usually delineates a fleshy sack surrounded by skin. However, phenotypes extend beyond that, as Richard Dawkins pointed out in his book The Extended Phenotype.
For a machine intelligence, the problem is a thorny one. Machines may construct other machines, and set these to work. They may sub-contract their activities to other agents. Telling a machine to turn itself off and then being faced with an army of its minions and hired help still keen to perform the machine’s original task is an example of how this problem might manifest itself.
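A minimal sketch of that identity problem (hypothetical code; the class and names are invented): if ‘switch the agent off’ is taken to mean deactivating only the original object, everything it has already spawned keeps working on the task, so a meaningful off-switch would have to cover the whole tree of descendants, which is exactly the boundary that is hard to specify.

```python
# Toy illustration (hypothetical): "switching off" only the original agent
# leaves its previously spawned minions still working on the task.

class Agent:
    def __init__(self, name):
        self.name = name
        self.active = True
        self.children = []

    def spawn(self, name):
        child = Agent(name)
        self.children.append(child)
        return child

    def switch_off(self):
        # Naive off-switch: deactivates this object only.
        self.active = False

def all_descendants(agent):
    for child in agent.children:
        yield child
        yield from all_descendants(child)

original = Agent("paperclipper")
for i in range(3):
    worker = original.spawn(f"factory-{i}")
    worker.spawn(f"factory-{i}-subcontractor")

original.switch_off()

still_working = [a.name for a in all_descendants(original) if a.active]
print(original.active)   # False: the thing we pointed at is "off"
print(still_working)     # six minions still active and still pursuing the goal
```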
I don’t quite understand that argument, maybe someone could elaborate.
I think the idea is that if I make a perfectly safe AI by constraining it in some way, that doesn’t prevent someone else from making an unsafe AI and killing us all.