As Megan grows in capabilities, the default outcome here is full planetary doom: Megan's self-improvement accelerates, and she takes control of the universe in the name of protecting Cady from harm. You should be very scared of exactly what Megan is going to decide that means for both Cady and the universe. Almost all value, quite possibly more than all value, will be lost.
Discount rate. Taking over the [local area, country, world, solar system, universe] is a future event that yields a distant future reward [Cady doesn’t die], while killing the dog or the school bully improves life for Cady immediately.
There are also compute limits. Megan may not have sufficient compute to evaluate decision paths as complex as “I take over the world and in 80 years I get slightly more reward because Cady doesn’t die”, whether because of raw compute constraints or because her internal architecture simply doesn’t allow path evaluations past a certain amount of future time.
This succinctly explains crime, btw. A criminal heavily discounts the futures where they are caught and imprisoned, especially in areas where pCaught is pretty low. So if they think, from talking to their criminal friends, that pCaught is low (“none of my criminal friends has been caught yet, except one who did crime for 10 years”), then they commit crimes with the working expectation that they won’t be caught for ten years. When the discount rate is high, the immediate reward of the crime exceeds the discounted long-term penalty of eventually being caught.
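A toy calculation makes the discount-rate point concrete. This is a minimal sketch with made-up numbers: the reward values, the 0.9 annual discount factor, and the 80-year horizon are all illustrative assumptions, not anything taken from the Megan scenario itself.

```python
# Toy discounting example (all numbers are assumptions for illustration).

def discounted_value(reward, years_until_reward, annual_discount=0.9):
    """Present value of a reward received years_until_reward years from now."""
    return reward * (annual_discount ** years_until_reward)

# Hypothetical rewards on Megan's "Cady is protected" scale.
greedy_now  = discounted_value(reward=10,   years_until_reward=0)   # kill the dog today
world_later = discounted_value(reward=1000, years_until_reward=80)  # take over the world; Cady never dies

print(f"greedy action now:       {greedy_now:.2f}")   # 10.00
print(f"take over world (80 yr): {world_later:.2f}")  # ~0.22
```

Even with the distant reward a hundred times larger, a high discount rate makes the immediate, local action win.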
Megan may not have sufficient compute to evaluate such complex decision paths as “I take over the world and in 80 years I get slightly more reward because Cady doesn’t die”.
Why ever not? I can, and Megan seems pretty clever to me.
Yes, but are you able to model out all the steps and assess whether it’s wise? Say you want to keep a loved one alive: overthrowing the government in a coup is theoretically possible, and then you as dictator spend all tax dollars on medical research.
But the probability of success is so low, and many of those futures get you and that loved one killed through reprisal. Can you assess the likelihood of a path over years with thousands of steps?
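For a rough sense of why that assessment comes out badly, per-step success probabilities compound. The 0.99 per-step figure and the thousand steps below are assumptions for illustration, not claims about any particular plan:

```python
# How small the success probability of a long plan gets, even when each
# individual step is very likely to succeed (assumed numbers).

p_step = 0.99    # assumed probability that each step of the plan succeeds
n_steps = 1000   # a coup-and-cure plan with "thousands of steps"

p_plan = p_step ** n_steps
print(f"P(whole plan succeeds) = {p_plan:.2e}")  # about 4.3e-05
```

A thousand 99%-reliable steps still leave the overall plan at odds of roughly 1 in 23,000, before counting the reprisal branches where everyone dies.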
Or you just take only greedy near-term actions.
Of course, power-seeking behavior is kind of incremental. You don’t have to plan 80 years into the future. If you get power and money now, you can buy Cady bandaids, etc. Get richer and you can hire bodyguards. And so on: you get immediate near-term benefits with each increase in power.
A greedy strategy would actually work. The issue arises when there is a choice to break the rules. Do you evade taxes or steal money? These all carry risks. Once you’re a billionaire with large resources, do you start illegally making weapons? There are all these branch points where, if the risk of getting caught is high, the AI won’t do these things.
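One way to picture those branch points is a greedy agent that scores each available action by its immediate reward minus the expected penalty of getting caught. Everything below (the action list, the pCaught values, the penalties) is made up for illustration; it is a sketch of the decision rule being described, not anyone’s actual implementation.

```python
# Greedy action selection with a risk gate (all actions and numbers are
# hypothetical, chosen only to illustrate the branch points above).

actions = [
    # (name, immediate_reward, breaks_rules, p_caught, penalty_if_caught)
    ("buy bandaids",            1, False, 0.0,    0),
    ("evade taxes",             5, True,  0.3,   50),
    ("steal money",            20, True,  0.6,  200),
    ("illegally make weapons", 80, True,  0.9, 1000),
]

def greedy_choice(actions):
    """Pick the action with the best immediate reward net of expected punishment."""
    best_name, best_value = None, float("-inf")
    for name, reward, breaks_rules, p_caught, penalty in actions:
        value = reward
        if breaks_rules:
            value -= p_caught * penalty  # expected cost of getting caught
        if value > best_value:
            best_name, best_value = name, value
    return best_name, best_value

print(greedy_choice(actions))  # ('buy bandaids', 1)
```

On these assumed numbers every rule-breaking branch prices out negative, which is exactly the claim: a high enough pCaught with a real penalty keeps the greedy agent on the legal, incremental path.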