I don’t know anyone in the community who’d say it’s a bad thing that leads to extinction if a CEV-aligned superintelligence grabs control.
Hi there. I am a member of the community, and I expect that any plan that looks like “some people build a system that they believe to be a CEV-aligned superintelligence and tell it to seize control” will end in a way that is worse than, and different from, “utopia”. If the “seize control” strategy is aggressive enough, “extinction” is one of the “worse and different” outcomes on the table. That said, I expect the modal outcome is more like “something dumb doesn’t work, and the plan fails before anything of particular note has even happened”, and the 99th-percentile large-effect outcome is something like “your supposed CEV-aligned superintelligence breaks something important enough that people notice you were trying to seize control of the world, and then something dumb doesn’t work”.
Note that evolution has had “white-box” access to our architecture, optimising us for inclusive genetic fitness, and getting something that optimizes for similar collections of things.
Would you mind elaborating on what exactly you mean by the terms “white-box” and “optimizing for” in the above statement (and, in particular, whether you mean the same thing by your first usage of “optimizing” and your second usage)?
I think the argument would be clearer if it distinguished between the following meanings of the term “optimizer”:
Deliberative Maximization: We can make reasonable predictions of what this system will do by assuming that it contains an internal model of the value of world states and of the effect of its own actions on world states, and further assuming that it will choose whichever action its internal model says will maximize the value of the resulting world state. Concretely, a chess engine built from a value network plus MCTS would fit this definition (a toy code sketch of all three senses appears below).
Selective Shaping: This system is the result of an iterative selection process in which certain behaviors resulted in the system being more likely to be selected. As such, we expect that the system will exhibit behaviors similar to those that got it selected in the past. An example might be M. septendecula cicadas, which emerge and breed on a 17-year cycle: enough cicadas come out every 17 years that predators get too full to eat them all, so a cicada that comes out on the 17-year cycle is likely to survive and breed, while one that comes out at 16 or 18 years is likely to be eaten. Evolution is “optimizing for” cicadas that emerge every 17 years, but the individual cicadas aren’t “optimizing for” much of anything.
Adaptive Steering: If you take an E. coli bacterium and drop it in a solution that contains variable concentrations of nutrients, you will find that it alternates between two forms of motion: “running”, in which it travels in approximately a straight line at a constant speed, and “tumbling”, in which it randomly changes its direction. When the nutrient concentration along its path is increasing, it tumbles rarely, and when the concentration is decreasing, it tumbles often. In this manner, it tends to maintain its heading when conditions are improving and to change its heading when conditions are worsening (good animation here). This actually does look like the E. coli is “optimizing for” being in a high-nutrient-density region, but it is “optimizing for” that goal in a different way than evolution is “optimizing for” it exhibiting that behavior.
So if we use those terms, the traditional IGF argument looks something like this:
Evolution optimized (selective shaping) humans to be reproductively successful, but despite that humans do not optimize (deliberative maximization) for inclusive genetic fitness.
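To make the distinction concrete, here is a minimal toy sketch of the three senses in Python. Everything in it (the function names, the survival probabilities, the tumbling rule) is made up purely for illustration; the point is only which ingredients each sense needs.

```python
import random

# Toy, self-contained sketches of the three senses of "optimizer" above.
# All names and numbers are invented for illustration; this is not meant as
# a model of real chess engines, cicadas, or E. coli.


def deliberative_step(state, actions, predict_next_state, value_of):
    """Deliberative maximization: consult the system's own internal model
    (the caller-supplied predict_next_state and value_of) and pick whichever
    action that model says leads to the highest-value resulting state."""
    return max(actions, key=lambda a: value_of(predict_next_state(state, a)))


def selection_round(population, offspring_per_survivor=3):
    """Selective shaping: no individual models anything. Cycle length is a
    heritable trait, individuals on the crowded 17-year cycle are less likely
    to be eaten, and the survivors breed. The process pushes the population
    toward 17 even though no cicada "wants" anything."""
    survivors = []
    for cycle_length in population:
        p_eaten = 0.2 if cycle_length == 17 else 0.9  # safety in numbers
        if random.random() > p_eaten:
            survivors.append(cycle_length)
    return survivors * offspring_per_survivor


def run_and_tumble_step(heading_degrees, nutrients_now, nutrients_before):
    """Adaptive steering: no model of outcomes, just a feedback rule. Keep
    the current heading while conditions improve; pick a random new heading
    ("tumble") when they get worse."""
    if nutrients_now < nutrients_before:
        return random.uniform(0.0, 360.0)  # tumble
    return heading_degrees  # keep running straight
```

Only the first function consults an internal model of outcomes; the other two produce goal-directed-looking behavior without one, which is the sense in which evolution, the cicadas, and the E. coli are each “optimizing” differently.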
Thanks for the comment!
any plan that looks like “some people build a system that they believe to be a CEV-aligned superintelligence and tell it to seize control”
People shouldn’t be doing anything like that; I’m saying that if there is actually a CEV-aligned superintelligence, then this is a good thing. Would you disagree?
what exactly you mean by the terms “white-box” and “optimizing for”
I agree with “Evolution optimized humans to be reproductively successful, but despite that humans do not optimize for inclusive genetic fitness”, and the point I was making was that the stuff that humans do optimize for is similar to the stuff other humans optimize for. Were you confused by what I said in the post or are you just suggesting a better wording?
People shouldn’t be doing anything like that; I’m saying that if there is actually a CEV-aligned superintelligence, then this is a good thing. Would you disagree?
I think an actual CEV-aligned superintelligence would probably be good, conditional on being possible. But I also expect that anyone who thinks they have a plan to create one is almost certainly wrong about that, so plans of that nature are a bad idea in expectation, and much more so if the plan looks like “do a bunch of stuff that would be obviously terrible if not for the end goal, in the name of optimizing the universe”.
Were you confused by what I said in the post or are you just suggesting a better wording?
I was specifically unsure which meaning of “optimize for” you were referring to with each usage of the term.
Yep, I agree