Here’s my intuition: Eliezer and other friendly humans have got their values partially through evolution and selection. Genetic algorithms tend to be very robust—even robust to the problem not being properly specified. So I’d assume that Eliezer and evolved FAIs would preserve their friendliness if the laws of physics were changed.
An AI with a designed utility function is very different, however. Such an AI is very vulnerable to ontology crises, since its values are grounded in a formal description; if the premises of that description change, its whole value system changes.
Now, presumably we can do better than that, and design a FAI to be robust across ontology changes—maybe mix in some evolution, or maybe some cunning mathematics. If this is possible, however, I would expect the same approach to succeed with a reduced impact AI.
I got 99 psychological drives but inclusive fitness ain’t one.
In what way is evolution supposed to be robust? It’s slow, stupid, doesn’t reproduce the content of goal systems at all and breaks as soon as you introduce it to a context sufficiently different from the environment of evolutionary ancestry because it uses no abstract reasoning in its consequentialism. It is the opposite of robust along just about every desirable dimension.
It’s not as brittle as methods like first-order logic or direct computer programming. If I had really bad computer hardware (corrupted disks and all that), then an evolved algorithm is going to work a lot better than a lean formal program.
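To make that contrast concrete, here is a minimal toy sketch in Python; all names and parameters (e.g. `noisy_fitness`, `evolve`, the noise level) are invented for illustration, not taken from anything above. It shows a selection loop converging on a target bitstring even though every fitness reading it receives is corrupted, which is the sort of robustness being claimed for evolved algorithms over exact formal programs.

```python
import random

# Toy sketch: a genetic algorithm that matches a target bitstring even though
# every fitness reading it receives is corrupted by random noise.
# All names and parameters here are invented for illustration.

TARGET = [1] * 20          # the "real" goal: an all-ones bitstring
NOISE = 3                  # each fitness reading can be off by up to +/- 3

def noisy_fitness(bits):
    """Count matching bits, then corrupt the reading."""
    true_score = sum(b == t for b, t in zip(bits, TARGET))
    return true_score + random.randint(-NOISE, NOISE)

def mutate(bits, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [1 - b if random.random() < rate else b for b in bits]

def evolve(pop_size=40, generations=60):
    population = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the half that *looks* best under the corrupted readings,
        # then refill the population with mutated copies of the survivors.
        ranked = sorted(population, key=noisy_fitness, reverse=True)
        survivors = ranked[: pop_size // 2]
        offspring = [mutate(random.choice(survivors)) for _ in range(pop_size - len(survivors))]
        population = survivors + offspring
    # Report the individual that is actually best, measured without noise.
    return max(population, key=lambda b: sum(x == t for x, t in zip(b, TARGET)))

best = evolve()
print("true matches:", sum(b == t for b, t in zip(best, TARGET)), "out of", len(TARGET))
```

Run repeatedly, it typically recovers nearly all of the target bits despite never seeing an uncorrupted score. Of course, this is a toy setting and says nothing about the safety question itself.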
Similarly, if an AI were built by people who didn’t understand the concept of friendliness, I’d much prefer they used reinforcement learning or evolutionary algorithms rather than direct programming. With the former approaches, there is some chance the AI may infer the correct values; with the wrong direct programming, there’s no chance of it being safe.
As you said, you would remain altruistic even if the laws of physics changed, and yet you don’t have a full theory of humankind, of worth, of altruism, etc. So the mess in your genes, culture, and brain has come up with something robust to ontology changes, without having to be explicit about it all. Even though evolution is not achieving its “goal” through you, something messy is working.
I think there might be a miscommunication going on here.
I see Stuart arguing that genetic algorithms preserve their “friendly” trait independently of physics. That is, if in universe A there is a genetic algorithm that finds value in expressing the “friendly” trait, then that algorithm, if placed in universe B (where the boundary conditions of the universe were slightly different), would tend to eventually express the “friendly” trait again. That is what makes it robust, compared to systems that could not do this.
I don’t necessarily agree with that argument, and my interpretation could be wrong.
I see Eliezer arguing that evolution as a system doesn’t do a heck of a lot compared to a system that is designed around a goal and compensates for failure. For example: I can’t reproduce with a horse, which is a bad thing, because if I were trapped on an island with a horse our genetic information would die off; in a robust system, I could breed with a horse, thereby preserving our genetic information.
I’m sorry if this touches too closely on the entire “well, the dictionary says” argument.
Oh, now I feel silly. The horse IS the other universe.
If I had to guess Stuart_Armstrong’s meaning, I would guess that genetic algorithms are robust in that they can find a solution to a poorly specified and poorly understood problem statement. They’re not robust to dramatic changes in the environment (though they can correct for sufficiently slow, gradual changes very well); but their consequentialist nature provides some layer of protection from ontology crises.
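As a rough illustration of that last point, here is another toy sketch in Python (again with invented names and parameters, nothing drawn from the discussion above): the same selection loop tracks a target that drifts one bit every other generation, but its best score collapses when the whole target flips at once, and it only climbs back slowly.

```python
import random

# Toy sketch: the same selection loop facing either a target that drifts one
# bit every other generation, or a target whose bits all flip at once halfway
# through the run. All names and parameters are invented for illustration.

LENGTH, POP, GENS = 30, 60, 80

def fitness(bits, target):
    return sum(b == t for b, t in zip(bits, target))

def mutate(bits, rate=0.03):
    return [1 - b if random.random() < rate else b for b in bits]

def step(population, target):
    # Keep the better half, refill with mutated copies of the survivors.
    ranked = sorted(population, key=lambda b: fitness(b, target), reverse=True)
    survivors = ranked[: POP // 2]
    return survivors + [mutate(random.choice(survivors)) for _ in range(POP - len(survivors))]

def run(change):
    target = [0] * LENGTH
    population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
    best_per_gen = []
    for gen in range(GENS):
        if change == "gradual" and gen % 2 == 0:
            target[random.randrange(LENGTH)] ^= 1      # slow drift: one bit flips
        if change == "abrupt" and gen == GENS // 2:
            target = [1 - t for t in target]           # the whole "world" flips at once
        population = step(population, target)
        best_per_gen.append(max(fitness(b, target) for b in population))
    return best_per_gen

for change in ("gradual", "abrupt"):
    h = run(change)
    print(change, "- best fitness at generations 20/40/41/60:", h[19], h[39], h[40], h[59])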
You know this is blank, right?
I had a response that was mainly a minor nitpick; it didn’t add anything, so I removed it.
Just checking.