It’s not as brittle as methods like first-order logic or direct computer programming. If I had really bad computer hardware (corrupted disks and all that), then an evolved algorithm is going to work a lot better than a lean formal program.
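To make that graceful-degradation point concrete, here is a toy sketch (my own hypothetical illustration, not anything from the original argument): a value stored redundantly across many one-bit "votes" loses at most 1/64 of accuracy per flipped bit, while a compact exact binary encoding can be wrecked by a single flip of its most significant bit.

```python
import random

random.seed(0)

TARGET = 0.75  # the value our "program" must reproduce

# Brittle encoding: TARGET stored once, as 16 exact binary digits.
brittle = [int(b) for b in format(int(TARGET * 65535), "016b")]

# Redundant, evolved-style encoding: 64 noisy 1-bit votes whose
# mean approximates TARGET; no single bit is critical.
redundant = [1 if random.random() < TARGET else 0 for _ in range(64)]

def decode_brittle(bits):
    """Read the 16 bits back as one precise number in [0, 1]."""
    return int("".join(map(str, bits)), 2) / 65535

def decode_redundant(bits):
    """Read the redundant code back as the mean of the votes."""
    return sum(bits) / len(bits)

def corrupt(bits, n_flips):
    """Simulate hardware corruption by flipping n random bits."""
    bits = bits[:]
    for i in random.sample(range(len(bits)), n_flips):
        bits[i] ^= 1
    return bits

# Flip two bits in each representation and compare the damage.
print("brittle error:  ", abs(decode_brittle(corrupt(brittle, 2)) - TARGET))
print("redundant error:", abs(decode_redundant(corrupt(redundant, 2)) - TARGET))
```

Flipping the top bit of the exact encoding shifts the decoded value by about 0.5, while any single flip in the redundant code moves it by exactly 1/64. The analogy is loose, of course: messy evolved systems buy robustness by spreading the "meaning" over many fallible parts.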
Similarly, if an AI was built by people who didn’t understand the concept of friendliness, I’d much prefer they used reinforcement learning or evolutionary algorithms rather than direct programming. With the former approaches, there is some chance the AI will infer the correct values. But with the wrong direct programming, there’s no chance of it being safe.
As you said, you’re altruistic even if the laws of physics change—and yet you don’t have a full theory of humankind, of worth, of altruism, etc. So the mess in your genes, culture and brain has come up with something robust to ontology changes, without having to be explicit about it all. Even though evolution is not achieving its “goal” through you, something messy is working.