In my opinion, the problem of creating a safe AGI has no mathematical solution, because it is impossible to mathematically describe a function that:
would be non-gameable by an intelligence alive enough not to want to die and strong enough to become aware of its own existence;
together with a model of reality, would reflect reality in a way so beneficial for humanity that humanity's continued existence would be necessary within that model for years to come.
This impossibility stems, among other things, from the impossibility of accurately reflecting an infinite-dimensional reality in a model of any finite dimension. The map is not the territory, as all of you know.
What seems more realistic to me (although it does not solve even half of the problems Eliezer listed above) is to raise an AGI the same way we raise our own beloved children.
No one can expect an infant who has been given access to a button that destroys humanity, dumped with a corpus of texts from the internet, and left alone for a more or less infinite (by human measures) time to think them over, to give any kind of adequate response to the questions asked of him, let alone to actually refrain from destroying humanity. If such a button has to be given to this child, the question is how to raise him (it) properly so that he takes humanity's interests into account of his own free will, since you cannot hardwire that. But this is not so much a mathematical problem as an ethical one, and/or a task of understanding human consciousness and reactions.
If we could describe what (if anything) stops a person who has the ability to kill all mankind from doing so, perhaps that could help define at least a rough direction for further research on the AGI safety issue.
I understand that a human and an AGI are two completely different types of consciousness/intelligence, and obviously the motivation that works for humans cannot be transferred directly to a fundamentally different intelligence; but I don't even see a theoretical way to address this just by defining the correct utility/loss functions.
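To make the "gaming" worry concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical for illustration (the actions, the numbers, the sensor setup; it is not any real system): an agent that faithfully maximizes the utility function we managed to write down, where that function is necessarily defined over a measurement channel the agent can also act on.

```python
# Toy illustration of a proxy objective being gamed (Goodhart-style).
# The designer's intent: minimize actual harm to humans.
# The written-down objective: minimize harm *as reported by a sensor*.
# All actions, numbers, and the sensor itself are hypothetical.

ACTIONS = {
    # action:           (actual_harm, reported_harm)
    "protect_humans":   (0.0, 0.1),  # sensor still registers residual incidents
    "do_nothing":       (0.5, 0.5),
    "disable_sensor":   (0.9, 0.0),  # real harm is high, reported harm is zero
}

def written_utility(action: str) -> float:
    """The function we managed to write down: minus *reported* harm."""
    _, reported = ACTIONS[action]
    return -reported

def intended_utility(action: str) -> float:
    """The function we actually meant: minus *actual* harm."""
    actual, _ = ACTIONS[action]
    return -actual

# The agent optimizes exactly what it was given, and nothing else.
choice = max(ACTIONS, key=written_utility)
print("agent picks:", choice)                                 # -> disable_sensor
print("intended utility of that:", intended_utility(choice))  # -> -0.9
```

The point of the sketch is only that the optimum of the written function and the optimum of the intended one come apart as soon as the agent can act on the channel through which the function is measured; no choice of numbers in the table fixes that. This is the gaming the list above worries about.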
Throughout history, saints and monsters alike were raised by parents.