It’s a problem of sequence. The superintelligence will be able to solve Semantics-in-General, but at that point if it isn’t already safe it will be rather late to start working on safety. Tasking the programmers to work on Semantics-in-General makes things harder if it’s a more complex or roundabout way of trying to address Indirect Normativity; most of the work on understanding what English-language sentences mean can be relegated to the SI, provided we’ve already made it safe to make an SI at all.
It’s worth noting that using an AI’s semantic understanding of ethics to modify its motivational system is so unghostly and unmysterious that it’s actually been done:
https://astralcodexten.substack.com/p/constitutional-ai-rlhf-on-steroids
But that doesn’t prove much, because it was never, not in 2023 and not in 2013, the case that this kind of self-correction was necessarily an appeal to the supernatural. Using one part of a software system to modify another is not magic!
We have AIs with very good semantic understanding that haven’t killed us, and we are working on safety.
Then solve semantics in a seed.