I hesitate to say it, but my view is that evolution’s goals were very easy to reach, and in particular, evolution could rely on the following assumptions:
Deceptive alignment doesn’t matter to evolution: so long as the organism reproduces, deception is irrelevant. For most of the goals we care about, deceptive alignment would break the alignment entirely, since we’re usually aiming for much more specific goals.
Instrumental goals can be used, at least in part, to accomplish the task itself; that is, for evolution, instrumental convergence isn’t the threat it’s usually portrayed as.
There are other reasons, but these two are the main reasons the alignment problem is so hard (without interpretability tools).
How are we possibly aiming for “much more specific goals”? Remember that evolution intra-aligned brains to each other through altruism; we only need to improve on that.
And regardless, we could completely ignore human values and just create AI that optimizes for human empowerment (maximization of our future optionality, or future potential to fulfill any goal).
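To pin down what “maximization of our future optionality” could mean formally: in the empowerment literature (e.g. Klyubin, Polani and Nehaniv’s information-theoretic formulation), an agent’s empowerment in a state is the channel capacity from its action sequences to its resulting future states, and a human-empowerment-maximizing AI would act to keep that quantity high for the humans rather than for itself. A minimal sketch of the standard n-step definition, using the usual notation from that literature rather than anything specific to this thread:

% n-step empowerment of an agent in state s: the channel capacity
% from its action sequence A_{1:n} to the state S_{t+n} reached n
% steps later, maximized over distributions of action sequences.
\mathfrak{E}_n(s) \;=\; \max_{p(a_{1:n})} \; I\!\left(A_{1:n};\, S_{t+n} \,\middle|\, S_t = s\right)

Intuitively, the more distinct futures the humans’ own actions can still reliably bring about, the higher this value, so an AI optimizing it preserves our options without needing a model of our specific values.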