A system can be aligned in most of these senses without being beneficial. Being beneficial is distinct from being aligned in senses 1–4 because those deal only with the desires of a particular human principal, which may or may not be beneficial. Being beneficial is distinct from conception 5 because beneficial AI aims to benefit many or all moral patients. Only AI that is aligned in the sixth sense would be beneficial by definition. Conversely, AI need not be well-aligned to be beneficial (though alignment might help).
At some point, some particular group of humans codes the AI and presses run. If all the people who coded it were totally evil, they would make an AI that does evil things.
The only way any kind of morality can affect the AI's decisions is if the programmers are at least somewhat moral. Whether the programmers hard-code their morality (or meta-morality), or code the AI to do what they ask and then ask for moral things, is an implementation detail. The key causal link runs from the programmers' preferences (including very abstract meta-preferences, e.g. for fairness) to the AI's actions.
(Note that I think any disagreement we may have here dissolves upon the clarification that I also care, perhaps primarily for the purposes of this series, about non-AGI but very profitable AI systems.)