I’m going to say that while strategy 1 isn’t going to solve all the problems, it might also solve benevolence, primarily because I think the AI alignment problem is far, far more general than, say, the problem of aligning states to their citizens. It’s much more like the problem of aligning humans to animals, and the evidence there is mostly depressing, with the exception of pets, more or less.
Compared to states being aligned to their citizens, where we actually have mechanisms that work, if imperfectly, in human-to-animal alignment there aren’t mechanisms that work at all, short of pets.
I think several factors contribute to the problem:
A much more capable party can, for the most part, ignore restraints like laws or contracts, so the outcome depends on its own goals, which are usually misaligned.
We depend on the fact that there aren’t large differences in behavior, intelligence, and so on between parties, and if that assumption breaks, things get bad fast. This is also known as the assumption of an IID distribution on capabilities.
Thus, success on strategy 1, especially if it can be extended to arbitrarily large inequalities in capabilities like intelligence, can essentially solve many special cases of the alignment problem, such as aligning states to their citizens.
I see the point that making it easier to build safer AI can help solve the benevolence problem by making benevolent agents more competitive, thus lowering the effective alignment tax. This is a good point.
But I would note that this only applies to the extent that one’s approach to strategy 1 focuses on helping the people working on safer AI do it more effectively; it does not address which goals the AI is aligned to. Ultimately, if a terrorist has a powerful AI system that is well-aligned with their goals, that’s very bad.