What is the hardest part of AI alignment?
Now that is the right question. There is the AGI Ruin list, which talks about a lot of the hard problems.
I think a very core thing is figuring out how we can make a system robustly “want” something. There are actually a bunch more heuristics you can use to determine good problems to work on. One is to think about which things need to be solved because they will show up in virtually all agendas (or at least all agendas of a particular type). And how to make a system robustly “want” something probably falls into that category.
If we could just figure this out, we might be able to get away with not figuring out human values. Potentially we could make the AI perform some narrow task that constitutes a pivotal act. However, figuring out how to make a system robustly “want” something does not seem to be enough. We also need to figure out how to make the system “want” to perform the narrow thing that constitutes a pivotal act. And we need to make it such that the system would not spawn misaligned subagents. And probably a bunch more problems that do not come immediately to mind.
I think a better question is whether we have an environment that can cultivate the kind of work needed to solve the most challenging aspects of AI alignment.