[Question] What's in your list of unsolved problems in AI alignment?
Question for my fellow alignment researchers out there, do you have a list of unsolved problems in AI alignment? I’m thinking of creating an “alignment mosaic” of the questions we need to resolve and slowly filling it in with insights from papers/posts.
I have my own version of this, but I would love to combine it with others' alignment backcasting game-trees. I want to collect the kinds of questions people keep in mind when reading papers/posts, thinking about alignment, or running experiments. I'm working with others to turn this into a collaborative effort.
Ultimately, what I'm looking for are the important questions and sub-questions we should be thinking about and updating on when we read papers and posts, as well as when we decide what to read.
Here’s my Twitter thread posing this question: https://twitter.com/jacquesthibs/status/1633146464640663552?s=46&t=YyfxSdhuFYbTafD4D1cE9A.
Here’s a sub-thread breaking down the alignment problem in various forms: https://twitter.com/jacquesthibs/status/1633165299770880001?s=46&t=YyfxSdhuFYbTafD4D1cE9A.