Thanks for the detailed post!
I personally would have liked to see some mention of the classic 'outer' alignment questions that are subproblems of robustness and ELK. E.g., what counts as 'generalizing correctly'? → How do you learn how humans want the AI to generalize? → How do you model humans as systems that have preferences about how they are modeled?