Possibly one of the most impactful AI control papers of 2023. It went far beyond LessWrong, making it into a separate 30-minute video dedicated to (positively) reviewing the proposed solution.
The paper also enjoyed some academic success. As of January 3, 2025, it has not only 23 citations on LessWrong but also 24 citations on Google Scholar.
This paper strongly updated me towards thinking that AI control is possible, feasible and should be actively implemented to prevent catastrophic outcomes.