>So, my guess at Leo’s reaction is one of RLHF-optimism.
This is more or less what he seems to say in the transcript: he thinks we will have legible, trustworthy chain of thought at least for the initial automated AI researchers, that we can RLHF them, and that we can then use them to do alignment research. This is not a new concept and has been debated here ad nauseam, but it's not a shocking view for a member of Ilya and Jan's team, and he clearly cosigns it in the interview.