This seems like really valuable work! And while situational awareness isn’t a sufficient condition for being able to fully automate many intellectual tasks, it does seem like a necessary one, so this is already a much better benchmark for ‘intelligence’ than, e.g., MMLU.