Towards_Keeperhood answers "What are the most interesting / challenging evals (for humans) available?"