abstractapplic comments on What are the most interesting / challenging evals (for humans) available?