Kudos for tracking the predictions, and for making the benchmark! I’d be really excited to see more benchmarks created that current AI does really badly on. That seems like a good way to understand capabilities going forward.