I asked Claude Opus whether it could clearly parse different tic-tac-toe notations and it just said ‘yes I can’ to all of them, despite performing pretty poorly on most.
yeah, its introspection is definitely less than perfect. I’ll DM the prompt I’ve been using so you can see its scores.