I guess in the real world the rules aren’t harder per se, just less clear and not written down. I think both the rules and the tools needed to solve contest math questions at least feel harder than the vast majority of rules and tools human minds deal with. Someone like Terence Tao, who is a master of these, excelled in every subject when he was a kid (iirc).
I think LLMs have a pretty good model of human behavior, so for anything related to human judgement, that model shouldn’t, in theory, be why they’re not doing well.
And where the rules are unwritten or unknown (say, biology), aren’t they at least captured by current methods? The next step is probably something like baking the intuitions of a model like AlphaFold into a model like o1, whatever that turns out to mean. R&D is what matters, and there are generally vast sums of data there.
> for anything related to human judgement, in theory this isn’t why it’s not doing well
The facts are in there, but not in the form of a reward model good enough to tell, as well as human experts can, which answer is better or whether a step of an argument is valid. In the same way, RLHF still works better with humans in the loop on some queries; replacing humans with models hasn’t yet been fully automated to superior results in all cases.
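To make the reward-model point concrete, here is a toy sketch of the Bradley-Terry objective commonly used to train reward models from pairwise human preferences. Everything here (the function name, the preference data, the hyperparameters) is hypothetical illustration, not anyone’s actual pipeline; real reward models score answers with a neural network rather than a free scalar per item.

```python
import math

# Toy Bradley-Terry fit: learn one scalar "reward" per answer from
# pairwise preference labels (winner, loser). This is the same kind of
# objective used for RLHF reward models, stripped to its simplest form.
def fit_bradley_terry(pairs, n_items, lr=0.1, steps=2000):
    """pairs: list of (winner_idx, loser_idx); returns learned scores."""
    scores = [0.0] * n_items
    for _ in range(steps):
        for w, l in pairs:
            # Probability the winner beats the loser under current scores
            p = 1.0 / (1.0 + math.exp(scores[l] - scores[w]))
            # Gradient ascent on the log-likelihood of the observed label
            g = lr * (1.0 - p)
            scores[w] += g
            scores[l] -= g
    return scores

# Hypothetical judgments: answer 0 beat 1 and 2; answer 1 beat 2.
prefs = [(0, 1), (0, 2), (1, 2)]
scores = fit_bradley_terry(prefs, n_items=3)
ranking = sorted(range(3), key=lambda i: -scores[i])
print(ranking)  # [0, 1, 2]
```

The hard part the comment is pointing at isn’t this fitting step, it’s getting preference labels (or a model that generates them) whose judgments match human experts on subtle questions like whether a proof step is valid.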