Beating benchmarks, even very difficult ones, is all fine and dandy, but we must remember that those tests, no matter how difficult, are at best only a limited measure of human ability. Why? Because they present the test-taker with a well-defined situation to which they must respond. Life isn’t like that. It’s messy and murky. Perhaps the most difficult step is to wade into the mess and the murk and impose a structure on it – perhaps by simply asking a question – so that one can then set about dealing with that situation in terms of the imposed structure. Tests give you a structured situation. That’s not what the world does.
Consider this passage from Sam Rodriques, “What does it take to build an AI Scientist”:
Scientific reasoning consists of essentially three steps: coming up with hypotheses, conducting experiments, and using the results to update one’s hypotheses. Science is the ultimate open-ended problem, in that we always have an infinite space of possible hypotheses to choose from, and an infinite space of possible observations. For hypothesis generation: How do we navigate this space effectively? How do we generate diverse, relevant, and explanatory hypotheses? It is one thing to have ChatGPT generate incremental ideas. It is another thing to come up with truly novel, paradigm-shifting concepts.
Right.
How do we put o3, or any other AI, out in the world where it can roam around, poke into things, and come up with its own problems to solve? If you want AGI in any deep and robust sense, that’s what you have to do. That calls for real agency. I don’t see that OpenAI or any other organization is anywhere close to figuring out how to do this.