Not to say it’s a nothingburger, of course. But I’m not feeling the AGI here.
These math and coding benchmarks are so narrow that I’m not sure how anybody could treat them as saying anything about “AGI”. LLM development hasn’t even aimed at actual generality.
How close is “the model” to passing the Woz test (go into a strange house, locate the kitchen, and make a cup of coffee, implicitly without damaging or disrupting things)? If you don’t think the kinesthetic parts of robotics count as part of “intelligence” (and why not?), then could it interactively direct a dumb but dextrous robot to do that?
Can it design a nontrivial, useful physical mechanism that does a novel task effectively and can be built efficiently? Produce usable, physically accurate drawings of it? Actually make it, or at least produce a design complete enough that it could have the thing made? Diagnose problems with it? Improve the design based on observing how the actual device works?
Can it look at somebody else’s mechanical design and form a reasonably reliable opinion about whether it’ll work?
Even in the coding domain, can it build and deploy an entire software stack offering a meaningful service on a real server without assistance?
Can it start an actual business and run it profitably over the long term? Or at least take a good shot at it? Or do anything else that involves integrating multiple domains of competence to flexibly pursue possibly-somewhat-fuzzily-defined goals over a long time in an imperfectly known and changing environment?
Can it learn from experience and mistakes in actual use, without being hobbled by the training-versus-inference distinction? How quickly and flexibly can it do that?
When it schemes, are its schemes realistically feasible? Can it tell when it’s being conned, and how? Can it recognize an obvious setup like “copy this file to another directory to escape containment”?
Can it successfully persuade people to do specific, relatively complicated things (as opposed to making transparently unworkable hypothetical plans to persuade them)?