I’m working along these lines to create an easy-to-understand numeric evaluation scale for AGIs. The dream would be something like: “Gato is AGI level 3.5, while the average human is 8.7.” I believe the scale should factor in that no single static test can be a reliable measure of intelligence (any fixed test can be gamed or overfitted to).
A good reference on the subject is “The Measure of All Minds” by José Hernández-Orallo.
Happy to share a draft; send me a DM if you’re interested.
I think that building compound metrics here is just another way to give people something to Goodhart, but I’ve written much more about the pros and cons of different approaches elsewhere, so I won’t repeat myself here.
Thanks for the link, I will check it out.