The No Free Lunch theorem for dummies
(I wrote this little explainer when drafting Response to Blake Richards: AGI, generality, alignment, & loss functions a few months ago, but it got cut from the final version. I figure it kinda stands alone, and maybe people will find it to be helpful pedagogy, because I recall being confused about this topic for quite a while when I first came across it a few years ago. Also, there’s some chance that I’m still confused about NFL, and if so, someone can correct me in the comments! Anyway, enjoy!)
What is the No Free Lunch (NFL) theorem?
Suppose there’s a function F that maps each of a finite number of inputs {i_1,…,i_N} to one of a finite number of outputs {o_1,…,o_M}. You have no idea what this function is; literally any possible function, out of the M^N possibilities, is equally likely, as far as you know.
In fact, I generated the function F by secretly rolling an M-sided die N times. If I got a “9” on my first die-roll, then F(i_1)=o_9, and so on.
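To make the setup concrete, here’s a minimal sketch in Python. (The particular values of N and M, the fixed seed, and the dictionary representation are all just illustrative choices, not anything canonical.)

```python
import random

# Small illustrative sizes: N inputs, M possible outputs.
N, M = 10, 6
random.seed(0)  # fixed only so the example is reproducible

# Generate F by rolling an M-sided die once per input:
# F[k] = j means F(i_k) = o_j.
F = {k: random.randint(1, M) for k in range(1, N + 1)}

print(F)  # one of the M^N equally likely functions
```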
Now, you and I have a conversation about this unknown-to-you function F:
Me: “Pop quiz: What is F(i_3)?”
You: “How should I know? You just told me that any of the M^N possible F’s are equally likely. So F(i_3) is equally likely to be any of the M possible outputs {o_1,…,o_M}.”
Me: “OK, I’ll give you a hint: F(i_1)=o_9. Again, what is F(i_3)?”
You: “WTF kind of hint is that?? That doesn’t help me at all! I still have no idea. It’s still equally likely to be anything!!”
Me: “OK, here’s another hint: F(i_2)=o_8. Again, what is F(i_3)?”
You: “Stop being annoying. That’s not a real hint. You still haven’t given me any useful information. Based on what you’ve told me, there are M^(N−2) possible F’s, and they’re all equally likely, and they’re divided evenly amongst the M possible values of F(i_3).”
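To spell out the counting behind that last reply: of the M^(N−2) functions consistent with the two hints, exactly M^(N−3) of them have F(i_3)=o_k, for each particular o_k. So

P(F(i_3)=o_k | F(i_1)=o_9, F(i_2)=o_8) = M^(N−3) / M^(N−2) = 1/M,

which is exactly where you started. Under the uniform prior, the values of F at different inputs are statistically independent, so hints about other inputs never move the needle.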
That’s the essence of the No Free Lunch Theorem (or maybe family of theorems?), as far as I understand it. It sounds dumb, and honestly it is kind of dumb. But you can describe it in a way that makes it sound very profound:

There’s no such thing as “learning” or “figuring things out” in general; any algorithm that successfully does so on some problems will do catastrophically badly on other problems.
In other words, if you’ve observed F(i_1),F(i_2),…,F(i_1000), that information doesn’t help you at all in figuring out what F(i_1001) is going to be, when we’re in this situation where F is a priori equally likely to be any of the M^N possibilities. If you throw some pattern-learning algorithm at this problem, it might form guesses about what F(i_1001) is going to be, but those guesses are going to be garbage.
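Here’s a toy simulation of that claim. (The “pattern-learner” below is a deliberately dumb stand-in: it just guesses whichever output it has seen most often. But nothing cleverer would help; the unseen values are fresh die rolls, so any rule whatsoever has expected accuracy 1/M.)

```python
import random
from collections import Counter

random.seed(1)
M, N_OBS, N_TEST, TRIALS = 6, 1000, 1000, 200  # illustrative sizes

hits = total = 0
for _ in range(TRIALS):
    observed = [random.randint(1, M) for _ in range(N_OBS)]  # F(i_1)...F(i_1000)
    guess = Counter(observed).most_common(1)[0][0]           # learner's best guess
    unseen = [random.randint(1, M) for _ in range(N_TEST)]   # F(i_1001) onward
    hits += sum(g == guess for g in unseen)
    total += N_TEST

print(hits / total)  # hovers around 1/M ≈ 0.167, i.e. pure chance
```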
Here’s an example: there are bit-strings for which Solomonoff Induction performs at worse-than-chance level forever! After all, there are bit-strings for which it performs at better-than-chance level forever, and if you average over every possible bit-string, its guesses have to be wrong as often as they’re right.
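You can check the averaging claim exhaustively for short bit-strings, with any deterministic predictor standing in for Solomonoff Induction: run it over every prefix of every possible bit-string, and it comes out right on exactly half of its predictions. (The rule below, “the next bit repeats the last one,” is an arbitrary stand-in; swap in whatever rule you like and the totals won’t budge, because for each prefix the predictor commits to one bit, and the two continuations are equally represented.)

```python
from itertools import product

def predict(prefix):
    # Arbitrary deterministic rule: guess that the last bit repeats
    # (and guess 0 on the empty prefix).
    return prefix[-1] if prefix else 0

n = 8
hits = total = 0
for bits in product([0, 1], repeat=n):        # all 2^n bit-strings
    for t in range(n):
        hits += predict(bits[:t]) == bits[t]  # predict bit t from bits 0..t-1
        total += 1

print(hits, "/", total)  # exactly 1024 / 2048: right half the time
```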
Sidenote: Why NFL has basically nothing to do with AGI
See discussion by Eliezer Yudkowsky: No-Free-Lunch theorems are often irrelevant (at Arbital) and A reply to Francois Chollet on intelligence explosion.
In short, things like “creating and executing effective plans in the real world” and “inventing new technology” and so on are all very different from “characterizing an unknown function that was secretly generated by repeatedly flipping a coin / rolling a die”. NFL applies to the latter, not the former. As an example:
…Yet what an AI that would be! Imagine one AI that prints out beautiful original scientific papers literally every few minutes, night and day, encompassing brilliant new ideas in every field of knowledge, among many other endeavors. And if everyone in that global R&D system were able to think 1000× faster, that would still be compatible with NFL. And if any person in that system could also instantly clone themselves at will (including all their adult knowledge), that would still be compatible with NFL! And if you also magically grant everyone in this global R&D system the ability to summon arbitrarily many copies of any other human, dead or alive, to help them with their ongoing research projects (maybe they summon a few Donald Knuths for their algorithms-related challenges, and some Marc Raiberts for their robotics challenges, and an army of John von Neumanns for whatever, etc.), that would still be compatible with NFL! Nobody would hesitate to call such an AI superintelligent. And there’s no reason to think that even this is anywhere near the ceiling of what is allowed by NFL.