In order to do alignment research, we need to understand how AGI works; and since we currently don’t understand how AGI works, we need more capabilities research to have a chance of figuring it out.
I totally agree with this. Alas, “understand how AGI works” is not something which most capabilities work even attempts to do.
It turns out that people can advance capabilities without having much clue what’s going on inside their magic black boxes, and that’s what most capabilities work looks like at this point.
Agreed, but the black-box experimentation seems like it’s plausibly a prerequisite for actual understanding? E.g. you couldn’t analyze InceptionV1 or CLIP to understand their inner workings before you actually had those models. To use your car engine metaphor from the other comment, we can’t open the engine and stick it full of sensors before we actually have the engine. And now that we do have engines, people are starting to stick them full of sensors, even if most of the work is still focused on building even fancier engines.
It seems reasonable to expect that as long as there are low-hanging fruit to be picked using black boxes, we get a lot of black boxes and the occasional paper dedicated to understanding what’s going on with them and how they work. Then when it starts getting harder to get novel interesting results with just black box tinkering, the focus will shift to greater theoretical understanding and more thoroughly understanding everything that we’ve accomplished so far.
I think we are getting some information. For example, we can see that token-level attention is actually quite powerful for understanding both language and images. We have some understanding of scaling laws. I think the next step is a deeper understanding of how world modeling fits in with action generation: how much can you get with just world modeling, versus world modeling plus reward/action combined?
If the transformer architecture is enough to get us there, that gives us a sort of null hypothesis for intelligence: that the structure of predicting sequences by comparing all pairs of elements of a limited-length sequence is general.
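For concreteness, here is roughly what “comparing all pairs of elements of a limited sequence” cashes out to: a minimal single-head self-attention sketch in numpy. The shapes and weight names are made up for illustration; this isn’t any particular model’s implementation.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over a sequence x of shape (T, d).

    Each position's query is compared against every position's key, so the
    model relates all pairs of elements of the (length-limited) sequence
    before producing each output.
    """
    Q, K, V = x @ Wq, x @ Wk, x @ Wv                # (T, d_k), (T, d_k), (T, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # (T, T): one score per pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row of pair-scores
    return weights @ V                              # each output mixes values from all positions

# Toy usage: a "sequence" of 5 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.normal(size=(T, d))
out = self_attention(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(out.shape)  # (5, 8)
```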
Not rhetorically: what kind of questions do you think would better lead to understanding how AGI works?
I think teaching a transformer with an internal thought process (predicting the next tokens over a part of the sequence that’s “showing your work”) would give an interesting window into how intelligence might work. I thought of this a little while back, but then discovered it’s also a long-standing MIRI research direction into transparency. I wouldn’t be surprised if Google took it up at this point.
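To make the “showing your work” setup concrete, one simple version is to train on sequences where an explicit scratchpad segment sits between the question and the answer, with the usual next-token loss applied over the whole thing. The helper and delimiters below are hypothetical, purely for illustration, not an existing training pipeline:

```python
def build_example(question, work_steps, answer,
                  open_tag="<scratch>", close_tag="</scratch>"):
    """Concatenate question, an explicit 'show your work' segment, and the
    final answer into one training sequence. A standard next-token objective
    over the whole string then forces the model to predict its intermediate
    reasoning, not just the answer. The tags are illustrative, not standard.
    """
    scratch = " ".join(work_steps)
    return f"{question} {open_tag} {scratch} {close_tag} {answer}"

print(build_example(
    "What is 13 * 7?",
    ["10 * 7 = 70", "3 * 7 = 21", "70 + 21 = 91"],
    "91",
))
```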
Not rhetorically: what kind of questions do you think would better lead to understanding how AGI works?
Suppose I’m designing an engine. I try out a new design, and it surprises me—it works much worse or much better than expected. That’s a few bits of information. That’s basically the sort of information we get from AI experiments today.
What we’d really like is to open up that surprising engine, stick thermometers all over the place, stick pressure sensors all over the place, measure friction between the parts, measure vibration, measure fluid flow and concentrations and mixing, measure heat conduction, etc, etc. We want to be able to open that black box, see what’s going on, figure out where that surprising performance is coming from. That would give us far more information, and far more useful information, than just “huh, that worked surprisingly well/poorly”. And in particular, there’s no way in hell we’re going to understand how an engine works without opening it up like that.
The same idea carries over to AI: there’s no way in hell we’re going to understand how intelligence works without opening the black box. If we can open it up, see what’s going on, figure out where surprises come from and why, then we get orders of magnitude more information and more useful information. (Of course, this also means that we need to figure out what things to look at inside the black box and how—the analogues of temperatures, pressures, friction, mixing, etc in an engine.)
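In NN terms, the closest analogue to sticking thermometers and pressure sensors everywhere is probably recording intermediate activations. Here is a minimal sketch of what that could look like, using PyTorch forward hooks on a toy stand-in model; the model and the choice of what to record are placeholders, not a claim about which internal quantities are the right “temperatures and pressures”:

```python
import torch
import torch.nn as nn

# A toy stand-in "engine"; any model would do.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 4),
)

activations = {}

def make_hook(name):
    # Each hook is a "sensor" recording what flows out of one layer.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

_ = model(torch.randn(8, 16))          # one forward pass through the "engine"
for name, act in activations.items():
    print(name, tuple(act.shape), float(act.abs().mean()))  # crude per-layer readings
```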
You can build a good engine without any sensors inside, and indeed people did, e.g. back in the 19th century when sensors of that sort didn’t exist yet. (They had thermometers and pressure gauges, but they couldn’t just get any information from any point inside the engine block, like we can by looking at activations in a NN.) What the engineers of the 19th century had, and what we need, is a general theory. For engines, that was thermodynamics. For AI, we need some kind of Theory of Intelligence. The scaling laws might be pointing the way to a kind of thermodynamics of intelligence.
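To gesture at what that might look like quantitatively: the scaling-law papers fit simple power laws of the form L(N) ≈ c·N^(−α) to loss versus parameter count. A toy fit of that functional form is below; the numbers are made up purely to illustrate the shape of the relationship, not real measurements.

```python
import numpy as np

# Made-up (parameter count, loss) pairs, purely illustrative.
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
L = np.array([5.2, 4.1, 3.3, 2.6, 2.1])

# Fit log L = intercept + slope * log N, i.e. L ≈ c * N^(-alpha).
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha, c = -slope, np.exp(intercept)
print(f"L ≈ {c:.2f} * N^(-{alpha:.3f})")
```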