To me this is a metaphor for Alignment research, and LW-style rationality in general, but with an opposite message.
To start, I have this exact AC in my window, and it made a huge difference during last year’s heat dome. (I will use metric units in the following, because eff imperial units.) For a few days last summer it was around 39-40°C, some 15°C above average, and the AC cooled the place down by about 10°C, which made the difference between livable and unlivable. It was cooler all through the place, not just in the immediate vicinity of the unit.
How could this happen, in an apparent contradiction to the laws of physics?
Well, three things:
I live in an apartment, so the makeup air comes in from the hallway, where it is not as hot as outside, though still pretty warm.
The air coming out of the AC exhaust is pretty hot, hotter than the outside air most of the time, so there is definite net cooling despite the influx of warm air.
The pressure differential between the hallway and the apartment is positive regardless of the AC (partly because the building’s exhaust vents are always on), so adding the AC does not significantly change the airflow.
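To put rough numbers on the first two points, here is a minimal back-of-the-envelope heat balance in Python. Every figure in it (cooling capacity, exhaust flow rate, temperatures) is an assumption picked for illustration, not a measurement of this particular unit; the sketch only shows how much the single-hose infiltration penalty shrinks when the makeup air comes from a merely-warm hallway instead of from 40°C outdoor air.

```python
# Back-of-the-envelope heat balance for a single-hose portable AC.
# All numbers below are illustrative assumptions, not measurements.

RHO_AIR = 1.2    # kg/m^3, approximate density of air
CP_AIR = 1005.0  # J/(kg*K), specific heat of air at constant pressure


def net_cooling_watts(cooling_capacity_w, exhaust_flow_m3s, t_room_c, t_makeup_c):
    """Cooling left over after subtracting the heat carried in by the
    makeup air that replaces the indoor air blown out of the exhaust hose."""
    infiltration_penalty_w = RHO_AIR * CP_AIR * exhaust_flow_m3s * (t_makeup_c - t_room_c)
    return cooling_capacity_w - infiltration_penalty_w


# Assumed unit: ~2.9 kW (about 10,000 BTU/h) of cooling, ~0.05 m^3/s exhaust flow,
# room held at 28 C.
for label, t_makeup in [("makeup air from 40 C outdoors", 40.0),
                        ("makeup air from 30 C hallway", 30.0)]:
    watts = net_cooling_watts(2900.0, 0.05, 28.0, t_makeup)
    print(f"{label}: ~{watts:.0f} W of net cooling")
```

With these made-up numbers the penalty drops from roughly a quarter of the unit’s capacity to a few percent, which is consistent with the unit feeling perfectly adequate in an apartment.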
So, physics is safe! What isn’t safe is the theoretical reasoning that “a single-hose AC does not work”.
Why? Because of the assumptions that go into the reasoning, some stated, some unstated: for example, that the outside air is necessarily hot, that the AC causes extra ingress of outside air, that people use this AC in detached houses, and so on.
AI research is much more complicated than HVAC analysis, and so there are many more factors, assumptions and effects that go into it. Confidently proclaiming a certain outcome, like “we are all doomed!”, is based on a sound analysis of incomplete data, and better data, given the bounded-rationality constraint, can only be obtained iteratively, through experiment and incremental improvement.
Note that the analysis can look airtight (no pun intended) from the outside: in the AC example it is basic energy conservation and the continuity equation. In the dieting case it is calories-in, calories-out. In AI research it is the… inability to steer the direction of recursive self-improvement, or something. But that mode of analysis has been falsified again and again: The Waterfall approach to software development gave way to Agile. SpaceX’s fast iterations let it run rings around Boeing’s “we carefully design it once and then it will fly”.
The more complicated a problem, the more iterative an acceptable solution will be.
And you cannot pronounce something possible or impossible until you have gone through a lot of iterations of actually building something and gained the experience that becomes knowledge of what works and what does not. That’s how junior developers become senior ones.
Now, note that MIRI has not built a single AI, unlike those other companies. All their reasoning is sound theoretical analysis… based on little or no experimental data.
It may well be that Eliezer and Co. are correct about everything. But the outside view puts them in the reference class of those who are more often wrong than right: those who rely exclusively on logic and eschew actually building things to see where they work, where they break, and why.
To be fair, there are plenty of examples where theoretical reasoning is enough. For example, all kinds of perpetual motion machines are guaranteed to be bunk. Or EM-drive-style approaches. Or reducing the entropy of a closed system. If we had ironclad, experimentally tested and confirmed laws like that for AI research, we would be able to trust the theoretical conclusions. As it is, we are way too early in the process of experimenting and generalizing our experimental data into laws, as far as AI research and especially Alignment research is concerned. We may be unlucky and accidentally end up all dead. But to figure out the odds of that happening one needs to do realistic experimental alignment research: build small and then progressively larger AIs and actually try to align them. “Debates” are all nice and fun, but no substitute for data.
So the message I get from the OP is “experiment to get the data and adjust your assumptions”, or else you may end up overpaying for a super-duper HVAC system you don’t need, or worse, deciding you cannot afford it and dying from a heat stroke.
Ok, I want to say thank you for this comment because it contains a lot of points I strongly agree with. I think the alignment community needs experimental data now more than it needs more theory.
However, I don’t think this lowers my opinion of MIRI. MIRI, and Eliezer before MIRI even existed, predicted this problem accurately and convincingly enough that people like me updated. Fifteen years ago I began studying neuroscience, neuromorphic computing, and machine learning because I believed this was going to become a much bigger deal than it was then.
Now the general gist of the message has absolutely been proven out. Machine learning is now a big, impressive thing in the world, and scary outcomes are right around the corner. Forecasting that today doesn’t win you nearly as many points as forecasting it 15 or 20 years ago did.
Now we are finally close enough that it makes sense to move from theorizing to experimentation. That doesn’t mean the theorizing was useless. It laid an incredible amount of valuable groundwork. It gave the experimental researchers a sense of what they are up against, laid out the scope of the problem, and made helpful pointers towards important characteristics that good solutions must have.
A call to action for an army of experimentalists to put the theory to the test is not evidence against MIRI. Their theory is still useful and helpful; it’s just no longer where we need the most rapid expansion of effort. We are entering a new phase now, gathering experimental data, and we still need MIRI, and others like them, to update and refine the theory in response to that data. Theory used to be 95% of the work going into AGI alignment. Now it needs to become more like 5%. Not by decreasing the work going into theory (we need to increase that!), but because now it is time to throw open the floodgates of evidence and unleash the army of experimenters. Engineering work is easier to get right than pure theory, so thankfully, lots more people are qualified to contribute. This is good, because we need a lot of them. There is so much work to be done.
Hmm, I agree that Eliezer, MIRI and its precursors did a lot of good work raising the profile of this particular x-risk. However, I am less certain of their theoretical contributions, which you describe as
That doesn’t mean the theorizing was useless. It laid an incredible amount of valuable groundwork. It gave the experimental researchers a sense of what they are up against, laid out the scope of the problem, and made helpful pointers towards important characteristics that good solutions must have.
I guess they did highlight a lot of dead ends, gotta agree with that. I am not sure how much the larger AI/ML community values their theoretical work. Maybe the practitioners haven’t caught up yet.
Theory used to be 95% of the work going into AGI alignment. Now it needs to become more like 5%
Well, whatever the fraction, it certainly seems like it’s time to rebalance it, I agree. I don’t know if MIRI has the know-how to do experimental work at the level of the rapidly advancing field.
I mostly agree that relying on real-world data is necessary for better understanding our messy world, and that in most cases this approach is favorable.
There’s a part of me that thinks AI is a different case though, since getting it even slightly wrong will be catastrophic. Experimental alignment research might get us most of the way to aligned AI, but there will probably still be issues that aren’t noticeable because the AIs we are experimenting on won’t be powerful enough to reveal them. Our solution to the alignment problem can’t be something imperfect that does the job well enough. Instead it has to be something that can withstand immense optimization pressure. My intuition tells me that the single-hose solution is not enough for AGI and we instead need something that is flawless in practice and in theory.
I agree that, given MIRI’s model of AGI emergence, getting it slightly wrong would be catastrophic. But that’s my whole point: experimenting early is strictly better than not, because it reduces the odds of getting something big wrong, as opposed to something small, along the way.
Our solution to the alignment problem can’t be something imperfect that does the job well enough. Instead it has to be something that can withstand immense optimization pressure.
I had mentioned in another post that aligned AI needs slack (https://www.lesswrong.com/posts/mc2vroppqHsFLDEjh/aligned-ai-needs-slack), so that there are no “immense optimization pressures”.
My intuition tells me that the single-hose solution is not enough for AGI and we instead need something that is flawless in practice and in theory.
I think that’s what Eliezer says, as well, hence his pessimism and focus on “dying with dignity”. But we won’t know if this intuition is correct without actually testing it experimentally and repeatedly. It might not help because “there is no fire alarm for superintelligence”, but the alternative is strictly worse, because the problem is so complex.
The Waterfall approach to software development gave way to Agile. SpaceX’s fast iterations let it run rings around Boeing’s “we carefully design it once and then it will fly”.
This is fine for other fields, but the problem with superintelligence alignment is that the “things” in “move fast and break things” are, like, us. We only have one chance to get superhuman alignment right, which is why we have to design it carefully once. Misaligned systems will try to turn you off to make sure you can’t turn them off. I think Eliezer has even said that if we could simply revert the universe after each time we mess up alignment, he would be way less pessimistic.
Further, experiments with systems that are not yet capable enough to kill us can provide us with valuable information, but the problem is that much of the difficulty comes up around superintelligence levels. Things are going to break in weird ways that we couldn’t have anticipated just by extrapolating out the trends from current systems. So if we just wait for evidence that less robust solutions are not enough, we will see that less robust solutions seem to work really well on weak current models, pat ourselves on the back for figuring out that alignment isn’t actually that hard in area X, and then, as we approach superintelligence, start noticing X break down (if we’re lucky! if X is something like deception, it will try to hide from us and actively avoid being caught by our interpretability tools or whatever). At that point it will be too late to try and fix the problem, because none of the technical or political solutions are viable on a very short time horizon.
Again, to be very clear, I’m not arguing that there is no use at all for empirical experiments today; it’s just that there are specific failure cases that are easy to fall into whenever you try to conclude something of the form “and therefore this is some amount of evidence that superintelligence will be more/less likely to be Y and Z”.
I agree that we cannot be cavalier about it, but not experimenting is strictly worse than experimenting (as long as it is not at the expense of theoretical work), because humans are bad at pure theory.
The statement “humans are bad at pure theory” seems to be clearly falsified by the extraordinary theoretical advances of the past, e.g. Einstein. Whether theoretical or experimental approaches will prove most successful for AI alignment is an open question.
It is actually confirmed by this particular case: Special Relativity took some 40 years to form after Maxwell’s equations were written, and General Relativity took some 300 years to be written down after Galileo’s experiments with the equal acceleration of falling bodies. AND it took a once-in-a-millennium genius to do that. (Twice, actually; Newton was the other one in physics.)
This doesn’t look like a serious reply. I fail to see how the achievements of Newton, Maxwell and Einstein do not illustrate the power of theory.
I have nothing to add to my previous message, other than that 300 years to come up with a theory is a long time.
Just added a clarification to the post:
EDIT: several commenters seem to think that I’m claiming this air conditioner does not work at all, so I want to clarify that it will still cool down a room on net. The point is not that it doesn’t work at all; the point is that it’s stupidly inefficient in a way which I do not think consumers would plausibly choose over the relatively low cost of a second hose if they recognized the problems.
I do believe the analysis in the post is in fact correct, and the success of this air conditioner is primarily due to consumers not recognizing the problem. Would you have spent an extra, say, $30 on a two-hose air conditioner if you had noticed the issue in advance?
(BTW, I also bought a one-hose air conditioner off amazon a few years back, which is where this example came from. When I realized there was only one hose, I was absolutely flabbergasted that anyone would even bother to build such an obviously stupid thing. And it indeed did a pretty shitty job cooling my apartment!)
Looks like you did the usual iterative approach: bought an AC, saw that it didn’t work as expected, did the analysis, figured out what was wrong, corrected your model of what works in your situation, and then bought a better AC.
bought an AC, saw that it didn’t work as expected, did the analysis
I read John as saying steps two and three here were reversed. He bought an AC, realized before trying it that it wouldn’t work, then tested and saw that (as expected) it didn’t work.
That’s true! When I opened the box, I first dug around looking for the second hose. Then I thought they must have made a mistake and not sent the second hose. Then eventually I noticed that the AC only had one hose-slot, and the pictures only had one hose, and I was just very confused as to why on earth someone would build a portable air conditioner with only one hose.