Thanks for the response.
I have a feeling your true rejection runs deeper than you’re describing. You cite a thought experiment of Einstein’s as being useful and correct. You explain that Less Wrong relies on thought experiments too heavily. You suggest that Less Wrong should lean more heavily on data from the real world. But the single data point you cite on the question of thought experiments indicates that they are useful and correct. It seems like your argument fails by its own standard.
I think the reliability of thought experiments is a tricky question to resolve. I think we might as well expand the category of thought experiments to “any reasoning about the world that isn’t reasoning directly from data”. When I think about the reliability of this reasoning, my immediate thought is that I expect some people to be much better at it than others. In fact, I think being good at this sort of reasoning is almost exactly the same as being intelligent. Reasoning directly from data is like looking up the answers in the back of the book.
This leaves us with two broad positions: the “humans are dumb/the world is tricky” position that the only way we can ever get anywhere is through constant experimentation, and the “humans are smart/the world is understandable” position that we can usefully make predictions based on limited data.
I think these positions are too broad to be useful. It depends a lot on the humans, and it depends a lot on the aspect of the world being studied. Reasoning from first principles works better in physics than in medicine; in that sense, medicine is a trickier subject to study.
If the tricky world hypothesis is true for the questions MIRI is investigating, or the MIRI team is too dumb, I could see the sort of empirical investigations you propose as being the right approach: they don’t really answer the most important questions we want answered, but there probably isn’t a way for MIRI to answer those questions anyway, so might as well answer the questions that are answerable and see if the results lead anywhere.
Anyway, I think a lot of the value of many LW posts is in finding useful ideas that are also very general. (Paul Graham’s description of what philosophy done right looks like.) Very general ideas are harder to test, because they cut across domains. The reason I like the many citations in Thinking, Fast and Slow is that I expect the general ideas it presents to be more reliably true because they’re informed by at least some experimental data. But general, useful ideas can be so useful that I don’t mind taking the time to read about them even if they’re not informed by lots of experimental data. Specifically, having lots of general, useful ideas that are also correct (e.g. knowing when and how to add numbers) makes you more intelligent according to my definition above. And I consider myself intelligent enough to tell the true general, useful ideas from the false ones, at least somewhat reliably, through my own reasoning and experience.
Broadly speaking, I think Less Wrong is counterproductive if specific general, useful ideas it promotes are false. (It’s hard to imagine how it could be counterproductive if they were true.) And at that point we’re talking about whether specific posts are true or false. Lukeprog has this list of points of agreement between the sequences and mainstream academia, which causes me to update in the direction of those points of agreement being true.
I think you’re being overly hard on the AI box experiment. It’s obviously testing something. It’d be great if we could fork the universe, import a superintelligence, set up a bunch of realistic safeguards, and empirically determine how things played out. But that’s not practical. We did manage to find an experiment that might shed some light on the scenario, but the experiment uses a smart human instead of a superintelligence and a single gatekeeper instead of a more elaborate set of controls. It seems to me that you aren’t following your own standard: you preach the value of empiricism and then throw out some of the only data points available, for theoretical reasons, without producing any better data. I agree that it’s some pretty weak data, but it seems better to think about it than throw it out and just believe whatever we like, and I think weak data is about as well as you’re going to do in this domain.
You cite a thought experiment of Einstein’s as being useful and correct.
I cite a thought experiment of Einstein’s as being useful but insufficient. It was not correct until observation matched anticipation. I called out Einstein’s thought experiment as being a useful pedagogical technique, but not an example of how to arrive at truth. Do you see the difference?
I think you’re being overly hard on the AI box experiment. It’s obviously testing something.
No, this is not obvious to me. Other than the ability of two humans to outwit each other within the confines of strict enforcement of arbitrarily selected rules, what is it testing, exactly? And what does that thing being tested have to do with realistic AIs and boxes anyway?
I called out Einstein’s thought experiment as being a useful pedagogical technique, but not an example of how to arrive at truth.
What’s your model of how Einstein in fact arrived at truth, if not via a method that is “an example of how to arrive at truth”? It’s obvious the method has to work to some extent, because Einstein couldn’t have arrived at a correct view by chance. Is your view that Einstein should have updated less from whatever reasoning process he used to pick out that hypothesis from the space of hypotheses, than from the earliest empirical tests of that hypothesis, contra Einstein’s Arrogance?
Or is your view that, while Einstein may technically have gone through a process like that, no one should assume they are in fact Einstein—i.e., Einstein’s capabilities are so rare, or his methods are so unreliable (not literally at the level of chance, but, say, at the level of 1000-to-1 odds of working), that by default you should harshly discount any felt sense that your untested hypothesis is already extremely well-supported?
Or perhaps you should harshly discount it until you have meta-evidence, in the form of a track record of successfully predicting which untested hypotheses will turn out to be correct.
Other than the ability of two humans to outwit each other within the confines of strict enforcement of arbitrarily selected rules, what is it testing, exactly? And what does that thing being tested have to do with realistic AIs and boxes anyway?
The AI box experiment is a response to the claim ‘superintelligences are easy to box, because no level of competence at social engineering would suffice for letting an agent talk its way out of a box’. It functions as an existence proof; if a human level of social competence is already sufficient to talk one’s way out of a box with nonzero frequency, then we can’t dismiss risk from superhuman levels of social competence.
If you think the claim Eliezer was responding to is silly on priors, or just not relevant (because it would be easy to assess an AI’s social competence and/or prevent it from gaining such competence), then you won’t be interested in that part of the conversation.
What’s your model of how Einstein in fact arrived at truth, if not via a method that is “an example of how to arrive at truth”?
You can’t work backwards from the fact that someone arrived at truth in one case to the premise that they must have been working from a reliable method for arriving at truth. It’s the “one case” that’s the problem. They might have struck lucky.
Einstein’s thought experiments inspired his formal theories, which were then confirmed by observation. Nobody thought the thought experiments provided confirmation by themselves.
You can’t work backwards from the fact that someone arrived at truth in one case to the premise that they must have been working from a reliable method for arriving at truth. It’s the “one case” that’s the problem. They might have struck lucky.
I mentioned that possibility above. But Einstein couldn’t have been merely lucky—even if it weren’t the case that he was able to succeed repeatedly, his very first success was too improbable for him to have just been plucking random physical theories out of a hat. Einstein was not a random number generator, so there was some kind of useful cognitive work going on.
That leaves open the possibility that it was only useful enough to give Einstein a 1% chance of actually being right; but still, I’m curious about whether you do think he only had a 1% chance of being right, or (if not) what rough order of magnitude you’d estimate. And I’d likewise like to know what method he used to even reach a 1% probability of success (or 10%, or 0.1%), and why we should or shouldn’t think this method could be useful elsewhere.
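One rough way to put numbers on the “not a random number generator” point (a back-of-the-envelope sketch with made-up figures, not something either commenter committed to): if blind guessing from a space of N comparably simple candidate theories gives prior odds of 1 in N of hitting the right one, then a selection process that reaches probability p of success must have supplied about log2(pN) bits of evidence. Taking, purely for illustration, N = 2^100 and the 1% figure from above:

\[
\log_2\!\frac{p}{1/N} \;=\; \log_2 N + \log_2 p \;=\; 100 + \log_2(0.01) \;\approx\; 93.4 \text{ bits.}
\]

Even at a 1% hit rate, nearly all of the roughly 100 bits needed to locate the theory would have to come from Einstein’s reasoning rather than from luck, which is what the question about orders of magnitude is probing.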
Einstein’s thought experiments inspired his formal theories, which were then confirmed by observation. Nobody thought the thought experiments provided confirmation by themselves.
Can you define “confirmation” for me, in terms of probability theory?
Big Al may well have had some intuitive mojo that enabled him to pick the right thought experiments, but that still doesn’t make thought experiments a substitute for real empiricism. And intuitive mojo isn’t a method in the sense of being reproducible.
Can you define “confirmation” for me, in terms of probability theory?
Why not derive probability theory in terms of confirmation?
Thought experiments aren’t a replacement for real empiricism. They’re a prerequisite for real empiricism.
“Intuitive mojo” is just calling a methodology you don’t understand a mean name. However it was that Einstein repeatedly hit on success in his lifetime, presupposing that it is an ineffable mystery or a grand coincidence won’t tell us much.
Why not derive probability theory in terms of confirmation?
I already understand probability theory, and why it’s important. I don’t understand what you mean by “confirmation,” how your earlier statement can be made sense of in quantitative terms, or why this notion should be treated as important here. So I’m asking you to explain the less clear term in terms of the more clear term.
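For reference, the standard Bayesian reading of “confirmation” (offered as a sketch of the kind of answer being requested here, not as a claim about what either commenter means by the word) is that evidence E confirms hypothesis H exactly when conditioning on E raises the probability of H:

\[
E \text{ confirms } H \;\iff\; P(H \mid E) > P(H) \;\iff\; P(E \mid H) > P(E),
\]

the two conditions being equivalent by Bayes’ theorem, \(P(H \mid E) = P(E \mid H)\,P(H)/P(E)\). On this reading, a thought experiment could in principle provide some confirmation, by raising a theory’s probability, without providing anything like the confirmation that observation later supplies.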
Actually he did not. He got lucky early in his career, and pretty much coasted on that into irrelevance. His intuition allowed him to solve problems related to relativity, the photoelectric effect, and Brownian motion, and to make a few other significant contributions, all within the span of a decade early in his career. And then he went off the deep end, following his intuition down a number of dead-ending rabbit holes for the rest of his life. He died in Princeton in 1955 having made no further significant contributions to physics after his 1916 invention of general relativity. Within the physics community (I am a trained physicist), Einstein’s story is retold more often as a cautionary tale than a model to emulate.
There are worse fates than not being able to top your own discovery of general relativity.
Within the physics community (I am a trained physicist), Einstein’s story is retold more often as a cautionary tale than a model to emulate.
...huh? Correct me if I’m wrong here, but Einstein was a great physicist who made lots of great discoveries, right?
The right cautionary tale would be to cite physicists who attempted to follow the same strategy Einstein did and see how it mostly only worked for Einstein. But if Einstein was indeed a great physicist, it seems like at worst his strategy is one that doesn’t usually produce results but sometimes produces spectacular results… which doesn’t seem like a terrible strategy.
I have a very strong (empirical!) heuristic that the first thing people should do if they’re trying to be good at something is copy winners. Yes there are issues like regression to the mean and stuff, but it provides a good alternative perspective vs thinking things through from first principles (which seems to be my default cognitive strategy).
The thing is, Einstein was popular, but his batting average was lower than his peers’. The 20th century is full of theoretical physicists with a better track record of pushing the state of the art forward than Einstein, most of whom did not spend the majority of their careers chasing rabbits down holes. They may not be household names the way Einstein is, but honestly that might have more to do with the hair than with the physics.
I should point out that I heard this cautionary tale as “don’t set your sights too high,” not “don’t employ the methods Einstein employed.” The methods were fine; the trouble was that he was at the IAS and looking for something bigger than his previous work, rather than planting acorns that would grow into mighty oaks (as Hamming puts it).
OK, good to know.
The AI box experiment serves even as that only if you assume it sufficiently replicates the conditions that would actually be faced by someone with an AI in a box. It also only serves as such if it is otherwise a good experiment, and since we are not permitted to see the session transcripts for ourselves, we can’t tell whether it is.
Again, the AI box experiment is a response to the claim “superintelligences are easy to box, because no level of competence at social engineering would suffice for letting an agent talk its way out of a box”. If you have some other reason to think that superintelligences are hard to box—one that depends on a relevant difference between the experiment and a realistic AI scenario—then feel free to bring that idea up. But this constitutes a change of topic, not an objection to the experiment.
since we are not permitted to see the session transcripts for ourselves, we can’t tell if it is a good experiment.
I mean, the experiment’s been replicated multiple times. And you already know the reasons the transcripts were left private. I understand assigning a bit less weight to the evidence because you can’t examine it in detail, but the hypothesis that there’s a conspiracy to fake all of these experiments isn’t likely.
If you have some other reason to think that superintelligences are hard to box—one that depends on a relevant difference between the experiment and a realistic AI scenario—then feel free to bring that idea up.
Not all relevant differences between an experiment and an actual AI scenario can be accurately characterized as “reason to think that superintelligences are hard to box”. For instance, imagine an experiment with no gatekeeper or AI party at all, where the result of the experiment depends on flipping a coin to decide whether the AI gets out. That experiment is very different from a realistic AI scenario, but one need not have a reason to believe that intelligences are hard to box—or even hold any opinion at all on whether intelligences are hard to box—to object to the experimental design.
For the AI box experiment as stated, one of the biggest flaws is that the gatekeeper is required to stay engaged with the AI and can’t ignore it. This allows the AI to win by either verbally abusing the gatekeeper to the extent that he doesn’t want to stay around any more, or by overwhelming the gatekeeper with lengthy arguments that take time or outside assistance to analyze. These situations would not be a win for an actual AI in a box.
I mean, the experiment’s been replicated multiple times. And you already know the reasons the transcripts were left private. I understand assigning a bit less weight to the evidence because you can’t examine it in detail, but the hypothesis that there’s a conspiracy to fake all of these experiments isn’t likely.
Refusing to release the transcripts causes problems other than just hiding fakery. If the experiment is flawed in some way, for instance, it could hide that—and it would be foolish to demand that everyone name possible flaws one by one and ask you “does this have flaw A?”, “does this have flaw B?”, etc. in order to determine whether the experiment has any flaws. There are also cases where whether something counts as a flaw is a matter of opinion that can be argued, and someone else might consider a flaw something that the experimenter doesn’t.
Besides, in a real boxed-AI situation, it’s likely that gatekeepers will be tested on AI-box experiments and will be given transcripts of experiment sessions to better prepare them for the real AI. An experiment that simulates AI boxing should likewise let participants read the transcripts of other sessions.