You cite a thought experiment of Einstein’s as being useful and correct.
I cite a thought experiment of Einstein’s as being useful but insufficient. It was not correct until observation matched anticipation. I called out Einstein’s thought experiment as being a useful pedagogical technique, but not an example of how to arrive at truth. Do you see the difference?
I think you’re being overly hard on the AI box experiment. It’s obviously testing something.
No, this is not obvious to me. Other than the ability of two humans to outwit each other within the confines of strict enforcement of arbitrarily selected rules, what is it testing, exactly? And what does that thing being tested have to do with realistic AIs and boxes anyway?
I called out Einstein’s thought experiment as being a useful pedagogical technique, but not an example of how to arrive at truth.
What’s your model of how Einstein in fact arrived at truth, if not via a method that is “an example of how to arrive at truth”? It’s obvious the method has to work to some extent, because Einstein couldn’t have arrived at a correct view by chance. Is your view that Einstein should have updated less from whatever reasoning process he used to pick out that hypothesis from the space of hypotheses, than from the earliest empirical tests of that hypothesis, contra Einstein’s Arrogance?
Or is your view that, while Einstein may technically have gone through a process like that, no one should assume they are in fact Einstein—i.e., Einstein’s capabilities are so rare, or his methods are so unreliable (not literally at the level of chance, but, say, at the level of 1000-to-1 odds of working), that by default you should harshly discount any felt sense that your untested hypothesis is already extremely well-supported?
Or perhaps you should harshly discount it until you have meta-evidence, in the form of a track record of successfully predicting which untested hypotheses will turn out to be correct.
Other than the ability of two humans to outwit each other within the confines of strict enforcement of arbitrarily selected rules, what is it testing, exactly? And what does that thing being tested have to do with realistic AIs and boxes anyway?
The AI box experiment is a response to the claim ‘superintelligences are easy to box, because no level of competence at social engineering would suffice for letting an agent talk its way out of a box’. It functions as an existence proof; if a human level of social competence is already sufficient to talk one’s way out of a box with nonzero frequency, then we can’t dismiss risk from superhuman levels of social competence.
If you think the claim Eliezer was responding to is silly on priors, or just not relevant (because it would be easy to assess an AI’s social competence and/or prevent it from gaining such competence), then you won’t be interested in that part of the conversation.
What’s your model of how Einstein in fact arrived at truth, if not via a method that is “an example of how to arrive at truth”?
You can’t work backwards from the fact that someone arrived at truth in one case to the premise that they must have been working from a reliable method for arriving at truth. It’s the “one case” that’s the problem. They might have struck lucky.
Einstein’s thought experiments inspired his formal theories, which were then confirmed by observation. Nobody thought the thought experiments provided confirmation by themselves.
You can’t work backwards from the fact that someone arrived at truth in one case to the premise that they must have been working from a reliable method for arriving at truth. It’s the “one case” that’s the problem. They might have struck lucky.
I mentioned that possibility above. But Einstein couldn’t have been merely lucky—even if it weren’t the case that he was able to succeed repeatedly, his very first success was too improbable for him to have just been plucking random physical theories out of a hat. Einstein was not a random number generator, so there was some kind of useful cognitive work going on.
That leaves open the possibility that it was only useful enough to give Einstein a 1% chance of actually being right; but still, I’m curious about whether you do think he only had a 1% chance of being right, or (if not) what rough order of magnitude you’d estimate. And I’d likewise like to know what method he used to even reach a 1% probability of success (or 10%, or 0.1%), and why we should or shouldn’t think this method could be useful elsewhere.
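To make those hedged odds concrete: in odds form, Bayes’ rule says posterior odds equal prior odds times the likelihood ratio of each observation. A minimal sketch (all numbers here are illustrative assumptions, not estimates of Einstein’s actual reliability):

```python
def update_odds(prior_odds, likelihood_ratio):
    """Posterior odds = prior odds * likelihood ratio (Bayes' rule in odds form)."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds o:1 to a probability o / (1 + o)."""
    return odds / (1 + odds)

# Assumption: the reasoning process alone only gets the hypothesis to
# 1000-to-1 odds against...
odds = 1 / 1000

# ...and each successful empirical test is (by assumption) 20 times likelier
# if the theory is true than if it is false.
for _ in range(3):  # three confirming observations
    odds = update_odds(odds, 20)

print(odds_to_probability(odds))  # 8/9, roughly 0.889
```

On these made-up numbers, a hypothesis the method only got to a 0.1% probability ends up well above 50% after a few strong tests, which is one way of cashing out “discount the felt sense until observation matches anticipation.”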
Einstein’s thought experiments inspired his formal theories, which were then confirmed by observation. Nobody thought the thought experiments provided confirmation by themselves.
Can you define “confirmation” for me, in terms of probability theory?
Big Al may well have had some intuitive mojo that enabled him to pick the right thought experiments, but that still doesn’t make thought experiments a substitute for real empiricism. And intuitive mojo isn’t a method in the sense of being reproducible.
Can you define “confirmation” for me, in terms of probability theory?
Why not derive probability theory in terms of confirmation?
Thought experiments aren’t a replacement for real empiricism. They’re a prerequisite for real empiricism.
“Intuitive mojo” is just calling a methodology you don’t understand a mean name. However it was that Einstein repeatedly hit on success in his lifetime, presupposing that it is an ineffable mystery or a grand coincidence won’t tell us much.
Why not derive probability theory in terms of confirmation?
I already understand probability theory, and why it’s important. I don’t understand what you mean by “confirmation,” how your earlier statement can be made sense of in quantitative terms, or why this notion should be treated as important here. So I’m asking you to explain the less clear term in terms of the more clear term.
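For reference, the standard Bayesian answer to the question is that evidence E “confirms” hypothesis H just in case conditioning on E raises H’s probability, i.e. P(H|E) > P(H). A minimal sketch with made-up numbers:

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) via Bayes' theorem, using the law of total probability for P(E)."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# E confirms H iff P(H|E) > P(H), which holds iff P(E|H) > P(E|not-H).
p_h = 0.3  # prior (illustrative)
p_h_given_e = posterior(p_h, p_e_given_h=0.9, p_e_given_not_h=0.2)

print(p_h_given_e)        # 27/41, roughly 0.659
print(p_h_given_e > p_h)  # True: on these numbers, E confirms H
```

On this definition a successful observation confirms a theory to the degree the theory predicted it more strongly than its rivals did, which is the quantitative sense in which Einstein’s theories were “confirmed by observation.”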
Actually he did not. He got lucky early in his career, and pretty much coasted on that into irrelevance. His intuition allowed him to solve problems related to relativity, the photoelectric effect, and Brownian motion, and to make a few other significant contributions within the span of a decade, early in his career. And then he went off the deep end following his intuition down a number of dead-end rabbit holes for the rest of his life. He died in Princeton in 1955 having made no further significant contributions to physics after his 1916 invention of general relativity. Within the physics community (I am a trained physicist), Einstein’s story is retold more often as a cautionary tale than a model to emulate.
Within the physics community (I am a trained physicist), Einstein’s story is retold more often as a cautionary tale than a model to emulate.
...huh? Correct me if I’m wrong here, but Einstein was a great physicist who made lots of great discoveries, right?
The right cautionary tale would be to cite physicists who attempted to follow the same strategy Einstein did and see how it mostly only worked for Einstein. But if Einstein was indeed a great physicist, it seems like at worst his strategy is one that doesn’t usually produce results but sometimes produces spectacular results… which doesn’t seem like a terrible strategy.
I have a very strong (empirical!) heuristic that the first thing people should do if they’re trying to be good at something is copy winners. Yes there are issues like regression to the mean and stuff, but it provides a good alternative perspective vs thinking things through from first principles (which seems to be my default cognitive strategy).
The thing is, Einstein was popular, but his batting average was lower than his peers’. In terms of advancing the state of the art, the 20th century is full of theoretical physicists with a better track record than Einstein, most of whom did not spend the majority of their careers chasing rabbits down holes. They may not be common household names, but honestly Einstein’s fame might have more to do with the hair than with his physics.
I should point out that I heard this cautionary tale as “don’t set your sights too high,” not “don’t employ the methods Einstein employed.” The methods were fine, the trouble was that he was at IAS and looking for something bigger than his previous work, rather than planting acorns that would grow into mighty oaks (as Hamming puts it).
The AI box experiment only serves even as that if you assume it sufficiently replicates the conditions that would actually be faced by someone with an AI in a box. Also, it only serves as such if it is otherwise a good experiment, but since we are not permitted to see the session transcripts for ourselves, we can’t tell if it is a good experiment.
Again, the AI box experiment is a response to the claim “superintelligences are easy to box, because no level of competence at social engineering would suffice for letting an agent talk its way out of a box”. If you have some other reason to think that superintelligences are hard to box—one that depends on a relevant difference between the experiment and a realistic AI scenario—then feel free to bring that idea up. But this constitutes a change of topic, not an objection to the experiment.
since we are not permitted to see the session transcripts for ourselves, we can’t tell if it is a good experiment.
I mean, the experiment’s been replicated multiple times. And you already know the reasons the transcripts were left private. I understand assigning a bit less weight to the evidence because you can’t examine it in detail, but the hypothesis that there’s a conspiracy to fake all of these experiments isn’t likely.
If you have some other reason to think that superintelligences are hard to box—one that depends on a relevant difference between the experiment and a realistic AI scenario—then feel free to bring that idea up.
Not all relevant differences between an experiment and an actual AI scenario can be accurately characterized as “reason to think that superintelligences are hard to box”. For instance, imagine an experiment with no gatekeeper or AI party at all, where the result of the experiment depends on flipping a coin to decide whether the AI gets out. That experiment is very different from a realistic AI scenario, but one need not have a reason to believe that intelligences are hard to box—or even hold any opinion at all on whether intelligences are hard to box—to object to the experimental design.
For the AI box experiment as stated, one of the biggest flaws is that the gatekeeper is required to stay engaged with the AI and can’t ignore it. This allows the AI to win by either verbally abusing the gatekeeper to the extent that he doesn’t want to stay around any more, or by overwhelming the gatekeeper with lengthy arguments that take time or outside assistance to analyze. These situations would not be a win for an actual AI in a box.
I mean, the experiment’s been replicated multiple times. And you already know the reasons the transcripts were left private. I understand assigning a bit less weight to the evidence because you can’t examine it in detail, but the hypothesis that there’s a conspiracy to fake all of these experiments isn’t likely.
Refusing to release the transcripts causes problems other than just hiding fakery. For instance, if the experiment is flawed in some way, withholding the transcripts could hide that, and it would be foolish to demand that everyone name possible flaws one by one (“does this have flaw A?”, “does this have flaw B?”, and so on) in order to determine whether the experiment has any flaws. There are also cases where whether something counts as a flaw is a matter of opinion that can be argued, and someone else might consider a flaw something that the experimenter doesn’t.
Besides, in a real boxed-AI situation, it’s likely that gatekeepers will be tested on AI-box experiments and given transcripts of experiment sessions to better prepare them for the real AI. An experiment that simulates AI boxing should likewise let participants read transcripts of other sessions.
He died in Princeton in 1955 having made no further significant contributions to physics after his 1916 invention of general relativity.

There are worse fates than not being able to top your own discovery of general relativity.

I should point out that I heard this cautionary tale as “don’t set your sights too high,” not “don’t employ the methods Einstein employed.”

OK, good to know.