Thanks for sharing your contrarian views, both with this post and with your previous posts. Part of me is disappointed that you didn’t write more… it feels like you have several posts’ worth of objections to Less Wrong here, and at times you are just vaguely gesturing towards a larger body of objections you have towards some popular LW position. I wouldn’t mind seeing those objections fleshed out into long, well-researched posts. Of course you aren’t obliged to put in the time & effort to write more posts, but it might be worth your time to fix specific flaws you see in the LW community, given that it consists of many smart people interested in maximizing their positive impact on the far future.
I’ll preface this by stating some points of general agreement:
I haven’t bothered to read the quantum physics sequence (I figure if I want to take the time to learn that topic, I’ll learn from someone who researches it full-time).
I’m annoyed by the fact that the sequences in practice seem to constitute a relatively static document that doesn’t get updated in response to critiques people have written up. I think it’s worth reading them with a grain of salt for that reason. (I’m also annoyed by the fact that they are extremely wordy and mostly without citation. Given the choice of getting LWers to either read the sequences or read Thinking, Fast and Slow, I would prefer they read the latter; it’s a fantastic book, and thoroughly backed up by citations. No intellectually serious person should go without reading it IMO, and it’s definitely a better return on time. Caveat: I personally haven’t read the sequences through and through, although I’ve read lots of individual posts, some of which were quite insightful. Also, there is surprisingly little overlap between the two works and it’s likely worthwhile to read both.)
And here are some points of disagreement :P
You talk about how Less Wrong encourages the mistake of reasoning by analogy. I searched for “site:lesswrong.com reasoning by analogy” on Google and came up with these 4 posts: 1, 2, 3, 4. Posts 1, 2, and 4 argue against reasoning by analogy, while post 3 claims the situation is a bit more nuanced. In this comment here, I argue that reasoning by analogy is a bit like taking the outside view: analogous phenomena can be considered part of the same (weak) reference class. So...
Insofar as there is an explicit “LW consensus” about whether reasoning by analogy is a good idea, it seems like you’ve diagnosed it incorrectly (although maybe there are implicit cultural norms that go against professed best practices).
It seems useful to know the answer to questions like “how valuable are analogies”, and the discussions I linked to above seem like discussions that might help you answer that question. These discussions are on LW.
Finally, it seems you’ve been unable to escape a certain amount of reasoning by analogy in your post. You state that experimental investigation of asteroid impacts was useful, so by analogy, experimental investigation of AI risks should be useful.
The steelman of this argument would be something like “experimentally, we find that investigators who take experimental approaches tend to do better than those who take theoretical approaches”. But first, this isn’t obviously true… mathematicians, for instance, have found theoretical approaches to be more powerful. (I’d guess that the developer of Bitcoin took a theoretical rather than an empirical approach to creating a secure cryptocurrency.) And second, I’d say that even this argument is analogy-like in its structure, since the reference class of “people investigating things” seems sufficiently weak to start pushing into analogy territory. See my above point about how reasoning by analogy at its best is reasoning from a weak reference class. (Do people think this is worth a toplevel post?)
This brings me to what I think is my most fundamental point of disagreement with you. Viewed from a distance, your argument goes something like “Philosophy is a waste of time! Resolve your disagreements experimentally! There’s no need for all this theorizing!” And my rejoinder would be: Resolving disagreements experimentally is great… when it’s possible. We’d love to do a randomized controlled trial of whether universes with a Machine Intelligence Research Institute are more likely to have a positive singularity, but unfortunately we don’t currently know how to do that.
There are a few issues with too much emphasis on experimentation over theory. The first issue is that you may be tempted to prefer experimentation over theory even for problems that theory is better suited for (e.g. empirically testing prime number conjectures). The second issue is that you may fall prey to the streetlight effect and prioritize areas of investigation that look tractable from an experimental point of view, ignoring questions that are both very important and not very tractable experimentally.
You write:
Well, much of our uncertainty about the actions of an unfriendly AI could be resolved if we were to know more about how such agents construct their thought models, and relatedly what language was used to construct their goal systems.
This would seem to depend on the specifics of the agent in question. This seems like a potentially interesting line of inquiry. My impression is that MIRI thinks most possible AGI architectures wouldn’t meet its standards for safety, so given that their ideal architecture is so safety-constrained, they’re focused on developing the safety stuff first before working on constructing thought models etc. This seems like a pretty reasonable approach for an organization with limited resources, if it is in fact MIRI’s approach. But I could believe that value could be added by looking at lots of budding AGI architectures and trying to figure out how one might make them safer on the margin.
We could also stand to benefit from knowing more practical information (experimental data) about in what ways AI boxing works and in what ways it does not, and how much that is dependent on the structure of the AI itself.
Sure… but note that Eliezer Yudkowsky from MIRI was the one who invented the AI box experiment and ran the first few experiments, and FHI wrote this paper collecting a bunch of ideas for what AI boxes could consist of. (The other thing I didn’t mention as a weakness of empiricism is that empiricism doesn’t tell you which hypotheses might be useful to test. Knowing which hypotheses to test is especially valuable when testing hypotheses is expensive.)
I could believe that there are fruitful lines of experimental inquiry that are neglected in the AI safety space. Overall it looks kinda like crypto to me in the sense that theoretical investigation seems more likely to pan out. But I’m supportive of people thinking hard about specific useful experiments that someone could run. (You could survey all the claims in Bostrom’s Superintelligence and try to estimate what fraction could be cheaply tested experimentally. Remember that just because a claim can’t be tested experimentally doesn’t mean it’s not an important claim worth thinking about...)
it seems you’ve been unable to escape a certain amount of reasoning by analogy in your post. You state that experimental investigation of asteroid impacts was useful, so by analogy, experimental investigation of AI risks should be useful.
It seems I should have picked a different phrase to convey my intended target of ire. The problem isn’t concept formation by means of comparing similar reference classes, but rather using thought experiments as evidence and updating on them.
To be sure, thought experiments are useful for noticing when you are confused. They can also be a semi-dark art in providing intuition pumps. Einstein did well in introducing special relativity by means of a series of thought experiments: first getting the reader to notice their confusion over classical electromagnetism in moving reference frames, then providing an intuition pump for how his own relativity worked in contrast. It makes his paper one of the most beautiful works in all of physics. However, it was the experimental evidence which proved Einstein right, not the Gedankenexperimente.
If a thought experiment shows something to not feel right, that should raise your uncertainty about whether your model of what is going on is correct or not (notice your confusion), to wit the correct response should be “how can I test my beliefs here?” Do NOT update on thought experiments, as thought experiments are not evidence. The thought experiment triggers an actual experiment—even if that experiment is simply looking up data that is already collected—and the actual experimental results are what update beliefs.
My impression is that MIRI thinks most possible AGI architectures wouldn’t meet its standards for safety.
MIRI has not to my knowledge released any review of existing AGI architectures. If that is their belief, the onus is on them to support it.
but note that Eliezer Yudkowsky from MIRI was the one who invented the AI box experiment
He invented the AI box game. If it’s an experiment, I don’t know what it is testing. It is a setup totally divorced from any sane reality for how AGI might actually develop and what sort of controls might be in place, with built-in rules that favor the AI.
Yet nevertheless, time and time again people such as yourself point me to the AI box games as if it demonstrated anything of note, anything which should cause me to update my beliefs.
It is, I think, the examples of the sequences and the character of many of the philosophical discussions which happen here that drive people to feel justified in making such ungrounded inferences. And it is that tendency which possibly makes the sequences and/or less wrong a memetic hazard.
If a thought experiment shows something to not feel right, that should raise your uncertainty about whether your model of what is going on is correct or not (notice your confusion), to wit the correct response should be “how can I test my beliefs here?”
I have such very strong agreement with you here.
The problem isn’t concept formation by means of comparing similar reference classes, but rather using thought experiments as evidence and updating on them.
…but I disagree with you here.
Thought experiments and reasoning by analogy and the like are ways to explore hypothesis space. Elevating hypotheses for consideration is updating. Someone with excellent Bayesian calibration would update much much less on thought experiments etc. than on empirical tests, but you run into really serious problems of reasoning if you pretend that the type of updating is fundamentally different in the two cases.
I want to emphasize that I think you’re highlighting a strength this community would do well to honor and internalize. I strongly agree with a core point I see you making.
But I think you might be condemning screwdrivers because you’ve noticed that hammers are really super-important.
Hmm. Maybe. It depends on what you mean by “likelihood”, and by “selecting”.
Trivially, noticing a hypothesis, and noticing that it’s likely enough to justify being tested, absolutely makes it subjectively more likely than it was before. I consider that tautological.
If someone is looking at n hypotheses and then decides to pick the kth one to test (maybe at random, or maybe because they all need to be tested at some point so why not start with the kth one), then I quite agree, that doesn’t change the likelihood of hypothesis #k.
But in my mind, it’s vividly clear that the process of plucking a likely hypothesis out of hypothesis space depends critically on moving probability mass around in said space. Any process that doesn’t do that is literally picking a hypothesis at random. (Frankly, I’m not sure a human mind even can do that.)
The core problem here is that most default human ways of moving probability mass around in hypothesis space (e.g. clever arguments) violate the laws of probability, whereas empirical tests aren’t nearly as prone to that.
So, if you mean to suggest that figuring out which hypothesis is worthy of testing does not involve altering our subjective likelihood that said hypothesis will turn out to be true, then I quite strongly disagree.
But if you mean that clever arguments can’t change what’s true even by a little bit, then of course I agree with you.
Perhaps you’re using a Frequentist definition of “likelihood” whereas I’m using a Bayesian one?
Perhaps you’re using a Frequentist definition of “likelihood” whereas I’m using a Bayesian one?
There’s a difference? Probability is probability.
So, if you mean to suggest that figuring out which hypothesis is worthy of testing does not involve altering our subjective likelihood that said hypothesis will turn out to be true, then I quite strongly disagree.
But if you mean that clever arguments can’t change what’s true even by a little bit, then of course I agree with you.
If you go about selecting a hypothesis by evaluating a space of hypotheses to see how they rate against your model of the world (whether you think they are true) and against each other (how much you stand to learn by testing them), you are essentially coming to reflective equilibrium regarding these hypotheses and your current beliefs. What I’m saying is that this shouldn’t change your actual beliefs—it will flush out some stale caching, or at best identify an inconsistent belief, including empirical data that you haven’t fully updated on. But it does not, by itself, constitute evidence.
So a clever argument might reveal an inconsistency in your priors, which in turn might make you want to seek out new evidence. But the argument itself is insufficient for drawing conclusions. Even if the hypothesis is itself hard to test.
Perhaps you’re using a Frequentist definition of “likelihood” whereas I’m using a Bayesian one?
There’s a difference? Probability is probability.
There very much is a difference.
Probability is a mathematical construct. Specifically, it’s a special kind of measure p on a measure space M such that p(M) = 1 and p obeys a set of axioms that we refer to as the axioms of probability (where an “event” from the Wikipedia page is to be taken as any measurable subset of M).
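For reference, here are those axioms written out (the standard Kolmogorov formulation; E and the E_i below are measurable subsets of M):

```latex
% Axioms of probability for a measure p on a measure space M
\begin{align}
  & p(E) \ge 0 && \text{(non-negativity)} \\
  & p(M) = 1 && \text{(normalization)} \\
  & p\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} p(E_i)
    && \text{for pairwise disjoint } E_i \text{ (countable additivity)}
\end{align}
```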
This is a bit like highlighting that Euclidean geometry is a mathematical construct based on following thus-and-such axioms for relating thus-and-such undefined terms. Of course, in normal ways of thinking we point at lines and dots and so on, pretend those are the things that the undefined terms refer to, and proceed to show pictures of what the axioms imply. Formally, mathematicians refer to this as building a model of an axiomatic system. (Another example of this is elliptic geometry, which is a type of non-Euclidean geometry, which you can model as doing geometry on a sphere.)
The Frequentist and Bayesian models of probability theory are relevantly different. They both think of M as the space of possible results (usually called the “sample space” but not always) and a measurable subset E ⊆ M as an “event”. But they use different models of p:
Frequentists suggest that were you to look at how often all of the events in M occur, the one we’re looking at (i.e., E) would occur at a certain frequency, and that’s how we should interpret p(E). E.g., if M is the set of results from flipping a fair coin and E is “heads”, then it is a property of the setup that p(E) = 0.5. A different way of saying this is that Frequentists model p as describing a property of that which they are observing—i.e., that probability is a property of the world.
Bayesians, on the other hand, model p as describing their current state of confidence about the true state of the observed phenomenon. In other words, Bayesians model p as being a property of mental models, not of the world. So if M is again the results from flipping a fair coin and E is “heads”, then to a Bayesian the statement p(E) = 0.5 is equivalent to saying “I equally expect getting a heads to not getting a heads from this coin flip.” To a Bayesian, it doesn’t make sense to ask what the “true” probability is that their subjective probability is estimating; the very question violates the model of p by trying to sneak in a Frequentist presumption.
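To make that contrast concrete, here is a minimal sketch (Python, with a simulated coin; the specific numbers are only for illustration). The Frequentist quantity is a long-run relative frequency treated as a property of the setup; the Bayesian quantity is a degree of belief updated flip by flip from a prior.

```python
import random

random.seed(0)
flips = [random.random() < 0.5 for _ in range(1000)]  # simulate flips of a fair coin

# Frequentist reading: p(heads) is a property of the coin/setup,
# estimated by the long-run relative frequency of heads.
frequency_estimate = sum(flips) / len(flips)

# Bayesian reading: p(heads) is my degree of belief about the coin's bias,
# represented here by a Beta(a, b) distribution updated one flip at a time.
a, b = 1.0, 1.0  # uniform prior over the bias
for heads in flips:
    if heads:
        a += 1
    else:
        b += 1
posterior_mean = a / (a + b)  # current subjective expectation of heads

print(f"long-run frequency estimate: {frequency_estimate:.3f}")
print(f"posterior mean (degree of belief): {posterior_mean:.3f}")
```

Both computations land near 0.5 for a fair coin; the difference is in what the number is taken to be about.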
Now let’s suppose that M is a hypothesis space, including some sector for hypotheses that haven’t yet been considered. When we say that a given hypothesis H is “likely”, we’re working within a partial model, but we haven’t yet said what “likely” means. The formalism is easy: we require that H ⊆ M is measurable, and the statement that “it’s likely” means that p(H) is larger than most other measurable subsets of M (and often we mean something stronger, like p(H) > 0.5). But we haven’t yet specified in our model what p(H) means. This is where the difference between Frequentism and Bayesianism matters. A Frequentist would say that the probability is a property of the hypothesis space, and noticing H doesn’t change that. (I’m honestly not sure how a Frequentist thinks about iterating over a hypothesis space to suggest that H in fact would occur at a frequency of p(H) in the limit—maybe by considering the frequency in counterfactual worlds?) A Bayesian, by contrast, will say that p(H) is their current confidence that H is the right hypothesis.
What I’m suggesting, in essence, is that figuring out which hypothesis H ⊆ M is worth testing is equivalent to moving from p to p’ in the space of probability measures on M in a way that causes p’(H) > p(H). This is coming from using a Bayesian model of what p is.
Of course, if you’re using a Frequentist model of p, then “most likely hypothesis” actually refers to a property of the hypothesis space—though I’m not sure how you would find out the frequency at which hypotheses turn out to be true the way you figure out the frequency at which a coin comes up heads. But that could just be my not being as familiar thinking in terms of the Frequentist model.
I’ll briefly note that although I find the Bayesian model more coherent with my sense of how the world works on a day-by-day basis, I think the Frequentist model makes more sense when thinking about quantum physics. The type of randomness we find there isn’t just about confidence, but is in fact a property of the quantum phenomena in question. In this case a well-calibrated Bayesian has to give a lot of probability mass to the hypothesis that there is a “true probability” in some quantum phenomena, which makes sense if we switch the model of p to be Frequentist.
But in short:
Yes, there’s a difference.
And things like “probability” and “belief” and “evidence” mean different things depending on what model you use.
What I’m saying is that this shouldn’t change your actual beliefs—it will flush out some stale caching, or at best identify an inconsistent belief, including empirical data that you haven’t fully updated on. But it does not, by itself, constitute evidence.
Yep, we disagree.
I think the disagreement is on two fronts. One is based on using different models of probability, which is basically not an interesting disagreement. (Arguing over which definition to use isn’t going to make either of us smarter.) But I think the other is substantive. I’ll focus on that.
In short, I think you underestimate the power of noticing implications of known facts. I think that if you look at a few common or well-known examples of incomplete deduction, it becomes pretty clear that figuring out how to finish thinking would be intensely powerful:
Many people make resolutions to exercise, be nicer, eat more vegetables, etc. And while making those resolutions, they often really think they mean it this time. And yet, there’s often a voice of doubt in the back of the mind, as though saying “Come on. You know this won’t work.” But people still quite often spend a bunch of time and money trying to follow through on their new resolution—often failing for reasons that they kind of already knew would happen (and yet often feeling guilty for not sticking to their plan!).
Religious or ideological deconversion often comes from letting in facts that are already known. E.g., I used to believe that the results of parapsychological research suggested some really important things about how to survive after physical death. I knew all the pieces of info that finally changed my mind months before my mind actually changed. I had even done experiments to test my hypotheses and it still took months. I’m under the impression that this is normal.
Most people reading this already know that if they put a ton of work into emptying their email inbox, they’ll feel good for a little while, and then it’ll fill up again, complete with the sense of guilt for not keeping up with it. And yet, somehow, it always feels like the right thing to do to go on an inbox-emptying flurry, and then get around to addressing the root cause “later” or maybe try things that will fail after a month or two. This is an agonizingly predictable cycle. (Of course, this isn’t how it goes for everyone, but it’s common enough that well over half the people who attend CFAR workshops seem to relate to it.)
Most of Einstein’s work in raising special relativity to consideration consisted of saying “Let’s take the Michelson-Morley result at face value and see where it goes.” Note that he is now considered the archetypal example of a brilliant person primarily for his ability to highlight worthy hypotheses via running with the implications of what is already known or supposed.
Ignaz Semmelweis found that hand-washing dramatically reduced mortality in important cases in hospitals. He was ignored, criticized, and committed to an insane asylum where guards beat him to death. At a cultural level, the fact that whether Semmelweis was right was (a) testable and (b) independent of opinion failed to propagate until after Louis Pasteur gave the medical community justification to believe that hand-washing could matter. This is a horrendous embarrassment, and thousands of people died unnecessarily because of a cultural inability to finish thinking. (Note that this also honors the need for empiricism—but the point here is that the ability to finish thinking was a prerequisite for empiricism mattering in this case.)
I could keep going. Hopefully you could too.
But my point is this:
Please note that there’s a baby in that bathwater you’re condemning as dirty.
Those are not different models. They are different interpretations of the utility of probability in different classes of applications.
though I’m not sure how you would find out the frequency at which hypotheses turn out to be true the way you figure out the frequency at which a coin comes up heads. But that could just be my not being as familiar thinking in terms of the Frequentist model
You do it exactly the same as in your Bayesian example.
I’m sorry, but this Bayesian vs Frequentist conflict is for the most part non-existent. If you use probability to model the outcome of an inherently random event, people have called that “frequentist.” If instead you model the event as deterministic, but your knowledge over the outcome as uncertain, then people have applied the label “bayesian.” It’s the same probability, just used differently.
It’s like how if you apply your knowledge of mechanics to bridge and road building, it’s called civil engineering, but if you apply it to buildings it is architecture. It’s still mechanical engineering either way, just applied differently.
One of the failings of the sequences is the amount of emphasis that is placed on “Frequentist” vs “Bayesian” interpretations. The conflict between the two exists mostly in Yudkowsky’s mind. Actual statisticians use probability to model events and knowledge of events simultaneously.
Regarding the other points, every single example you gave involves using empirical data that had not sufficiently propagated, which is exactly the sort of use I am in favor of. So I don’t know what it is that you disagree with.
I’m sorry, but this Bayesian vs Frequentist conflict is for the most part non-existent.
[…]
One of the failings of the sequences is the amount of emphasis that is placed on “Frequentist” vs “Bayesian” interpretations. The conflict between the two exists mostly in Yudkowsky’s mind. Actual statisticians use probability to model events and knowledge of events simultaneously.
I know a fellow who has a Ph.D. in statistics and works for the Department of Defense on cryptography. I think he largely agrees with your point: professional statisticians need to use both methods fluidly in order to do useful work. But he also doesn’t claim that they’re both secretly the same thing. He says that strong Bayesianism is useless in some cases that Frequentism gets right, and vice versa, though his sympathies lie more with the Frequentist position on pragmatic grounds (i.e. that methods that are easier to understand in a Frequentist framing tend to be more useful in a wider range of circumstances in his experience).
I think the debate is silly. It’s like debating which model of hyperbolic geometry is “right”. Different models highlight different intuitions about the formal system, and they make different aspects of the formal theorems more or less relevant to specific cases.
I think Eliezer’s claim is that as a matter of psychology, using a Bayesian model of probability lets you think about the results of probability theory as laws of thought, and from that you can derive some useful results about how one ought to think and what results from experimental psychology ought to capture one’s attention. He might also be claiming somewhere that Frequentism is in fact inconsistent and therefore is simply a wrong model to adopt, but honestly if he’s arguing that then I’m inclined to ignore him because people who know a lot more about Frequentism than he does don’t seem to agree.
But there is a debate, even if I think it’s silly and quite pointless.
And also, the axiomatic models are different, even if statisticians use both.
Regarding the other points, every single example you gave involves using empirical data that had not sufficiently propagated, which is exactly the sort of use I am in favor of. So I don’t know what it is that you disagree with.
The concern about AI risk is also the result of an attempt to propagate implications of empirical data. It just goes farther than what I think you consider sensible, and I think you’re encouraging an unnecessary limitation on human reasoning power by calling such reasoning unjustified.
I agree, it should itch that there haven’t been empirical tests of several of the key ideas involved in AI risk, and I think there should be a visceral sense of making bullshit up attached to this speculation unless and until we can find ways to do those empirical tests.
But I think it’s the same kind of stupid to ignore these projections as it is to ignore that you already know how your New Year’s Resolution isn’t going to work. It’s not obviously as strong a stupidity, but the flavor is exactly the same.
If we could banish that taste from our minds, then even without better empiricism we would be vastly stronger.
I’m concerned that you’re underestimating the value of this strength, and viewing its pursuit as a memetic hazard.
I don’t think we have to choose between massively improving our ability to make correct clever arguments and massively improving the drive and cleverness with which we ask nature its opinion. I think we can have both, and I think that getting AI risk and things like it right requires both.
But just as measuring everything about yourself isn’t really a fully mature expression of empiricism, I’m concerned that the memes you’re spreading in the name of mature empiricism will retard the art of finishing thinking.
I don’t think that they have to oppose.
And I’m under the impression that you think otherwise.
But the argument itself is insufficient for drawing conclusions.
This seems like it would be true only if you’d already propagated all logical consequences of all observations you’ve made. But an argument can help me to propagate. Which means it can make me update my beliefs.
For example, is 3339799 a prime number?
One ought to assign some prior probability to it being a prime. A naive estimate might say, well, there are two options, so let’s assign it 50% probability.
You could also make a more sophisticated argument about the distribution of prime numbers spreading out as you go towards infinity, and given that only 25 of the first 100 numbers are prime, the chance that a randomly selected number in the millions should be prime is less than 25% and probably much lower.
I claim that in a case like this it is totally valid to update your beliefs on the basis of an argument. No additional empirical test required before updating.
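As a minimal sketch of both halves of that reasoning (Python; the prior comes from the prime number theorem’s 1/ln(n) density, and a trial-division check stands in for the follow-up test):

```python
from math import log, isqrt

n = 3339799

# Argument-based prior: by the prime number theorem, the density of primes
# near n is roughly 1/ln(n), far below the naive 50% (or even 25%) estimate.
prior = 1 / log(n)
print(f"prior P({n} is prime) ~ {prior:.3f}")  # roughly 0.07 for n in the millions

# The follow-up check: trial division up to sqrt(n) settles the question outright.
def is_prime(k: int) -> bool:
    if k < 2:
        return False
    for d in range(2, isqrt(k) + 1):
        if k % d == 0:
            return False
    return True

print(f"{n} is prime: {is_prime(n)}")
```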
I think the definition of ‘experiment’ gets tricky and confusing when you are talking about math specifically. When you talk about finding the distribution of prime numbers and using that to arrive at a more accurate model for your prior probability of 3339799 being prime, that is an experiment.
Math is unique in that regard though. For questions about the real world we must seek evidence that is outside of our heads.
[...] this shouldn’t change your actual beliefs [...] it does not, by itself, constitute evidence [...] the argument itself is insufficient for drawing conclusions. Even if the hypothesis is itself hard to test.
Is that a conclusion or a hypothesis? I don’t believe there is a fundamental distinction between “actual beliefs”, “conclusions” and “hypotheses”. What should it take to change my beliefs about this?
I’ll think about how this can be phrased differently such that it might sway you. Given that you are not Valentine, is there a difference of opinion between his posts above and your views?
That part you pulled out and quoted is essentially what I was writing about in the OP. There is a philosophy-over-hard-subjects which is pursued here, in the sequences, at FHI, and is exemplified in the conclusions drawn by Bostrom in Superintelligence, and Yudkowsky in the later sequences. Sometimes it works, e.g. the argument in the sequences about the compatibility of determinism and free will works because it essentially shows how non-determinism and free will are incompatible—it exposes a cached thought that free-will == non-deterministic choice which was never grounded in the first place. But over new subjects where you are not confused in the first place—e.g. the nature and risk of superintelligence—people seem to be using thought experiments alone to reach ungrounded conclusions, and not following up with empirical studies.
That is dangerous. If you allow yourself to reason from thought experiments alone, I can get you to believe almost anything. I can’t get you to believe the sky is green—unless you’ve never seen the sky—but anything you yourself don’t have available experimental evidence for or against, I can sway you in either way. E.g. that consciousness is in information being computed and not the computational process itself. That an AI takeoff would be hard, not soft, and basically uncontrollable. That boxing techniques are foredoomed to failure regardless of circumstances. That intelligence and values are orthogonal under all circumstances. That cryonics is an open-and-shut case. On these sorts of questions we need more, not less experimentation.
When you hear a clever thought experiment that seems to demonstrate the truth of something you previously thought to have low probability, then (1) check if your priors here are inconsistent with each other; then (2) check if there is empirical data here that you have not fully updated on. If neither of those approaches resolves the issue, then (3) notice you are confused, and seek an experimental result to resolve the confusion. If you are truly unable to find an experimental test you can perform now, then (4) operate as if you do not know which of the possible theories is true.
You do not say “that thought experiment seemed convincing, so until I know otherwise I’ll update in favor of it.” That is the sort of thinking which led the ancients to believe that “All things come to rest eventually, so the natural state is a lack of motion. Planets continue in clockwork motion, so they must be a separate magisterium from earthly objects.” You may think we as rationalists are above that mistake, but history has shown otherwise. Hindsight bias makes the Greeks seem a lot stupider than they actually were.
Take a concrete example: the physical origin of consciousness. We can rule out the naïve my-atoms-constitute-my-consciousness view from biological arguments. However I have been unable to find or construct for myself an experiment which would definitively rule out either the information-identity or computational-process theories, both of which are supported by available empirical evidence.
How is this relevant? Some are arguing for brain preservation instead of cryonics. But this only achieves personal longevity if the information-identity theory is correct, as it is destructive of the computational process. Cryonics, on the other hand, achieves personal longevity by preserving the computational substrate itself, which achieves both information- and computational-preservation. So unless there is a much larger difference in success likelihood than appears to be the case, my money (and my life) is on cryonics. Not because I think that computational-process theory is correct (although I do have other weak evidence that makes it more likely), but because I can’t rule it out as a possibility, and so I must consider the case where destructive brain preservation gets popularized at the cost of fewer cryopreservations, and it turns out that personal longevity is only achieved with the preservation of computational processes. So I do not support the Brain Preservation Foundation.
To be clear, I think that arguing for destructive brain preservation at this point in time is a morally unconscionable thing to do, even though (exactly because!) we don’t know the nature of consciousness and personal identity, and there is an alternative which is likely to work no matter how that problem is resolved.
My point is that the very statements you are making, that we are all making all the time, are also very theory-loaded, “not followed up with empirical studies”. This includes the statements about the need to follow things up with empirical studies. You can’t escape the need for experimentally unverified theoretical judgement, and it does seem to work, even though I can’t give you a well-designed experimental verification of that. Some well-designed studies even prove that ghosts exist.
The degree to which discussion of familiar topics is closer to observations than discussion of more theoretical topics is unclear, and the distinction should be cashed out as uncertainty on a case-by-case basis. Some very theoretical things are crystal clear math, more certain than the measurement of the charge of an electron.
That is dangerous.
Being wrong is dangerous. Not taking theoretical arguments into account can result in error. This statement probably wouldn’t be much affected by further experimental verification. What specifically should be concluded depends on the problem, not on a vague outside measure of the problem like the degree to which it’s removed from empirical study.
[...] anything you yourself don’t have available experimental evidence for or against, I can sway you in either way. E.g. that consciousness is in information being computed and not the computational process itself.
Before considering the truth of a statement, we should first establish its meaning, which describes the conditions for judging its truth. For a vague idea, there are many alternative formulations of its meaning, and it may be unclear which one is interesting, but that’s separate from the issue of thinking about any specific formulation clearly.
Ghosts specifically seem like too complicated a hypothesis to extract from any experimental results I’m aware of. If we didn’t already have a concept of ghosts, I doubt any parapsychology experiments that have taken place would have caused us to develop one.
People select hypotheses for testing because they have previously weakly updated in the direction of them being true. Seeing empirical data produces a later, stronger update.
Except that when the hypothesis space is large, people test hypotheses because they strongly updated in the direction of them being true, and seeing empirical data produces a later, weaker update. An example of ‘strongly updating’ could be going from 9,999,999:1 odds against a hypothesis to 99:1 odds, and an example of ‘weakly updating’ could be going from 99:1 odds against the hypothesis to 1:99. The former update requires about 17 bits of evidence, while the latter update requires about 13 bits of evidence.
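Spelling out that arithmetic (bits of evidence ≈ log₂ of the likelihood ratio needed to shift the odds):

```latex
% Odds on H move from 1:9,999,999 to 1:99, and then from 1:99 to 99:1
\begin{align}
  \log_2 \frac{1/99}{1/9{,}999{,}999} &\approx \log_2(101{,}010) \approx 16.6 \text{ bits} \\
  \log_2 \frac{99/1}{1/99} &= \log_2(9{,}801) \approx 13.3 \text{ bits}
\end{align}
```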
Interesting point. I guess my intuitive notion of a “strong update” has to do with absolute probability mass allocation rather than bits of evidence (probability mass is what affects behavior?), but that’s probably not a disagreement worth hashing out.
Thanks! Paul Graham is my hero when it comes to writing and I try to pack ideas as tightly as possible. (I recently reread this essay of his and got amazed by how many ideas it contains; I think it has more intellectual content than most published nonfiction books, in just 10 pages or so. I guess the downside of this style is that readers may not go slow enough to fully absorb all the ideas. Anyway, I’m convinced that Paul Graham is the Ben Franklin of our era.)
I have a feeling your true rejection runs deeper than you’re describing. You cite a thought experiment of Einstein’s as being useful and correct. You explain that Less Wrong relies on thought experiments too heavily. You suggest that Less Wrong should lean heavier on data from the real world. But the single data point you cite on the question of thought experiments indicates that they are useful and correct. It seems like your argument fails by its own standard.
I think the reliability of thought experiments is a tricky question to resolve. I think we might as well expand the category of thought experiments to “any reasoning about the world that isn’t reasoning directly from data”. When I think about the reliability of this reasoning, my immediate thought is that I expect some people to be much better at it than others. In fact, I think being good at this sort of reasoning is almost exactly the same as being intelligent. Reasoning directly from data is like looking up the answers in the back of the book.
This leaves us with two broad positions: the “humans are dumb/the world is tricky” position that the only way we can ever get anywhere is through constant experimentation, and the “humans are smart/the world is understandable” position that we can usefully make predictions based on limited data.
I think these positions are too broad to be useful. It depends a lot on the humans, and it depends a lot on the aspect of the world being studied. Reasoning from first principles works better in physics than in medicine; in that sense, medicine is a trickier subject to study.
If the tricky world hypothesis is true for the questions MIRI is investigating, or the MIRI team is too dumb, I could see the sort of empirical investigations you propose as being the right approach: they don’t really answer the most important questions we want answered, but there probably isn’t a way for MIRI to answer those questions anyway, so might as well answer the questions that are answerable and see if the results lead anywhere.
Anyway, I think a lot of the value of many LW posts is in finding useful ideas that are also very general. (Paul Graham’s description of what philosophy done right looks like.) Very general ideas are harder to test, because they cut across domains. The reason I like the many citations in Thinking, Fast and Slow is that I expect the general ideas it presents to be more reliably true because they’re informed by at least some experimental data. But general, useful ideas can be so useful that I don’t mind taking the time to read about them even if they’re not informed by lots of experimental data. Specifically, having lots of general, useful ideas that are also correct (e.g. knowing when and how to add numbers) makes you more intelligent according to my definition above. And I consider myself intelligent enough to be able to tell apart the true general, useful ideas from the false ones through my own reasoning and experience at least somewhat reliably.
Broadly speaking, I think Less Wrong is counterproductive if specific general, useful ideas it promotes are false. (It’s hard to imagine how it could be counterproductive if they were true.) And at that point we’re talking about whether specific posts are true or false. Lukeprog has this list of points of agreement between the sequences and mainstream academia, which causes me to update in the direction of those points of agreement being true.
I think you’re being overly hard on the AI box experiment. It’s obviously testing something. It’d be great if we could fork the universe, import a superintelligence, set up a bunch of realistic safeguards, and empirically determine how things played out. But that’s not practical. We did manage to find an experiment that might shed some light on the scenario, but the experiment uses a smart human instead of a superintelligence and a single gatekeeper instead of a more elaborate set of controls. It seems to me that you aren’t following your own standard: you preach the value of empiricism and then throw out some of the only data points available, for theoretical reasons, without producing any better data. I agree that it’s some pretty weak data, but it seems better to think about it than throw it out and just believe whatever we like, and I think weak data is about as well as you’re going to do in this domain.
You cite a thought experiment of Einstein’s as being useful and correct.
I cite a thought experiment of Einstein’s as being useful but insufficient. It was not correct until observation matched anticipation. I called out Einstein’s thought experiment as being a useful pedagogical technique, but not an example of how to arrive at truth. Do you see the difference?
I think you’re being overly hard on the AI box experiment. It’s obviously testing something.
No, this is not obvious to me. Other than the ability of two humans to outwit each other within the confines of strict enforcement of arbitrarily selected rules, what is it testing, exactly? And what does that thing being tested have to do with realistic AIs and boxes anyway?
I called out Einstein’s thought experiment as being a useful pedagogical technique, but not an example of how to arrive at truth.
What’s your model of how Einstein in fact arrived at truth, if not via a method that is “an example of how to arrive at truth”? It’s obvious the method has to work to some extent, because Einstein couldn’t have arrived at a correct view by chance. Is your view that Einstein should have updated less from whatever reasoning process he used to pick out that hypothesis from the space of hypotheses, than from the earliest empirical tests of that hypothesis, contra Einstein’s Arrogance?
Or is your view that, while Einstein may technically have gone through a process like that, no one should assume they are in fact Einstein—i.e., Einstein’s capabilities are so rare, or his methods are so unreliable (not literally at the level of chance, but, say, at the level of 1000-to-1 odds of working), that by default you should harshly discount any felt sense that your untested hypothesis is already extremely well-supported?
Or perhaps you should harshly discount it until you have meta-evidence, in the form of a track record of successfully predicting which untested hypotheses will turn out to be correct.
Other than the ability of two humans to outwit each other within the confines of strict enforcement of arbitrarily selected rules, what is it testing, exactly? And what does that thing being tested have to do with realistic AIs and boxes anyway?
The AI box experiment is a response to the claim ‘superintelligences are easy to box, because no level of competence at social engineering would suffice for letting an agent talk its way out of a box’. It functions as an existence proof; if a human level of social competence is already sufficient to talk one’s way out of a box with nonzero frequency, then we can’t dismiss risk from superhuman levels of social competence.
If you think the claim Eliezer was responding to is silly on priors, or just not relevant (because it would be easy to assess an AI’s social competence and/or prevent it from gaining such competence), then you won’t be interested in that part of the conversation.
What’s your model of how Einstein in fact arrived at truth, if not via a method that is “an example of how to arrive at truth”?
You can’t work backwards from the fact that someone arrived at truth in one case to the premise that they must have been working from a reliable method for arriving at truth. It’s the “one case” that’s the problem. They might have struck lucky.
Einstein’s thought experiments inspired his formal theories, which were then confirmed by observation. Nobody thought the thought experiments provided confirmation by themselves.
You can’t work backwards from the fact that someone arrived at truth in one case to the premise that they must have been working from a reliable method for arriving at truth. It’s the “one case” that’s the problem. They might have struck lucky.
I mentioned that possibility above. But Einstein couldn’t have been merely lucky—even if it weren’t the case that he was able to succeed repeatedly, his very first success was too improbable for him to have just been plucking random physical theories out of a hat. Einstein was not a random number generator, so there was some kind of useful cognitive work going on.
That leaves open the possibility that it was only useful enough to give Einstein a 1% chance of actually being right; but still, I’m curious about whether you do think he only had a 1% chance of being right, or (if not) what rough order of magnitude you’d estimate. And I’d likewise like to know what method he used to even reach a 1% probability of success (or 10%, or 0.1%), and why we should or shouldn’t think this method could be useful elsewhere.
Einstein’s thought experiments inspired his formal theories, which were then confirmed by observation. Nobody thought the thought experiments provided confirmation by themselves.
Can you define “confirmation” for me, in terms of probability theory?
Big Al may well have had some intuitive mojo that enabled him to pick the right thought experiments, but that still doesn’t make thought experiments a substitute for real empiricism. And intuitive mojo isn’t a method in the sense of being reproducible.
Can you define “confirmation” for me, in terms of probability theory?
Why not derive probability theory in terms of confirmation?
Thought experiments aren’t a replacement for real empiricism. They’re a prerequisite for real empiricism.
“Intuitive mojo” is just calling a methodology you don’t understand a mean name. However it was that Einstein repeatedly hit success in his lifetime, presupposing that it is an ineffable mystery or a grand coincidence won’t tell us much.
Why not derive probability theory in terms of confirmation?
I already understand probability theory, and why it’s important. I don’t understand what you mean by “confirmation,” how your earlier statement can be made sense of in quantitative terms, or why this notion should be treated as important here. So I’m asking you to explain the less clear term in terms of the more clear term.
Actually he did not. He got lucky early in his career, and pretty much coasted on that into irrelevance. His intuition allowed him to solve problems related to relativity, the photoelectric effect, and Brownian motion, and to make a few other significant contributions, all within the span of a decade early in his career. And then he went off the deep end following his intuition down a number of dead-ending rabbit holes for the rest of his life. He died in Princeton in 1955, having made no further significant contributions to physics after his 1916 invention of general relativity. Within the physics community (I am a trained physicist), Einstein’s story is retold more often as a cautionary tale than a model to emulate.
Within the physics community (I am a trained physicist), Einstein’s story is retold more often as a cautionary tale than a model to emulate.
...huh? Correct me if I’m wrong here, but Einstein was a great physicist who made lots of great discoveries, right?
The right cautionary tale would be to cite physicists who attempted to follow the same strategy Einstein did and see how it mostly only worked for Einstein. But if Einstein was indeed a great physicist, it seems like at worst his strategy is one that doesn’t usually produce results but sometimes produces spectacular results… which doesn’t seem like a terrible strategy.
I have a very strong (empirical!) heuristic that the first thing people should do if they’re trying to be good at something is copy winners. Yes there are issues like regression to the mean and stuff, but it provides a good alternative perspective vs thinking things through from first principles (which seems to be my default cognitive strategy).
The thing is, Einstein was popular, but his batting average was lower than that of his peers. In terms of advancing the state of the art, the 20th century is full of theoretical physicists with a better track record than Einstein for pushing the field forward, most of whom did not spend the majority of their careers chasing rabbits down holes. They may not be common household names, but honestly Einstein’s fame might have more to do with the hair than the physics.
I should point out that I heard this cautionary tale as “don’t set your sights too high,” not “don’t employ the methods Einstein employed.” The methods were fine, the trouble was that he was at IAS and looking for something bigger than his previous work, rather than planting acorns that would grow into mighty oaks (as Hamming puts it).
The AI box experiment only serves even as that if you assume that the AI box experiment sufficiently replicates the conditions that would actually be faced by someone with an AI in a box. Also, it only serves as such if it is otherwise a good experiment, but since we are not permitted to see the session transcripts for ourselves, we can’t tell if it is a good experiment.
Again, the AI box experiment is a response to the claim “superintelligences are easy to box, because no level of competence at social engineering would suffice for letting an agent talk its way out of a box”. If you have some other reason to think that superintelligences are hard to box—one that depends on a relevant difference between the experiment and a realistic AI scenario—then feel free to bring that idea up. But this constitutes a change of topic, not an objection to the experiment.
since we are not permitted to see the session transcripts for ourselves, we can’t tell if it is a good experiment.
I mean, the experiment’s been replicated multiple times. And you already know the reasons the transcripts were left private. I understand assigning a bit less weight to the evidence because you can’t examine it in detail, but the hypothesis that there’s a conspiracy to fake all of these experiments isn’t likely.
If you have some other reason to think that superintelligences are hard to box—one that depends on a relevant difference between the experiment and a realistic AI scenario—then feel free to bring that idea up.
Not all relevant differences between an experiment and an actual AI scenario can be accurately characterized as “reason to think that superintelligences are hard to box”. For instance, imagine an experiment with no gatekeeper or AI party at all, where the result of the experiment depends on flipping a coin to decide whether the AI gets out. That experiment is very different from a realistic AI scenario, but one need not have a reason to believe that intelligences are hard to box—or even hold any opinion at all on whether intelligences are hard to box—to object to the experimental design.
For the AI box experiment as stated, one of the biggest flaws is that the gatekeeper is required to stay engaged with the AI and can’t ignore it. This allows the AI to win by either verbally abusing the gatekeeper to the extent that he doesn’t want to stay around any more, or by overwhelming the gatekeeper with lengthy arguments that take time or outside assistance to analyze. These situations would not be a win for an actual AI in a box.
I mean, the experiment’s been replicated multiple times. And you already know the reasons the transcripts were left private. I understand assigning a bit less weight to the evidence because you can’t examine it in detail, but the hypothesis that there’s a conspiracy to fake all of these experiments isn’t likely.
Refusing to release the transcripts causes other problems than just hiding fakery. If the experiment is flawed in some way, for instance, it could hide that—and it would be foolish to demand that everyone name possible flaws one by one and ask you “does this have flaw A?”, “does this have flaw B?”, etc. in order to determine whether the experiment has any flaws. There are also cases where whether something is a flaw is an opinion that can be argued, and it might be that someone else would consider a flaw something that the experimenter doesn’t.
Besides, in a real boxed AI situation, it’s likely that gatekeepers will be tested on AI-box experiments and will be given transcripts of experiment sessions to better prepare them for the real AI. An experiment that simulates an AI boxing should likewise have participants be able to read other sessions.
BTW, I realized there’s something else I agree with you on that’s probably worth mentioning:
Eliezer in particular, I think, is indeed overconfident in his ability to reason things out from first principles. For example, I think he was overconfident in AI foom (see especially the bit at the end of that essay). And even if he’s calibrated his ability correctly, it’s totally possible that others who don’t have the intelligence/rationality he does could pick up the “confident reasoning from first principles” meme and it would be detrimental to them.
That said, he’s definitely a smart guy and I’d want to do more thinking and research before making a confident judgement. What I said is just my current estimate.
Insofar as I object to your post, I’m objecting to the idea that empiricism is the be-all and end-all of rationality tools. I’m inclined to think that philosophy (as described in Paul Graham’s essay) is useful and worth learning about and developing.
I’m annoyed by the fact that the sequences in practice seem to constitute a relatively static document that doesn’t get updated in response to critiques people have written up
For a start… there’s also a lack of discernible point in a lot of places. But there’s too much good stuff to justify rejecting the whole thing.
See my above point about how reasoning by analogy at its best is reasoning from a weak reference class. (Do people think this is worth a toplevel post?)
Yes, I do. Intuitively, this seems correct. But I’d still like to see you expound on the idea.
The steelman of this argument would be something like “experimentally, we find that investigators who take experimental approaches tend to do better than those who take theoretical approaches”. But first, this isn’t obviously true… mathematicians, for instance, have found theoretical approaches to be more powerful. (I’d guess that the developer of Bitcoin took a theoretical rather than an empirical approach to creating a secure cryptocurrency, for instance.)
This example actually proves the opposite. Bitcoin was described in a white paper that wasn’t very impressive by academic crypto standards—few if any became interested in Bitcoin from first reading the paper in the early days. Its success was proven by experimentation, not pure theoretical investigation.
My impression is that MIRI thinks most possible AGI architectures wouldn’t meet its standards for safety, so given that their ideal architecture is so safety-constrained, they’re focused on developing the safety stuff first before working on constructing thought models etc. This seems like a pretty reasonable approach for an organization with limited resources, if it is in fact MIRI’s approach. But I could believe that value could be added by looking at lots of budding AGI architectures and trying to figure out how one might make them safer on the margin.
It’s hard to investigate safety if one doesn’t know the general shape that AGI will finally take. MIRI has focused on a narrow subset of AGI space—namely transparent math/logic-based AGI. Unfortunately it is becoming increasingly clear that the Connectionists were more or less absolutely right in just about every respect. AGI will likely take the form of massive brain-like general-purpose ANNs. Most of MIRI’s research thus doesn’t even apply to the most likely AGI candidate architecture.
In this essay I wrote:
if intelligence is a complicated, heterogeneous process where computation is spread relatively evenly among many modules, then improving the performance of an AGI gets tougher, because upgrading an individual module does little to improve the performance of the system as a whole.
I’m guessing this is likely to be true of general-purpose ANNs, meaning recursive self-improvement would be more difficult for a brain-like ANN than it might be for some other sort of AI? (This would be somewhat reassuring if it was true.)
meaning recursive self-improvement would be more difficult for a brain-like ANN than it might be for some other sort of AI?
It’s not clear that there is any other route to AGI—all routes lead to “brain-like ANNs”, regardless of what linguistic label we use (graphical models, etc).
General purpose RL—in ideal/optimal theoretical form—already implements recursive self-improvement in the ideal way. If you have an ideal/optimal general RL system running, then there are no remaining insights you could possibly have which could further improve its own learning ability.
The evidence is accumulating that general Bayesian RL can be efficiently approximated, that real brains implement something like this, and that very powerful general purpose AI/AGI can be built on the same principles.
Now, I do realize that by “recursive self-improvement” you probably mean a human level AGI consciously improving its own ‘software design’, using slow rule based/logic thinking of the type suitable for linguistic communication. But there is no reason to suspect that the optimal computational form of self-improvement should actually be subject to those constraints.
The other, perhaps more charitable view of “recursive self-improvement” is the more general idea of the point in time when AGIs take over most of the future AGI engineering/research work. Coming up with new learning algorithms will probably be only a small part of the improvement work at that point. Implementations however can always be improved, and there is essentially an infinite space of better hardware designs. Coming up with new model architectures and training environments will also offer scope for improvement.
Also, it doesn’t really appear to matter much how many modules the AGI has, because improvement doesn’t rely much on human insights into how each module works. Even with zero new ‘theoretical’ insights, you can just run the AGI on better hardware and it will be able to think faster or split into more copies. Either way, it will be able to speed up the rate at which it soaks up knowledge and automatically rewires itself (self-improves).
This example actually proves the opposite. Bitcoin was described in a white paper that wasn’t very impressive by academic crypto standards—few if any people became interested in Bitcoin from first reading the paper in the early days. Its success was proven by experimentation, not pure theoretical investigation.
By experimentation, do you mean people running randomized controlled trials on Bitcoin or otherwise empirically testing hypotheses on the software? Just because your approach is collaborative and incremental doesn’t mean that it’s empirical.
By experimentation, do you mean people running randomized controlled trials on Bitcoin or otherwise empirically testing hypotheses on the software?
Not really—by experimentation I meant proving a concept by implementing it and then observing whether the implementation works or not, as contrasted to the pure math/theory approach where you attempt to prove something abstractly on paper.
For context, I was responding to your statement:
But first, this isn’t obviously true… mathematicians, for instance, have found theoretical approaches to be more powerful. (I’d guess that the developer of Bitcoin took a theoretical rather than an empirical approach to creating a secure cryptocurrency, for instance.)
Bitcoin is an example of typical technological development, which is driven largely by experimentation/engineering rather than math/theory. Theory is important mainly as a means to generate ideas for experimentation.
This brings me to what I think is my most fundamental point of disagreement with you. Viewed from a distance, your argument goes something like “Philosophy is a waste of time! Resolve your disagreements experimentally! There’s no need for all this theorizing!” And my rejoinder would be: Resolving disagreements experimentally is great… when it’s possible. We’d love to do a randomized controlled trial of whether universes with a Machine Intelligence Research Institute are more likely to have a positive singularity, but unfortunately we don’t currently know how to do that.
There are a few issues with putting too much emphasis on experimentation over theory. The first is that you may be tempted to prefer experimentation even for problems that theory is better suited for (e.g. empirically testing prime number conjectures). The second is that you may fall prey to the streetlight effect and prioritize areas of investigation that look tractable from an experimental point of view, ignoring questions that are both very important and not very tractable experimentally.
You write:
This would seem to depend on the specifics of the agent in question. This seems like a potentially interesting line of inquiry. My impression is that MIRI thinks most possible AGI architectures wouldn’t meet its standards for safety, so given that their ideal architecture is so safety-constrained, they’re focused on developing the safety stuff first before working on constructing thought models etc. This seems like a pretty reasonable approach for an organization with limited resources, if it is in fact MIRI’s approach. But I could believe that value could be added by looking at lots of budding AGI architectures and trying to figure out how one might make them safer on the margin.
Sure… but note that Eliezer Yudkowsky from MIRI was the one who invented the AI box experiment and ran the first few experiments, and FHI wrote this paper collecting ideas for how AI boxes might be constructed. (The other thing I didn’t mention as a weakness of empiricism is that empiricism doesn’t tell you which hypotheses might be useful to test. Knowing which hypotheses to test is especially valuable when testing them is expensive.)
I could believe that there are fruitful lines of experimental inquiry that are neglected in the AI safety space. Overall it looks kinda like crypto to me in the sense that theoretical investigation seems more likely to pan out. But I’m supportive of people thinking hard about specific useful experiments that someone could run. (You could survey all the claims in Bostrom’s Superintelligence and try to estimate what fraction could be cheaply tested experimentally. Remember that just because a claim can’t be tested experimentally doesn’t mean it’s not an important claim worth thinking about...)
It seems I should have picked a different phrase to convey my intended target of ire. The problem isn’t concept formation by means of comparing similar reference classes, but rather using thought experiments as evidence and updating on them.
To be sure, thought experiments are useful for noticing when you are confused. They can also be a semi-dark art for providing intuition pumps. Einstein did well in introducing special relativity by means of a series of thought experiments: he got the reader to notice their confusion over classical electromagnetism in moving reference frames, then provided an intuition pump for how his own relativity worked in contrast. It makes his paper one of the most beautiful works in all of physics. However, it was the experimental evidence that proved Einstein right, not the Gedankenexperimente.
If a thought experiment makes something not feel right, that should raise your uncertainty about whether your model of what is going on is correct (notice your confusion); to wit, the correct response should be “how can I test my beliefs here?” Do NOT update on thought experiments, as thought experiments are not evidence. The thought experiment triggers an actual experiment—even if that experiment is simply looking up data that has already been collected—and the actual experimental results are what update beliefs.
MIRI has not to my knowledge released any review of existing AGI architectures. If that is their belief, the onus is on them to support it.
He invented the AI box game. If it’s an experiment, I don’t know what it is testing. It is a setup totally divorced from any sane reality for how AGI might actually develop and what sort of controls might be in place, with built-in rules that favor the AI.
Yet time and time again people such as yourself point me to the AI box games as if they demonstrated anything of note, anything which should cause me to update my beliefs.
It is, I think, the example set by the sequences and the character of many of the philosophical discussions which happen here that drive people to feel justified in making such ungrounded inferences. And it is that tendency which possibly makes the sequences and/or Less Wrong a memetic hazard.
I have such very strong agreement with you here.
…but I disagree with you here.
Thought experiments and reasoning by analogy and the like are ways to explore hypothesis space. Elevating hypotheses for consideration is updating. Someone with excellent Bayesian calibration would update much much less on thought experiments etc. than on empirical tests, but you run into really serious problems of reasoning if you pretend that the type of updating is fundamentally different in the two cases.
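To illustrate what I mean by the type of updating not being fundamentally different (a toy numerical sketch with made-up numbers, nothing more): the odds form of Bayes’ rule treats a suggestive thought experiment and a decisive empirical test identically in form; all that changes is the size of the likelihood ratio.
```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Posterior credence after seeing evidence with the given likelihood ratio,
    i.e. p(evidence | H) / p(evidence | not-H), via the odds form of Bayes' rule."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = 0.10                      # starting credence in some hypothesis H
print(update(prior, 2))           # a merely suggestive thought experiment: ~0.18
print(update(prior, 100))         # a decisive empirical test: ~0.92
```
Same operation both times; the empirical test just moves you much further.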
I want to emphasize that I think you’re highlighting a strength this community would do well to honor and internalize. I strongly agree with a core point I see you making.
But I think you might be condemning screwdrivers because you’ve noticed that hammers are really super-important.
Selecting a likely hypothesis for consideration does not alter that hypothesis’ likelihood. Do we agree on that?
Hmm. Maybe. It depends on what you mean by “likelihood”, and by “selecting”.
Trivially, noticing a hypothesis and that it’s likely enough to justify being tested absolutely is making it subjectively more likely than it was before. I consider that tautological.
If someone is looking at n hypotheses and then decided to pick the kth one to test (maybe at random, or maybe because they all need to be tested at some point so why not start with the kth one), then I quite agree, that doesn’t change the likelihood of hypothesis #k.
But in my mind, it’s vividly clear that the process of plucking a likely hypothesis out of hypothesis space depends critically on moving probability mass around in said space. Any process that doesn’t do that is literally picking a hypothesis at random. (Frankly, I’m not sure a human mind even can do that.)
The core problem here is that most default human ways of moving probability mass around in hypothesis space (e.g. clever arguments) violate the laws of probability, whereas empirical tests aren’t nearly as prone to that.
So, if you mean to suggest that figuring out which hypothesis is worthy of testing does not involve altering our subjective likelihood that said hypothesis will turn out to be true, then I quite strongly disagree.
But if you mean that clever arguments can’t change what’s true even by a little bit, then of course I agree with you.
Perhaps you’re using a Frequentist definition of “likelihood” whereas I’m using a Bayesian one?
There’s a difference? Probability is probability.
If you go about selecting a hypothesis by evaluating a space of hypotheses to see how they rate against your model of the world (whether you think they are true) and against each other (how much you stand to learn by testing them), you are essentially coming to reflective equilibrium regarding these hypotheses and your current beliefs. What I’m saying is that this shouldn’t change your actual beliefs—it will flush out some stale caching, or at best identify an inconsistent belief, including empirical data that you haven’t fully updated on. But it does not, by itself, constitute evidence.
So a clever argument might reveal an inconsistency in your priors, which in turn might make you want to seek out new evidence. But the argument itself is insufficient for drawing conclusions, even if the hypothesis is itself hard to test.
There very much is a difference.
Probability is a mathematical construct. Specifically, it’s a special kind of measure p on a measure space M such that p(M) = 1 and p obeys a set of axioms that we refer to as the axioms of probability (where an “event” from the Wikipedia page is to be taken as any measurable subset of M).
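Spelled out, those axioms are just the standard textbook (Kolmogorov) conditions; nothing here is specific to either camp:
```latex
% A probability space: a set M, a sigma-algebra \mathcal{F} of subsets of M (the "events"),
% and a measure p on \mathcal{F} satisfying
\begin{align*}
& p(E) \ge 0 && \text{for every event } E \in \mathcal{F}, \\
& p(M) = 1, \\
& p\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} p(E_i) && \text{for pairwise disjoint } E_1, E_2, \ldots \in \mathcal{F}.
\end{align*}
```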
This is a bit like highlighting that Euclidean geometry is a mathematical construct based on following thus-and-such axioms for relating thus-and-such undefined terms. Of course, in normal ways of thinking we point at lines and dots and so on, pretend those are the things that the undefined terms refer to, and proceed to show pictures of what the axioms imply. Formally, mathematicians refer to this as building a model of an axiomatic system. (Another example of this is elliptic geometry, which is a type of non-Euclidean geometry, which you can model as doing geometry on a sphere.)
The Frequentist and Bayesian models of probability theory are relevantly different. They both think of M as the space of possible results (usually called the “sample space” but not always) and a measurable subset E ≤ M as an “event”. But they use different models of p:
Frequentists suggest that were you to look at how often all of the events in M occur, the one we’re looking at (i.e., E) would occur at a certain frequency, and that’s how we should interpret p(E). E.g., if M is the set of results from flipping a fair coin and E is “heads”, then it is a property of the setup that p(E) = 0.5. A different way of saying this is that Frequentists model p as describing a property of that which they are observing—i.e., that probability is a property of the world.
Bayesians, on the other hand, model p as describing their current state of confidence about the true state of the observed phenomenon. In other words, Bayesians model p as being a property of mental models, not of the world. So if M is again the results from flipping a fair coin and E is “heads”, then to a Bayesian the statement p(E) = 0.5 is equivalent to saying “I equally expect getting a heads to not getting a heads from this coin flip.” To a Bayesian, it doesn’t make sense to ask what the “true” probability is that their subjective probability is estimating; the very question violates the model of p by trying to sneak in a Frequentist presumption.
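To make the contrast concrete, here is a minimal Python sketch (just a toy illustration) of the two readings of the coin-flip example: the Frequentist number is the long-run frequency of heads, while the Bayesian number summarizes a posterior over the coin’s bias that starts from a prior and updates on each flip.
```python
import random

random.seed(0)
flips = [random.random() < 0.5 for _ in range(1000)]  # simulated flips of a fair coin

# Frequentist reading: p(heads) is a property of the coin/setup,
# estimated by the long-run frequency of heads.
freq_estimate = sum(flips) / len(flips)

# Bayesian reading: p(heads) summarizes my belief about the coin's bias.
# Starting from a uniform Beta(1, 1) prior, each flip updates the posterior Beta(a, b).
a, b = 1, 1
for heads in flips:
    if heads:
        a += 1
    else:
        b += 1
posterior_mean = a / (a + b)

print(f"observed frequency of heads: {freq_estimate:.3f}")
print(f"posterior mean belief about bias: {posterior_mean:.3f}")
```
The two numbers come out nearly the same here; the disagreement is over what the number means, not over how to compute it.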
Now let’s suppose that M is a hypothesis space, including some sector for hypotheses that haven’t yet been considered. When we say that a given hypothesis H is “likely”, we’re working within a partial model, but we haven’t yet said what “likely” means. The formalism is easy: we require that H ≤ M is measurable, and the statement that “it’s likely” means that p(H) is larger than most other measurable subsets of M (and often we mean something stronger, like p(H) > 0.5). But we haven’t yet specified in our model what p(H) means. This is where the difference between Frequentism and Bayesianism matters. A Frequentist would say that the probability is a property of the hypothesis space, and noticing H doesn’t change that. (I’m honestly not sure how a Frequentist thinks about iterating over a hypothesis space to suggest that H in fact would occur at a frequency of p(H) in the limit—maybe by considering the frequency in counterfactual worlds?) A Bayesian, by contrast, will say that p(H) is their current confidence that H is the right hypothesis.
What I’m suggesting, in essence, is that figuring out which hypothesis H ≤ M is worth testing is equivalent to moving from p to p’ in the space of probability measures on M in a way that causes p’(H) > p(H). This is coming from using a Bayesian model of what p is.
Of course, if you’re using a Frequentist model of p, then “most likely hypothesis” actually refers to a property of the hypothesis space—though I’m not sure how you would find out the frequency at which hypotheses turn out to be true the way you figure out the frequency at which a coin comes up heads. But that could just be my not being as familiar thinking in terms of the Frequentist model.
I’ll briefly note that although I find the Bayesian model more coherent with my sense of how the world works on a day-by-day basis, I think the Frequentist model makes more sense when thinking about quantum physics. The type of randomness we find there isn’t just about confidence, but is in fact a property of the quantum phenomena in question. In this case a well-calibrated Bayesian has to give a lot of probability mass to the hypothesis that there is a “true probability” in some quantum phenomena, which makes sense if we switch the model of p to be Frequentist.
But in short:
Yes, there’s a difference.
And things like “probability” and “belief” and “evidence” mean different things depending on what model you use.
Yep, we disagree.
I think the disagreement is on two fronts. One is based on using different models of probability, which is basically not an interesting disagreement. (Arguing over which definition to use isn’t going to make either of us smarter.) But I think the other is substantive. I’ll focus on that.
In short, I think you underestimate the power of noticing implications of known facts. I think that if you look at a few common or well-known examples of incomplete deduction, it becomes pretty clear that figuring out how to finish thinking would be intensely powerful:
Many people make resolutions to exercise, be nicer, eat more vegetables, etc. And while making those resolutions, they often really think they mean it this time. And yet, there’s often a voice of doubt in the back of the mind, as though saying “Come on. You know this won’t work.” But people still quite often spend a bunch of time and money trying to follow through on their new resolution—often failing for reasons that they kind of already knew would happen (and yet often feeling guilty for not sticking to their plan!).
Religious or ideological deconversion often comes from letting in facts that are already known. E.g., I used to believe that the results of parapsychological research suggested some really important things about how to survive after physical death. I knew all the pieces of info that finally changed my mind months before my mind actually changed. I had even done experiments to test my hypotheses and it still took months. I’m under the impression that this is normal.
Most people reading this already know that if they put a ton of work into emptying their email inbox, they’ll feel good for a little while, and then it’ll fill up again, complete with the sense of guilt for not keeping up with it. And yet, somehow, it always feels like the right thing to do to go on an inbox-emptying flurry, and then get around to addressing the root cause “later” or maybe try things that will fail after a month or two. This is an agonizingly predictable cycle. (Of course, this isn’t how it goes for everyone, but it’s common enough that well over half the people who attend CFAR workshops seem to relate to it.)
Most of Einstein’s work in raising special relativity to consideration consisted of saying “Let’s take the Michelson-Morley result at face value and see where it goes.” Note that he is now considered the archetypal example of a brilliant person primarily for his ability to highlight worthy hypotheses via running with the implications of what is already known or supposed.
Ignaz Semmelweis found that hand-washing dramatically reduced mortality in important cases in hospitals. He was ignored, criticized, and committed to an insane asylum where guards beat him to death. At a cultural level, the fact that whether Semmelweis was right was (a) testable and (b) independent of opinion failed to propagate until after Louis Pasteur gave the medical community justification to believe that hand-washing could matter. This is a horrendous embarrassment, and thousands of people died unnecessarily because of a cultural inability to finish thinking. (Note that this also honors the need for empiricism—but the point here is that the ability to finish thinking was a prerequisite for empiricism mattering in this case.)
I could keep going. Hopefully you could too.
But my point is this:
Please note that there’s a baby in that bathwater you’re condemning as dirty.
Those are not different models. They are different interpretations of the utility of probability in different classes of applications.
You do it exactly the same as in your Bayesian example.
I’m sorry, but this Bayesian vs Frequentist conflict is for the most part non-existent. If you use probability to model the outcome of an inherently random event, people have called that “frequentist.” If instead you model the event as deterministic, but your knowledge over the outcome as uncertain, then people have applied the label “bayesian.” It’s the same probability, just used differently.
It’s like how if you apply your knowledge of mechanics to bridge and road building, it’s called civil engineering, but if you apply it to buildings it is architecture. It’s still mechanical engineering either way, just applied differently.
One of the failings of the sequences is the amount of emphasis that is placed on “Frequentist” vs “Bayesian” interpretations. The conflict between the two exists mostly in Yudkowsky’s mind. Actual statisticians use probability to model events and knowledge of events simultaneously.
Regarding the other points, every single example you gave involves using empirical data that had not sufficiently propagated, which is exactly the sort of use I am in favor of. So I don’t know what it is that you disagree with.
That’s what a model is in this case.
How sure are you of that?
I know a fellow who has a Ph.D. in statistics and works for the Department of Defense on cryptography. I think he largely agrees with your point: professional statisticians need to use both methods fluidly in order to do useful work. But he also doesn’t claim that they’re both secretly the same thing. He says that strong Bayesianism is useless in some cases that Frequentism gets right, and vice versa, though his sympathies lie more with the Frequentist position on pragmatic grounds (i.e. that methods that are easier to understand in a Frequentist framing tend to be more useful in a wider range of circumstances in his experience).
I think the debate is silly. It’s like debating which model of hyperbolic geometry is “right”. Different models highlight different intuitions about the formal system, and they make different aspects of the formal theorems more or less relevant to specific cases.
I think Eliezer’s claim is that as a matter of psychology, using a Bayesian model of probability lets you think about the results of probability theory as laws of thought, and from that you can derive some useful results about how one ought to think and what results from experimental psychology ought to capture one’s attention. He might also be claiming somewhere that Frequentism is in fact inconsistent and therefore is simply a wrong model to adopt, but honestly if he’s arguing that then I’m inclined to ignore him because people who know a lot more about Frequentism than he does don’t seem to agree.
But there is a debate, even if I think it’s silly and quite pointless.
And also, the axiomatic models are different, even if statisticians use both.
The concern about AI risk is also the result of an attempt to propagate implications of empirical data. It just goes farther than what I think you consider sensible, and I think you’re encouraging an unnecessary limitation on human reasoning power by calling such reasoning unjustified.
I agree, it should itch that there haven’t been empirical tests of several of the key ideas involved in AI risk, and I think there should be a visceral sense of making bullshit up attached to this speculation unless and until we can find ways to do those empirical tests.
But I think it’s the same kind of stupid to ignore these projections as it is to ignore that you already know how your New Year’s Resolution isn’t going to work. It’s not obviously as strong a stupidity, but the flavor is exactly the same.
If we could banish that taste from our minds, then even without better empiricism we would be vastly stronger.
I’m concerned that you’re underestimating the value of this strength, and viewing its pursuit as a memetic hazard.
I don’t think we have to choose between massively improving our ability to make correct clever arguments and massively improving the drive and cleverness with which we ask nature its opinion. I think we can have both, and I think that getting AI risk and things like it right requires both.
But just as measuring everything about yourself isn’t really a fully mature expression of empiricism, I’m concerned about the memes you’re spreading in the name of mature empiricism retarding the art of finishing thinking.
I don’t think that they have to oppose.
And I’m under the impression that you think otherwise.
This seems like it would be true only if you’d already propagated all logical consequences of all observations you’ve made. But an argument can help me to propagate. Which means it can make me update my beliefs.
For example, is 3339799 a prime number?
One ought to assign some prior probability to it being a prime. A naive estimate might say, well, there are two options, so let’s assign it 50% probability.
You could also make a more sophisticated argument about the distribution of prime numbers spreading out as you go towards infinity, and given that only 25 of the first 100 numbers are prime, the chance that a randomly selected number in the millions should be prime is less than 25% and probably much lower.
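Concretely, here is a rough sketch of that second estimate (using the prime number theorem’s ~1/ln(n) density as the prior, with the trial-division check at the end standing in for the “experiment” that would settle the question):
```python
import math

n = 3339799

# Prior from the prime number theorem: the density of primes near n is roughly 1/ln(n).
prior = 1 / math.log(n)
print(f"prior probability that {n} is prime: {prior:.3f}")  # roughly 0.07

# The "experiment": settle the question outright by trial division.
def is_prime(k: int) -> bool:
    if k < 2:
        return False
    for d in range(2, math.isqrt(k) + 1):
        if k % d == 0:
            return False
    return True

print(f"{n} is prime: {is_prime(n)}")
```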
I claim that in a case like this it is totally valid to update your beliefs on the basis of an argument. No additional empirical test required before updating.
Do you agree?
I think the definition of ‘experiment’ gets tricky and confusing when you are talking about math specifically. When you talk about finding the distribution of prime numbers and using that to arrive at a more accurate model for your prior probability of 3339799 being prime, that is an experiment.
Math is unique in that regard though. For questions about the real world we must seek evidence that is outside of our heads.
Is that a conclusion or a hypothesis? I don’t believe there is a fundamental distinction between “actual beliefs”, “conclusions” and “hypotheses”. What should it take to change my beliefs about this?
I’ll think about how this can be phrased differently such that it might sway you. Given that you are not Valentine, is there a difference of opinion between his posts above and your views?
That part you pulled out and quoted is essentially what I was writing about in the OP. There is a philosophy-over-hard-subjects which is pursued here, in the sequences, at FHI, and is exemplified in the conclusions drawn by Bostrom in Superintelligence, and Yudkowsky in the later sequences. Sometimes it works, e.g. the argument in the sequences about the compatibility of determinism and free will works because it essentially shows how non-determinism and free will are incompatible—it exposes a cached thought that free-will == non-deterministic choice which was never grounded in the first place. But over new subjects where you are not confused in the first place—e.g. the nature and risk of superintelligence—people seem to be using thought experiments alone to reach ungrounded conclusions, and not following up with empirical studies.
That is dangerous. If you allow yourself to reason from thought experiments alone, I can get you to believe almost anything. I can’t get you to believe the sky is green—unless you’ve never seen the sky—but on anything you yourself don’t have experimental evidence available for or against, I can sway you either way. E.g. that consciousness is in the information being computed and not the computational process itself. That an AI takeoff would be hard, not soft, and basically uncontrollable. That boxing techniques are foredoomed to failure regardless of circumstances. That intelligence and values are orthogonal under all circumstances. That cryonics is an open-and-shut case. On these sorts of questions we need more, not less, experimentation.
When you hear a clever thought experiment that seems to demonstrate the truth of something you previously thought to have low probability, then (1) check if your priors here are inconsistent with each other; then (2) check if there is empirical data here that you have not fully updated on. If neither of those approaches resolves the issue, then (3) notice you are confused, and seek an experimental result to resolve the confusion. If you are truly unable to find an experimental test you can perform now, then (4) operate as if you do not know which of the possible theories is true.
You do not say “that thought experiment seemed convincing, so until I know otherwise I’ll update in favor of it.” That is the sort of thinking which led the ancients to believe that “All things come to rest eventually, so the natural state is a lack of motion. Planets continue in clockwork motion, so they must be a separate magisteria from earthly objects.” You may think we as rationalists are above that mistake, but history has shown otherwise. Hindsight bias makes the Greeks seem a lot stupider than they actually were.
Take a concrete example: the physical origin of consciousness. We can rule out the naïve my-atoms-constitute-my-consciousness view from biological arguments. However I have been unable to find or construct for myself an experiment which would definitively rule out either the information-identity or computational-process theories, both of which are supported by available empirical evidence.
How is this relevant? Some are arguing for brain preservation instead of cryonics. But this only achieves personal longevity if the information-identity theory is correct, as it is destructive of the computational process. Cryonics, on the other hand, achieves personal longevity by preserving the computational substrate itself, which achieves both information- and computational-preservation. So unless there is a much larger difference in success likelihood than appears to be the case, my money (and my life) is on cryonics. Not because I think the computational-process theory is correct (although I do have other weak evidence that makes it more likely), but because I can’t rule it out as a possibility. So I must consider the case where destructive brain preservation gets popularized at the cost of fewer cryopreservations, and it turns out that personal longevity is only achieved by preserving the computational process. For that reason I do not support the Brain Preservation Foundation.
To be clear, I think that arguing for destructive brain preservation at this point in time is a morally unconscionable thing to do, even though (exactly because!) we don’t know the nature of consciousness and personal identity, and there is an alternative which is likely to work no matter how that problem is resolved.
My point is that the very statements you are making, that we are all making all the time, are also very theory-loaded, “not followed up with empirical studies”. This includes the statements about the need to follow things up with empirical studies. You can’t escape the need for experimentally unverified theoretical judgement, and it does seem to work, even though I can’t give you a well-designed experimental verification of that. Some well-designed studies even prove that ghosts exist.
The degree to which discussion of familiar topics is closer to observations than discussion of more theoretical topics is unclear, and the distinction should be cashed out as uncertainty on a case-by-case basis. Some very theoretical things are crystal clear math, more certain than the measurement of the charge of an electron.
Being wrong is dangerous. Not taking theoretical arguments into account can result in error. This statement probably wouldn’t be much affected by further experimental verification. What specifically should be concluded depends on the problem, not on a vague outside measure of the problem like the degree to which it’s removed from empirical study.
Before considering the truth of a statement, we should first establish its meaning, which describes the conditions for judging its truth. For a vague idea, there are many alternative formulations of its meaning, and it may be unclear which one is interesting, but that’s separate from the issue of thinking about any specific formulation clearly.
I’m not aware of studies on ghosts; Scott talks about telepathy and precognition studies.
Ghosts specifically seem like too complicated a hypothesis to extract from any experimental results I’m aware of. If we didn’t already have a concept of ghosts, I doubt any parapsychology experiments that have taken place would have caused us to develop one.
People select hypotheses for testing because they have previously weakly updated in the direction of them being true. Seeing empirical data produces a later, stronger update.
Except that when the hypothesis space is large, people test hypotheses because they strongly updated in the direction of them being true, and seeing empirical data produces a later, weaker update. Where an example of ‘strongly updating’ could be going from 9,999,999:1 odds against a hypothesis to 99:1 odds against, and an example of ‘weakly updating’ could be going from 99:1 odds against the hypothesis to 1:99. The former update requires about 17 bits of evidence, while the latter requires about 13 bits.
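Spelling out that arithmetic (a quick sketch using the odds form of Bayes’ rule; “bits of evidence” here is just the log base 2 of the required likelihood ratio):
```python
import math

def bits_of_evidence(prior_odds_for: float, posterior_odds_for: float) -> float:
    """Log2 of the likelihood ratio needed to move the prior odds to the posterior odds."""
    return math.log2(posterior_odds_for / prior_odds_for)

# 'Strong' update: from 9,999,999:1 against to 99:1 against.
print(bits_of_evidence(1 / 9_999_999, 1 / 99))   # about 16.6 bits

# 'Weak' update: from 99:1 against to 99:1 in favor (i.e. 1:99 against).
print(bits_of_evidence(1 / 99, 99 / 1))          # about 13.3 bits
```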
Interesting point. I guess my intuitive notion of a “strong update” has to do with absolute probability mass allocation rather than bits of evidence (probability mass is what affects behavior?), but that’s probably not a disagreement worth hashing out.
I like your way of saying it. It’s much more efficient than mine!
Thanks! Paul Graham is my hero when it comes to writing and I try to pack ideas as tightly as possible. (I recently reread this essay of his and got amazed by how many ideas it contains; I think it has more intellectual content than most published nonfiction books, in just 10 pages or so. I guess the downside of this style is that readers may not go slow enough to fully absorb all the ideas. Anyway, I’m convinced that Paul Graham is the Ben Franklin of our era.)
Thanks for the response.
I have a feeling your true rejection runs deeper than you’re describing. You cite a thought experiment of Einstein’s as being useful and correct. You explain that Less Wrong relies on thought experiments too heavily. You suggest that Less Wrong should lean heavier on data from the real world. But the single data point you cite on the question of thought experiments indicates that they are useful and correct. It seems like your argument fails by its own standard.
I think the reliability of thought experiments is a tricky question to resolve. I think we might as well expand the category of thought experiments to “any reasoning about the world that isn’t reasoning directly from data”. When I think about the reliability of this reasoning, my immediate thought is that I expect some people to be much better at it than others. In fact, I think being good at this sort of reasoning is almost exactly the same as being intelligent. Reasoning directly from data is like looking up the answers in the back of the book.
This leaves us with two broad positions: the “humans are dumb/the world is tricky” position that the only way we can ever get anywhere is through constant experimentation, and the “humans are smart/the world is understandable” position that we can usefully make predictions based on limited data.
I think these positions are too broad to be useful. It depends a lot on the humans, and it depends a lot on the aspect of the world being studied. Reasoning from first principles works better in physics than in medicine; in that sense, medicine is a trickier subject to study.
If the tricky world hypothesis is true for the questions MIRI is investigating, or the MIRI team is too dumb, I could see the sort of empirical investigations you propose as being the right approach: they don’t really answer the most important questions we want answered, but there probably isn’t a way for MIRI to answer those questions anyway, so might as well answer the questions that are answerable and see if the results lead anywhere.
Anyway, I think a lot of the value of many LW posts is in finding useful ideas that are also very general. (Paul Graham’s description of what philosophy done right looks like.) Very general ideas are harder to test, because they cut across domains. The reason I like the many citations in Thinking Fast and Slow is that I expect the general ideas it presents to be more reliably true because they’re informed by at least some experimental data. But general, useful ideas can be so useful that I don’t mind taking the time to read about them even if they’re not informed by lots of experimental data. Specifically, having lots of general, useful ideas that are also correct (e.g. knowing when and how to add numbers) makes you more intelligent according to my definition above. And I consider myself intelligent enough to be able to tell apart the true general, useful ideas from the false ones through my own reasoning and experience at least somewhat reliably.
Broadly speaking, I think Less Wrong is counterproductive if specific general, useful ideas it promotes are false. (It’s hard to imagine how it could be counterproductive if they were true.) And at that point we’re talking about whether specific posts are true or false. Lukeprog has this list of points of agreement between the sequences and mainstream academia, which causes me to update in the direction of those points of agreement being true.
I think you’re being overly hard on the AI box experiment. It’s obviously testing something. It’d be great if we could fork the universe, import a superintelligence, set up a bunch of realistic safeguards, and empirically determine how things played out. But that’s not practical. We did manage to find an experiment that might shed some light on the scenario, but the experiment uses a smart human instead of a superintelligence and a single gatekeeper instead of a more elaborate set of controls. It seems to me that you aren’t following your own standard: you preach the value of empiricism and then throw out some of the only data points available, for theoretical reasons, without producing any better data. I agree that it’s some pretty weak data, but it seems better to think about it than throw it out and just believe whatever we like, and I think weak data is about as well as you’re going to do in this domain.
I cite a thought experiment of Einstein’s as being useful but insufficient. It was not correct until observation matched anticipation. I called out Einstein’s thought experiment as being a useful pedagogical technique, but not an example of how to arrive at truth. Do you see the difference?
No, this is not obvious to me. Other than the ability of two humans to outwit each other within the confines of strict enforcement of arbitrarily selected rules, what is it testing, exactly? And what does that thing being tested have to do with realistic AIs and boxes anyway?
What’s your model of how Einstein in fact arrived at truth, if not via a method that is “an example of how to arrive at truth”? It’s obvious the method has to work to some extent, because Einstein couldn’t have arrived at a correct view by chance. Is your view that Einstein should have updated less from whatever reasoning process he used to pick out that hypothesis from the space of hypotheses, than from the earliest empirical tests of that hypothesis, contra Einstein’s Arrogance?
Or is your view that, while Einstein may technically have gone through a process like that, no one should assume they are in fact Einstein—i.e., Einstein’s capabilities are so rare, or his methods are so unreliable (not literally at the level of chance, but, say, at the level of 1000-to-1 odds of working), that by default you should harshly discount any felt sense that your untested hypothesis is already extremely well-supported?
Or perhaps you should harshly discount it until you have meta-evidence, in the form of a track record of successfully predicting which untested hypotheses will turn out to be correct.
The AI box experiment is a response to the claim ‘superintelligences are easy to box, because no level of competence at social engineering would suffice for letting an agent talk its way out of a box’. It functions as an existence proof; if a human level of social competence is already sufficient to talk one’s way out of a box with nonzero frequency, then we can’t dismiss risk from superhuman levels of social competence.
If you think the claim Eliezer was responding to is silly on priors, or just not relevant (because it would be easy to assess an AI’s social competence and/or prevent it from gaining such competence), then you won’t be interested in that part of the conversation.
You can’t work backwards from the fact that someone arrived at truth in one case to the premise that they must have been working from a reliable method for arriving at truth. It’s the “one case” that’s the problem. They might have just gotten lucky.
Einstein’s thought experiments inspired his formal theories, which were then confirmed by observation. Nobody thought the thought experiments provided confirmation by themselves.
I mentioned that possibility above. But Einstein couldn’t have been merely lucky—even if it weren’t the case that he was able to succeed repeatedly, his very first success was too improbable for him to have just been plucking random physical theories out of a hat. Einstein was not a random number generator, so there was some kind of useful cognitive work going on.
That leaves open the possibility that it was only useful enough to give Einstein a 1% chance of actually being right; but still, I’m curious about whether you do think he only had a 1% chance of being right, or (if not) what rough order of magnitude you’d estimate. And I’d likewise like to know what method he used to even reach a 1% probability of success (or 10%, or 0.1%), and why we should or shouldn’t think this method could be useful elsewhere.
Can you define “confirmation” for me, in terms of probability theory?
Big Al may well have had some intuitive mojo that enabled him to pick the right thought experiments, but that still doesn’t make thought experiments a substitute for real empiricism. And intuitive mojo isn’t a method in the sense of being reproducible.
Why not derive probability theory in terms of confirmation?
Thought experiments aren’t a replacement for real empiricism. They’re a prerequisite for real empiricism.
“Intuitive mojo” is just calling a methodology you don’t understand a mean name. However Einstein managed to repeatedly hit success in his lifetime, presupposing that it is an ineffable mystery or a grand coincidence won’t tell us much.
I already understand probability theory, and why it’s important. I don’t understand what you mean by “confirmation,” how your earlier statement can be made sense of in quantitative terms, or why this notion should be treated as important here. So I’m asking you to explain the less clear term in terms of the more clear term.
Actually he did not. He got lucky early in his career, and pretty much coasted on that into irrelevance. His intuition allowed him to solve problems related to relativity, the photoelectric effect, and Brownian motion, and to make a few other significant contributions, all within the span of a decade early in his career. And then he went off the deep end, following his intuition down a number of dead-end rabbit holes for the rest of his life. He died in Princeton in 1955 having made no further significant contributions to physics after his 1916 invention of general relativity. Within the physics community (I am a trained physicist), Einstein’s story is retold more often as a cautionary tale than as a model to emulate.
There are worse fates than not being able to top your own discovery of general relativity.
...huh? Correct me if I’m wrong here, but Einstein was a great physicist who made lots of great discoveries, right?
The right cautionary tale would be to cite physicists who attempted to follow the same strategy Einstein did and see how it mostly only worked for Einstein. But if Einstein was indeed a great physicist, it seems like at worst his strategy is one that doesn’t usually produce results but sometimes produces spectacular results… which doesn’t seem like a terrible strategy.
I have a very strong (empirical!) heuristic that the first thing people should do if they’re trying to be good at something is copy winners. Yes there are issues like regression to the mean and stuff, but it provides a good alternative perspective vs thinking things through from first principles (which seems to be my default cognitive strategy).
The thing is, Einstein was popular, but his batting average was lower than his peers’. The 20th century is full of theoretical physicists with a better track record for pushing the state of the art forward than Einstein, most of whom did not spend the majority of their careers chasing rabbits down holes. They may not be household names, but honestly Einstein’s status as one might have more to do with the hair than the physics.
I should point out that I heard this cautionary tale as “don’t set your sights too high,” not “don’t employ the methods Einstein employed.” The methods were fine, the trouble was that he was at IAS and looking for something bigger than his previous work, rather than planting acorns that would grow into mighty oaks (as Hamming puts it).
OK, good to know.
The AI box experiment only serves even as that if you assume it sufficiently replicates the conditions that would actually be faced by someone with an AI in a box. Also, it only serves as such if it is otherwise a good experiment; but since we are not permitted to see the session transcripts for ourselves, we can’t tell whether it is.
Again, the AI box experiment is a response to the claim “superintelligences are easy to box, because no level of competence at social engineering would suffice for letting an agent talk its way out of a box”. If you have some other reason to think that superintelligences are hard to box—one that depends on a relevant difference between the experiment and a realistic AI scenario—then feel free to bring that idea up. But this constitutes a change of topic, not an objection to the experiment.
I mean, the experiment’s been replicated multiple times. And you already know the reasons the transcripts were left private. I understand assigning a bit less weight to the evidence because you can’t examine it in detail, but the hypothesis that there’s a conspiracy to fake all of these experiments isn’t likely.
Not all relevant differences between an experiment and an actual AI scenario can be accurately characterized as “reason to think that superintelligences are hard to box”. For instance, imagine an experiment with no gatekeeper or AI party at all, where the result of the experiment depends on flipping a coin to decide whether the AI gets out. That experiment is very different from a realistic AI scenario, but one need not have a reason to believe that intelligences are hard to box—or even hold any opinion at all on whether intelligences are hard to box—to object to the experimental design.
For the AI box experiment as stated, one of the biggest flaws is that the gatekeeper is required to stay engaged with the AI and can’t ignore it. This allows the AI to win by either verbally abusing the gatekeeper to the extent that he doesn’t want to stay around any more, or by overwhelming the gatekeeper with lengthy arguments that take time or outside assistance to analyze. These situations would not be a win for an actual AI in a box.