I think it’s good epistemic hygiene to notice when the mechanism underlying a high-level claim switches because the initially proposed mechanism turns out to be infeasible, and to downgrade the credence you accord the high-level claim at least somewhat. Particularly when the former mechanism has been proposed many times.
Alice: This ship is going to sink. I’ve looked at the boilers, they’re going to explode!
Alice: [Repeats claim ten times]
Bob: Yo, I’m an expert in thermodynamics and steel, the boilers are fine for X, Y, Z reason.
Alice: Oh. Well, the ship is still going to sink, it’s going to hit a sandbar.
Alice could still be right! But you should try to notice the shift and adjust credence downwards by some amount. Particularly if Alice is the founder of a group talking about why the ship is going to sink.
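A minimal sketch of the arithmetic this argument is gesturing at, with made-up numbers and with the candidate mechanisms treated as mutually exclusive purely for simplicity:

```python
# Toy decomposition of "the ship sinks" into the proposed mechanism plus
# everything else. All numbers here are invented for illustration.
p_boiler = 0.30   # credence routed through "the boilers explode"
p_other  = 0.20   # credence routed through every other mechanism (sandbar, fire, ...)

p_sink_before = p_boiler + p_other       # 0.50

# Bob's expert analysis convinces us the boiler path is (nearly) infeasible.
p_boiler_after = 0.02
p_sink_after = p_boiler_after + p_other  # 0.22

print(p_sink_before, p_sink_after)
```

How big the drop is depends entirely on how much credence was actually routed through the debunked mechanism, which is the question the rest of the thread ends up disputing.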
The original theory is sabotage, not specifically a boiler explosion. People keep saying “How could you possibly sabotage a ship?”, and a boiler explosion is one possible answer, but it’s not the reason the ship was predicted to sink. Boiler-explosion theory and sabotage theory both predict sinking, but that agreement is superficial: the two theories are moved by different arguments.
If someone had said “Yo, this one lonely saboteur is going to sink the ship” and consistently responded to requests for how by saying “By exploding the boiler”, then finding out that it was infeasible for a lone saboteur to sink the ship by exploding the boiler would again be some level of evidence against the danger posed by the lone saboteur, so I don’t see how that changes it? Or maybe I’m misunderstanding you.
To make the analogy more concrete, suppose that Alice posts a 43-point thesis on MacGyver Ruin: A List Of Lethalities, similar to AGI Ruin, that explains that MacGyver is planning to sink our ship and this is likely to lead to the ship sinking. In point 2 of 43, Alice claims that:
MacGyver will not find it difficult to bootstrap to overpowering capabilities independent of our infrastructure. The concrete example I usually use here is exploding the boilers, because there’s been pretty detailed analysis of what definitely look like physically attainable lower bounds on what should be possible with exploding the boilers, and those lower bounds are sufficient to carry the point. My lower-bound model of “how MacGyver would sink the ship, if he didn’t want to not do that” is that he gets access to the boilers, reverses the polarity of the induction coils, overloads the thermostat, and then the boilers blow up.
(Back when I was first deploying this visualization, the wise-sounding critics said “Ah, but how do you know even MacGyver could gain access to the boilers, if he didn’t already have a gun?” but one hears less of this after the advent of MacGyver: Lost Treasure of Atlantis, for some odd reason.)
Losing a conflict with MacGyver looks at least as deadly as “there’s a big explosion out of nowhere and then the ship sinks”.
Then, Bob comes along and posts a 24min reply, concluding with:
I think if there was a saboteur on board, that would increase the chance of the boiler exploding. For example, if they used the time to distract the guard with a clanging sound, they might be able to reach the boiler before being apprehended. So I think this could definitely increase the risk. However, there are still going to be a lot of human-scale bottlenecks to keep a damper on things, such as the other guard. And as always with practical sabotage, a large part of the process will be figuring out what the hell went wrong with your last explosion.
What about MacGyver? Well, now we’re guessing about two different speculative things at once, so take my words (and everyone else’s) with a double grain of salt. Obviously, MacGyver would increase sabotage effectiveness, but I’m not sure the results would be as spectacular as Alice expects.
I suppose this updates my probability of the boilers exploding downwards, just as I would update a little upwards if Bob had been similarly cagey in the opposite direction.
It doesn’t measurably update my probability of the ship sinking, because the boiler exploding isn’t a load-bearing part of the argument, just a concrete example. This is a common phenomenon in probability when there are agents in play.
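A minimal sketch of the “agents in play” point, assuming (purely for illustration) that the saboteur has many roughly interchangeable plans whose success chances are independent:

```python
# Toy model: a capable adversary only needs one of many candidate plans to work.
# The plan count, the per-plan probability, and independence are all assumptions
# made up for this illustration.
n_plans = 50
p_each = 0.10

p_success_all   = 1 - (1 - p_each) ** n_plans        # ~0.995
p_success_less1 = 1 - (1 - p_each) ** (n_plans - 1)  # ~0.994

print(p_success_all, p_success_less1)
# Ruling out the one plan that happened to be used as the concrete example
# shifts the estimate by less than 0.1 percentage points under these assumptions.
```

Under those assumptions the concrete example really isn’t load-bearing; whether the assumptions themselves hold is what the rest of the exchange is about.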
It doesn’t measurably update my probability of the ship sinking
When you say it doesn’t update “measurably,” do you mean that it doesn’t update at all, or that it doesn’t update much? I’m not saying you should update much. I’m just saying you should update some. Like, I’m nodding along at your example, but my conclusion is simply the opposite.
Like suppose we’ve been worried about the imminent unaligned MacGyver threat. Some people say there’s no way he can sink the ship; other people say he can. So the people who say he can confer and try to offer 10 different plausible ways he could sink the ship.
If we found out all ten didn’t work, then—considering that these examples were selected for being the clearest ways he can destroy this ship—it’s hard for me to think this shouldn’t move you down at all. And so presumably finding out that just one didn’t work should move you down by some lesser amount, if finding out 10 didn’t work would also do so.
Imagine a counterfactual world where people had asked, “How can he sink the ship?” and people had responded, “You don’t need to know how, that was just a concrete example; concrete examples are irrelevant to the principle, which is simply that MacGyver’s superior improvisational skills are sufficient to sink the ship.” I would have lower credence in MacGyver’s ship-sinking ability in the world without concrete examples; I think most people would; I think it would be weird not to. So I think moving in the direction of such a world should similarly lower your credence.
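A minimal sketch of this counter-argument, again with made-up numbers; the assumption doing the work is that the ten proposed mechanisms were selected as the most promising ones, so they carry more credence per mechanism than the unenumerated remainder:

```python
# Toy model for the "ten clearest ways" argument. All numbers are invented,
# and independence is assumed purely for simplicity.
p_each_listed = 0.15   # chance each of the 10 selected plans works
p_residual    = 0.30   # chance some un-enumerated plan works anyway

def p_sink(n_listed_surviving):
    # probability that at least one surviving listed plan, or the residual, works
    p_listed_all_fail = (1 - p_each_listed) ** n_listed_surviving
    return 1 - p_listed_all_fail * (1 - p_residual)

print(p_sink(10))  # ~0.86 with all ten plans on the table
print(p_sink(9))   # ~0.84 with one ruled out: a smaller but nonzero drop
print(p_sink(0))   # 0.30 with all ten ruled out: a large drop
```

Whether the situation looks more like this sketch or like the earlier one turns on how much of the total credence the enumerated examples were carrying in the first place.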
I think the chess analogy is better: if I predict that, from some specific position, MacGyver will play some sequence of ten moves that will leave him winning, and I then try to demonstrate that by playing out the position myself and losing, would you update at all?
I meant “measurably” in a literal sense: nobody can measure the change in my probability estimate, including myself. If my reported probability of MacGyver Ruin after reading Alice’s post was 56.4%, after reading Bob’s post it remains 56.4%. The size of a measurable update will vary based on the hypothetical, but it sounds like we don’t have a detailed model that we trust, so a measurable update would need to be at least 0.1%, possibly larger.
You’re saying I should update “some” and “somewhat”. How much do you mean by that?
No, but . . . you don’t need “diamondoid” technology to make nano-replicators that kill everything. Highly engineered bacteria could do the trick.
To me, this is the more interesting and important claim to check. I think the barriers to engineering bacteria are much lower, but it’s not obvious that this would avoid detection and a human response to the threat, or that timing and/or triggers in bacteria can be made reliable enough.
Unfortunately, explaining exactly what kind of engineered bacteria could be dangerous is a rather serious infohazard.
Don’t worry, I know of a way to stop any engineered bacteria before they can do any harm.
No, I’m not going to tell you what it is. Infohazard.
We do at least have one example of something like this happening already from natural causes, the Great Oxygenation Event. How long did that take? Had we been anaerobic organisms at the time, could we have stopped it?
Possibly, but by limiting access to the arguments, you also limit the public case for it and engagement by skeptics. The views within the area will also probably further reflect self-selection for credulousness and deference over skepticism.
There must be less-infohazardous arguments we can engage with. Or, maybe zero-knowledge proofs are somehow applicable. Or, we can select a mutually trusted skeptic (or set of skeptics) with relevant expertise to engage privately. Or, legally binding contracts to prevent sharing.