(0) What I hint at in (1-3) seems like a very standard thing, for which there should be a standard reference post. Does anybody know it? If not, somebody should write on it.
(1) Pascal’s wagers in general: Am I interpreting you correctly as saying that AI risk, doom by asteroid, belief in Christianity, doom by supernatural occurrence, etc., all belong to the same class of “Pascal’s wager-like” decision problems?
If yes, then hurray, I definitely agree. However, I believe this is where you should bite the bullet and treat every Pascal’s wager seriously, and only dismiss it once you understand it somewhat. In particular, I claim that the heuristic “somebody claims X has extremely high or low payoffs ==> dismiss X until we have a rock-solid proof” is not valid. (Curious whether you disagree.) ((And to preempt the likely objection: To me, this position is perfectly compatible with initially dismissing crackpots and God’s wrath without engaging with, or even listening to, the arguments; see below.))
Why do I believe this?
First, I don’t think we encounter so many different Pascal’s wagers that considering each one at least briefly will paralyze us. This is made easier because many of them are somewhat similar to each other (e.g. “religions”, “conspiracy theories”, “crackpot designs for perpetuum mobile”, …), which allows you to only think through a single representative of each class[1], and then use heuristics like “if it pattern-matches to a conspiracy theory, tentatively ignore it outright”. (Keyword “tentatively”. It seems fair to initially dismiss the covid lab-leak hypothesis as a conspiracy theory. But if you later hear that there was a bio-lab in Wuhan, you should reevaluate. Similarly, I dismiss all religions outright, but I would spend some attention on them if many people I trust suddenly started saying that they saw miracles.)
Second, many Pascal’s wagers are non-actionable. If you learn about meteors or super-volcanoes in the 18th century, there is not much you can do. Similarly, I assign a non-zero probability to the simulation hypothesis. And perhaps even a non-zero probability of somebody running a simulation of a world that actually works according to christianity, just for the lolz of it.[2] But it is equally possible that this guy runs it on anti-christianity (saints going to hell), just for the lolz of it. And I don’t know which is more probable, and I have no way of resolving this, and the payoffs are not literally infinite (that guy would have a finite compute budget). So there is no point in me acting on this either.
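To make that cancellation explicit, here is a toy expected-value sketch (the symbols $p_+$, $p_-$, and $V$ are mine, introduced purely for illustration): if $p_+$ is the probability of the pro-christianity simulation, $p_-$ the probability of the inverted one, and acting piously is worth $+V$ in the first and $-V$ in the second (with $V$ finite, bounded by the simulator’s compute budget), then

$$\mathbb{E}[\text{gain from acting piously}] = p_+ V - p_- V = (p_+ - p_-)\,V,$$

and since I have no way to tell the sign of $p_+ - p_-$, and $V$ is finite, the wager yields no actionable recommendation.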
Third, let me comment a bit on the meteor example [I don’t know the historical details, so I will be making some of this up]: Sometime in the previous century, we learned about an asteroid hitting earth in the past and killing the dinosaurs. This translates to “there might be a giant explosion from an asteroid”. We did not force everybody to live underground, not because we didn’t take the risk seriously, but because doing so would have been stupid (we probably didn’t have asteroid-surviving technology anyway, and the best way to get it would have been to stay on the surface). However, we did spend lots of money on the asteroid detection program to understand the risk better. As a result, we now understand this particular “Pascal’s wager” better, and it seems that in this particular case, the risk is small. That is, the risk is small enough that it is more important to focus on other things. However, I believe that we should still spend some effort on preventing asteroid risk (actually, I guess we do do that). And if we had no more-pressing issues (X-risk and otherwise), I totally would invest some resources into asteroid-explosion-survival measures.
To summarize the meteor example: We did not dismiss the risk out of hand. We investigated it seriously, and only dismissed it once we understood it. And it never made sense to apply extreme measures, because they wouldn’t have been effective even if the risk was real. (Though I am not sure whether that was on purpose or because we wouldn’t have been able to apply extreme measures even if they were appropriate.) ((Uhm, please please please, I hope nobody starts drawing low-level implications from the asteroid risk-analysis to the AI risk-analysis. The low-level details of the two cases have nothing in common, so there are no implications to draw.))
(2) For “stop AI progress” to be rational, it is enough to be uncertain.
Sure, if you claim something like “90% that AI is going to kill us” or “we are totally sure that AI is risky”, then the burden of proof is on you.
But most people who advocate treating AI as risky, and who say we should stop/slow/take extreme measures, do not claim anything like “90% that AI is going to kill us”. Most often, you see claims like (i) “at least 5% that AI could kill us”, or (ii) “we really don’t know whether AI might kill us, one way or the other”.
And if you claim non-(i) or non-(ii), the burden of proof is totally on you. And I think that literally nobody in the world has a good enough model of AI & the strategic landscape to justify non-(i) or non-(ii). (Definitely nobody has published a gears-level model like that.)
Also, I think that to make Pascal’s wager on AI rational, it is enough to believe (either (i) or (ii)) together with “we have no quick way of resolving the uncertainty”.
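To illustrate why (i) alone already pushes towards caution, here is a toy expected-loss comparison. All the numbers below are made up by me purely for illustration; they are not estimates of actual probabilities or costs:

```python
# Toy expected-loss comparison for claim (i). All numbers are made up
# for illustration only; nothing here is an estimate of real quantities.
p_doom = 0.05             # claim (i): "at least 5% that AI could kill us"
loss_if_doom = 1_000_000  # value of everything we lose if it happens (arbitrary units)
cost_of_slowing = 100     # benefits foregone by stopping/slowing AI (arbitrary units)

expected_loss_continue = p_doom * loss_if_doom  # 50000.0
expected_loss_slow = cost_of_slowing            # 100

print(expected_loss_continue > expected_loss_slow)  # True
# The comparison favours slowing down for any numbers where the value of
# the future dwarfs the cost of delay. That is the point of the wager:
# given (i) or (ii), plus no quick way to resolve the uncertainty, caution wins.
```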
(3) AI risk as Pascal’s wager:
I claim that it is rational to treat AI as a Pascal’s wager, but that in this particular case, the details come out very differently (primarily because we understand AI so little).
To give an illustration (there will be many nonsense bits in the details; the goal is to give the general vibe):
If you know literally nothing about AI at all, it is fair to initially pattern-match it to “yeah yeah, random people talking about the end of the world, that’s bullshit”.
However, once you learn that AI stands for “artificial intelligence”, you really should do a double-take: “Hey, intelligence is this thing that maybe makes us, and not other species, dominant on this planet. Also, humans are intelligent and they do horrible things to each other. So it is at least plausible that this could destroy us.” (That’s without knowing anything at all about how AI might work.)
At that point, you should consider whether this AI thing might be achievable. And if it is, say, the year 1900, then you should go “I have literally no clue, so I should act as if the chance is non-negligible”. And then (much) later people invent computers and you hear about the substrate-independence thesis. You don’t understand any of this, but it sounds plausible. And then you should treat AI as a real possibility, until you can get a gears-level understanding of the details. (Which we still don’t have!)
Also, you try to understand whether AI will be dangerous or not, and you will mostly come to the conclusion that you have no clue whatsoever. (And we still don’t, even now. And on the current timeline, we probably won’t know until we get superintelligence.)
And from that point onward, you should be like “holy ****, this AI thing could be really dangerous! We should never build anything like that unless we understand exactly what we are doing!”.
This does not mean you will not build a whole bunch of AI. You will build calculators, and you will build GOFAI, and tabular model-based RL.[3] And then somebody comes up with the idea of deep learning, and you are like “holy ****, we have no clue how this works!”. And you put a moratorium on it, to quickly figure out what to do. And later you probably end up controlling it pretty tightly, because you don’t know how it scales. And you only do the research on DL slowly, such that your interpretability techniques and other theory never fall behind.
Obviously, we don’t live in a world capable of something like the above. But it remains a fact that we now have strong evidence that (1) AGI probably is possible and (2) we still have no clue whether it is going to be harmful or not, despite trying. And if we could do it, the rational thing to do would be to immediately stop most AI work until we figure out (2).
[1] When I was younger, I did think a lot about christianity, and I have a detailed model that causes me to assign literally 0 probability to it being correct. Similarly, I did engage with some conspiracy theories before dismissing them. I believe that justifies me in tentatively dismissing, immediately, anything that pattern-matches to religion and conspiracy theories. However, if I had never heard about any religion, then upon encountering one for the first time, I would totally not be justified in dismissing it outright. And let me bite the bullet here: I would say that if you were a random guy in ancient Egypt, without particularly good models of the world or ways to improve them, the rational thing to do could have been to act as if the Egyptian gods were real. [Low confidence though, I haven’t thought about this before.]
[2] Not to be confused with the hypothesis that the universe actually works according to any of the religions considered on earth, without assuming the simulation hypothesis. To that, I assign literally 0 probability.
[3] But for all of those things, you briefly consider whether they could be dangerous and conclude they aren’t. (Separately from AI risks: In an ideal world, you would also run some experiments to see how these things are affecting the users. E.g., to see whether kids’ brains deteriorate from relying on calculators. In practice, our science methods sucked back then, so you don’t learn anything. But hey, we’re talking about how a better world would work.)