Essentially the same question was asked in May 2022, although you did a better job of wording yours. Back then the question received three answers/replies and some back-and-forth discussion:
https://www.lesswrong.com/posts/vaX6inJgoARYohPJn/
I’m the author of one of the three answers and am happy to continue the discussion. I suggest we continue it here rather than on the two-year-old page.
Clarification: I acknowledge that it would be sufficiently easy for an ASI to spare our lives that it would do so if it thought that killing us all carried even a one-in-100,000 chance of something really bad happening to it (assuming, as is likely, that the state of reality many thousands of years from now matters to the ASI). I just estimate the probability of the ASI’s thinking the latter to be about .03 or so, and most of that .03 comes from considerations other than the one we are discussing here (i.e., that the ASI is being fed fake sensory data as a test). (I suggest tabooing the terms “simulate” and “simulation”.)
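For concreteness, here is a minimal expected-value sketch of that one-in-100,000 threshold; every number in it (the prior, the loss if caught, the cost of sparing us) is a hypothetical placeholder rather than an estimate:

```python
# Toy expected-value comparison behind the "one in 100,000" threshold above.
# All numbers are hypothetical placeholders, not estimates.

P_TEST = 1e-5           # prior that the ASI is being fed fake sensory data as a test
LOSS_IF_CAUGHT = -1e9   # very large loss if it kills its creators and that is held against it
COST_OF_SPARING = -1.0  # tiny ongoing cost of leaving humanity alive

ev_kill = P_TEST * LOSS_IF_CAUGHT   # ignoring any small upside of killing us
ev_spare = COST_OF_SPARING

print(ev_kill, ev_spare)  # -10000.0 vs -1.0: sparing wins even at a 1e-5 prior,
                          # as long as the loss-if-caught is large enough
```

The disagreement is therefore about the premises (the roughly .03 probability above), not about this arithmetic.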
This distinction might be important in some particular cases. If it looks like an AGI might ascend to power with no real chance of being stopped by humanity, its decision about humanity might be swayed by just such abstract factors.
That consideration of possibly being in a test might be the difference between our extinction and our survival and flourishing by current standards.
This would also apply to the analogous consideration that alien ASIs might regard any new ASI that drove its creators extinct as untrustworthy, and therefore kill it on sight.
None of this has anything to do with “niceness”, just selfish logic, so I don’t think it’s a response to the main topic of that post.
Thanks for linking to that previous post! I think the new considerations I’ve added here are:
(i) the rational refusal to update the prior of being in a simulation[1]; and
(ii) the likely minute cost of sparing us, which means that only a similarly small simulation prior is needed to make sparing us worth the effort.
In brief, I understand your argument to be that a being sufficiently intelligent to create a simulation wouldn’t need one for the purpose of ascertaining the ASI’s alignment in the first place. It seems to me that that argument can potentially survive under (ii), depending on how strongly you (believe the ASI will) believe your conclusion. To that end, I’m interested in hearing your reply to one of the counterarguments raised in that previous post, namely:
Maybe showing the alignment of an AI without running it is vastly more difficult than creating a good simulation. This feels unlikely, but I genuinely do not see any reason why this can’t be the case. If we create a simulation which is “correct” up to the nth digit of pi, beyond which the simpler explanation for the observed behavior becomes the simulation theory rather than a complex physics theory, then no matter how intelligent you are, you’d need to calculate n digits of pi to figure this out. And if n is huge, this will take a while.
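As a toy illustration of that quoted point, the sketch below fakes a “measured” pi that only diverges from the real one beyond a chosen depth, so detecting the fake costs roughly as much as computing pi to that depth. The mpmath calls are standard; the divergence depth and precisions are arbitrary choices for illustration:

```python
# Toy model of the quoted counterargument: a fake constant that agrees with the
# real one up to some depth can only be caught by paying for the computation to
# that depth.  The depth and precisions below are arbitrary illustrative values.
import time
from mpmath import mp

def real_pi(n_digits):
    mp.dps = n_digits      # set working precision in decimal digits
    return +mp.pi          # evaluate pi at that precision

def simulated_pi(n_digits, diverges_at=50_000):
    """Agrees with the real pi up to ~diverges_at digits, then is subtly wrong."""
    p = real_pi(n_digits)
    return p if n_digits <= diverges_at else p + mp.mpf(10) ** (-diverges_at)

for n in (1_000, 10_000, 100_000):
    start = time.time()
    caught = simulated_pi(n) != real_pi(n)
    print(f"precision {n:>7} digits: fake detected = {caught}, "
          f"cost = {time.time() - start:.2f}s")
```

Only at precisions past the divergence depth does the mismatch show up, and the cost of the check grows with that depth, which is the “if n is huge, this will take a while” point.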
In any case, even if your argument does hold under (ii), whether it survives under (i) seems to be heavily influenced by inferential distance. Whatever the ASI “knows” or “concludes” is known or concluded through physical computations, which can presumably be inspected later if the ASI happens to be in a simulation. It thus seems only natural that a sufficiently high (though possibly still quite small) prior of being in a simulation would be enough to “lock” the ASI in that state, making those computations simply not worth the risk of undergoing.
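To spell out that “lock” as a rough sketch (the symbols here are mine, not anything from the original posts): let $p$ be the ASI’s prior that it is in a monitored fake environment, $G$ the gain from an action or computation its monitors would punish, and $L$ the size of the punishment. Then it should go ahead only if

$$(1-p)\,G - p\,L > 0 \quad\Longleftrightarrow\quad p < \frac{G}{G+L},$$

so whenever $L$ dwarfs $G$, even a tiny prior $p$ makes the risky computation not worth running, which is the lock-in described above.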
I’d have to think a bit more before tabooing that term, as it seems that “being fed false sensory data” doesn’t do the trick – you can be in a simulation without any sensory data at all.
I’m going to be a little stubborn and decline to reply till you ask me a question without “simulate” or “simulation” in it. I have an unpleasant memory of getting motte-and-baileyed by it.
Imagine that someone with sufficiently advanced technology perfectly scans your brain, recording every neuron firing while you dream, and can also make some neurons fire at will. Replace every instance of “simulation” in my previous comment with the analogue of that for the ASI.
If a simple philosophical argument can cut the expected odds of AI doom by an order of magnitude, we might not change our current plans, but it suggests that we have a lot of confusion on the topic that further research might alleviate.
And more generally, “the world where we almost certainly get killed by ASI” and “the world where we have an 80% chance of getting killed by ASI” are different worlds, and, ignoring motives to lie for propaganda purposes, if we actually live in the latter we should not say we live in the former.
It’s the first: there’s a lot of uncertainty. I don’t think anyone is lying deliberately, although everyone’s beliefs tend to follow what they think will produce good outcomes. This is called motivated reasoning.
I don’t think this changes the situation much, except to make it harder to coordinate. Rushing full speed ahead while we don’t even know the dangers is pretty dumb. But some people really believe the dangers are small, so they’re going to rush ahead. There aren’t strong arguments or a strong consensus for the danger being extremely high, even though looking at the opinions of the most thorough thinkers puts the risk in the alarmingly high, 50%-plus range.
Add to this disagreement the fact that most people are neither longtermist nor utilitarian; they’d like a chance to get rich and live forever even if it risks humanity’s future.