So8res comments on You can, in fact, bamboozle an unaligned AI into sparing your life

So8res 1 Oct 2024 20:23 UTC
3 points
−3

You often claim that conditional on us failing in alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only 2^-75 survives.

My first claim is not “fewer than 1 in 2^75 of the possible configurations of human populations navigate the problem successfully”.

My first claim is more like “given a population of humans that doesn’t even come close to navigating the problem successfully (given some unoptimized configuration of the background particles), probably you’d need to spend quite a lot of bits of optimization to tune the butterfly-effects in the background particles to make that same population instead solve alignment (depending how far back in time you go).” (A very rough rule of thumb here might be “it should take about as many bits as it takes to specify an FAI (relative to what they know)”.)

This is especially stark if you’re trying to find a branch of reality that survives with the “same people” on it. Humans seem to be very, very sensitive about what counts as the “same people”. (e.g., in August, when gambling on who gets a treat, I observed a friend toss a quantum coin, see it come up against them, and mourn that a different person—not them—would get to eat the treat.)

(Insofar as y’all are trying to argue “those MIRI folk say that AI will kill you, but actually, a person somewhere else in the great quantum multiverse, who has the same genes and childhood as you but whose path split off many years ago, will wake up in a simulation chamber and be told that they were rescued by the charity of aliens! So it’s not like you’ll really die”, then I at least concede that that’s an easier case to make, although it doesn’t feel like a very honest presentation to me.)

Conditional on observing a given population of humans coming nowhere close to solving the problem, the branches wherein those humans live (with identity measured according to the humans) are probably very extremely narrow compared to the versions where they die. My top guess would be that 2^-75 number is a vast overestimate of how thick those branches are (and the 75 in the exponent does not come from any attempt of mine to make that estimate).

As I said earlier: you can take branches that branched off earlier and earlier in time, and they’ll get better and better odds. (Probably pretty drastically, as you back off past certain points of no return. I dunno where the points of no return are. Weeks? Months? Years? Not decades, because with decades you can reroll significant portions of the population.)

I haven’t thought much about what fraction of populations I’d expect to survive off of what branch-point. (How many bits of optimization do you need back in the 1880s to swap Hitler out for some charismatic science-enthusiast statesman that will happen to have exactly the right influence on the following culture? How many such routes are there? I have no idea.)

Three big (related) issues with hoping that forks branched off sufficiently early (who are more numerous) save us in particular (rather than other branches) are (a) they plausibly care more about populations nearer to them (e.g. versions of themselves that almost died); (b) insofar as they care about more distant populations (that e.g. include you), they have rather a lot of distant populations to attempt to save; and (c) they have trouble distinguishing populations that never were, from populations that were and then weren’t.

Point (c) might be a key part of the story, not previously articulated (that I recall), that you were missing?

Like, you might say “well, if one in a billion branches look like dath ilan and the rest look like earth, and the former basically all survive and the latter basically all die, then the fact that the earthlike branches have ~0 ability to save their earthlike kin doesn’t matter, so long as the dath-ilan like branches are trying to save everyone. dath ilan can just flip 30 quantum coins to select a single civilization from among the billion that died, and then spend 1/million resources on simulating that civilization (or paying off their murderer or whatever), and that still leaves us with one-in-a-quintillion fraction of the universe, which is enough to keep the lights running”.

Part of the issue with this is that dath ilan cannot simply sample from the space of dead civilizations; it has to sample from a space of plausible dead civilizations rather than actual dead civilizations, in a way that I expect to smear loads and loads of probability-mass over regions that had concentrated (but complex) patterns of amplitude. The concentrations of Everett branches are like a bunch of wiggly thin curves etched all over a disk, and it’s not too hard to sample uniformly from the disk (and draw a plausible curve that the point could have been on), but it’s much harder to sample only from the curves. (Or, at least, so the physics looks to me. And this seems like a common phenomenon in physics. cf. the apparent inevitable increase of entropy when what’s actually happening is a previously-compact volume in phase space evolving into a bunch of wiggly thin curves, etc.)

So when you’re considering whether surviving humans will pay for our souls—not somebody’s souls, but our souls in particular—you have a question of how these alleged survivors came to pay for us in particular (rather than some other poor fools). And there’s a tradeoff that runs on one extreme from “they’re saving us because they are almost exactly us and they remember us and wish us to have a nice epilog” all the way to “they’re some sort of distant cousins, branched off a really long time ago, who are trying to save everyone”.

The problem with being on the “they care about us because they consider they basically are us” end is that those people are dead too (conditional on us being dead). And as you push the branch-point earlier and earlier in time, you start finding more survivors, but those survivors also wind up having more and more fools to care about (in part because they have trouble distinguishing the real fallen civilizations from the neighboring civilization-configurations that don’t get appreciable quantum amplitude in basement physics).

If you tell me where on this tradeoff curve you want to be, we can talk about it. (Ryan seemed to want to look all the way on the “insurance pool with aliens” end of the spectrum.)

The point of the 2^-75 number is that that’s about the threshold of “can you purchase a single star”. My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us.

If we retreat to “distant cousin branches of humanity might save us”, there’s a separate question of how the width of the surviving quantum branch compares to the volume taken up by us in the space of civilizations they attempt to save. I think my top guess is that a distant branch of humanity, spending stellar-level resources in attempts to concentrate its probability-mass in accordance with how quantum physics concentrates (squared) amplitude, still winds up so uncertain that there’s still 50+ bits of freedom left over? Which means that if one-in-a-billion of our cousin-branches survives, they still can’t buy a star (unless I flubbed my math).

And I think it’s real, real easy for them to wind up with 1000 bits leftover, in which case their purchasing power is practically nothing.

(This actually seems like a super reasonable guess to me. Like, if you imagine knowing that a mole of gas was compressed into the corner of a box with known volume, and you then let the gas bounce around for 13 billion years and take some measurements of pressure and temperature, and then think long and hard using an amount of compute that’s appreciably less than the amount you’d need to just simulate the whole thing from the start, it seems to me like you wind up with a distribution that has way way more than 1000 bits more entropy than is contained in the underlying physics. Imagining that you can spend about one ten-millionth of the universe on refining a distribution over Tegmark III with entropy that’s within 50 bits of god seems very very generous to me; I’m very uncertain about this stuff but I think that even mature superintelligences could easily wind up 1000 bits from god here.)
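The bit-counting behind the "can't buy a star" estimate above can be sketched roughly as follows. This is only an illustration of the arithmetic, not a model of the physics: it assumes the penalties simply add in bits (i.e. the fractions multiply), takes 75 bits as the "single star" threshold, and ignores what fraction of their resources the survivors are willing to spend (which would only make things worse). The function names are made up for the sketch.

```python
import math

# Assumption taken from the comment: a 2^-75 share of the universe is
# roughly the threshold for "can you purchase a single star".
STAR_THRESHOLD_BITS = 75


def bits(fraction: float) -> float:
    """Penalty, in bits, of only getting `fraction` of something (0 < fraction <= 1)."""
    return -math.log2(fraction)


def can_buy_star(survival_fraction: float, leftover_bits: float,
                 threshold_bits: float = STAR_THRESHOLD_BITS):
    """Hypothetical helper: add the survival penalty to the leftover
    uncertainty (both in bits) and compare against the star threshold."""
    total_bits = bits(survival_fraction) + leftover_bits
    return total_bits <= threshold_bits, total_bits


# One-in-a-billion cousin branches survive, with 50 bits of leftover uncertainty.
ok_50, total_50 = can_buy_star(1e-9, 50)
# Same survival fraction, but 1000 bits of leftover uncertainty.
ok_1000, total_1000 = can_buy_star(1e-9, 1000)

print(f"50 leftover bits:   total = {total_50:.1f} bits, star affordable: {ok_50}")
print(f"1000 leftover bits: total = {total_1000:.1f} bits, star affordable: {ok_1000}")
```

One-in-a-billion is about 30 bits, so 50 bits of leftover uncertainty already lands just past the 75-bit threshold, and 1000 bits lands nowhere near it.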

Regardless, as I mentioned elsewhere, I think that a more relevant question is how those trade-offers stack up to other trade-offers, so /shrug.
- David Matolcsi 1 Oct 2024 20:55 UTC
  3 points
  0
  Parent
  I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan’s comments in this thread arguing that it’s incompatible to believe that “My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us” and to believe that you should work on AI safety instead of malaria.