You don’t have to convince me that “any easily human-specifiable task” is asking for really mature alignment, because in my model this is basically equivalent to fully solving the hard problem of AI alignment.
This seems very implausible to me. One task looks something like “figure out how to get an AGI to think about physics within a certain small volume of space, output a few specific complicated machines in that space, and not think about or steer the rest of the world”.
The other task looks something like “solve all of human psychology and moral philosophy, figure out how to get an AGI to do arbitrarily specific tasks across arbitrary cognitive domains with unlimited capabilities and free rein over the universe, and optimize the entire future light cone with zero opportunity to abort partway through if you screw anything up”.
The first task can be astoundingly difficult and still be far easier than that.
I don’t see how you can do “melt the GPUs” without having an AI that models humans.
If you’re on the Moon, on Mars, deep in the Earth’s crust, etc., or if you’ve used AGI to build fast-running human whole-brain emulations, then you can go without AGI-assisted modeling like that for a very long time (and potentially indefinitely). None of the pivotal acts that seem promising to me involve any modeling of humans, beyond the level of modeling needed to learn a specific simple physics task like ‘build more advanced computing hardware’ or ‘build an artificial ribosome’.
What if global public opinion among scientists turns against you?
If humanity has solved the weak alignment problem, escaped imminent destruction via AGI proliferation, and ended the acute existential risk period, then we can safely take our time arguing about what to do next, hashing out whether the pivotal act that prevented the death of humanity violated propriety, etc. If humanity wants to take twenty years to hash out that argument, or for that matter a hundred years, then go wild!
I feel optimistic about the long-term capacity of human civilization to figure things out, grow into maturity, and eventually make sane choices about the future, if we don’t destroy ourselves. I’m much more concerned with the “let’s not destroy ourselves” problem than with the finer points of PR and messaging when it comes to discussing afterwards whatever it was someone did to prevent our imminent deaths. Humanity will have time to sort that out, if someone does successfully save us all.
a small organization going rogue
One small messaging point, though: not destroying the world isn’t “going rogue”. Destroying the world is “going rogue”. If you’re advancing AGI, the non-rogue option, the prosocial thing to do, is the thing that prevents the world from dying, not the thing that increases the probability of everyone dying.
Or, if we’re going to call ‘killing everyone’ “not going rogue”, and ‘preventing the non-rogues from killing everyone’ “going rogue”, then let’s at least be clear on the fact that going rogue is the obviously prosocial thing to do, and not going rogue (“building AGI with no remotely reasonable plan to effect pivotal outcomes”) is omnicidal and not a good idea.
I think I communicated unclearly, and that’s my fault; sorry for that. I shouldn’t have used the phrase “any easily specifiable task” for what I meant, because I didn’t mean it to include “optimize the entire human light cone w.r.t. human values”. In fact, I was being vague, and probably there isn’t really a sensible notion that I was trying to point to. However, to clarify what I was actually trying to say: what I mean by the “hard problem of alignment” is “develop an AI system that keeps humanity permanently safe from misaligned AI (and maybe other x-risks), and otherwise leaves humanity to figure out what it wants and do what it wants, without restricting it in much of any way except within some relatively small volume of behaviour around ‘things that cause existential catastrophe’ ” (maybe this ends up meaning developing a second-version AI that then gets free rein to optimize the universe w.r.t. human values, but I’m a bit skeptical). I agree that “solve all of human psychology and moral philosophy …” is significantly harder than that as a technical problem (maybe I’d call that the “even harder problem”).
Ehh, maybe I am changing my mind and also agree that even what I’m calling the hard problem is significantly more difficult than the pivotal act you’re describing, if you can really do it without modelling humans, by going to Mars and doing whole-brain emulation (WBE). But then the whole thing would still have to rely on the WBE, and I currently find it implausible to do it without that (though you’ve been updating me about the lack of need for human modelling, so maybe I’ll update here too). Basically, the pivotal act is very badly described as merely “melt the GPUs”, and is much crazier than what I thought it was meant to refer to.
Regarding “rogue”: I thought it meant “independent from established authority”, but I just looked up the meaning and it seems to mean “cheating/dishonest/mischievous”, so I take back that statement about rogueness.
I’ll respond to the “public opinion” thing later.