(This comment is tangential to the decision-theoretic focus of the post)
The AI stabilizes the situation in the world and makes sure no other dangerous AI is built, but otherwise it doesn’t harm the humans.[6] Then it modifies its own code to include a commitment never to harm the humans and to let them live freely on Earth for at least a billion years, making only the minimal interventions necessary to prevent humanity from wiping itself out with some new stupid technology. Crucially, the AI should make this self-modification[7] before it makes itself very substantially smarter or better informed about the world, to the level where it could expect to determine whether it’s in a simulation run by a very advanced future civilization.
I don’t know of consistent human values which would ask for this specifically. Consider two cases[1]:
1. You value something like the continuation of at least one ‘earth society’ (with a bunch of complex criteria attached), not the quantity of copies of it.
In this case, at least one earth society continues regardless, at least some of the time: conditional on the universe being large or duplicative enough to contain many copies of you, or conditional on the premise in the post that at least some aligned ASIs will exist somewhere.
2. Instead, you linearly value a large number of copies of earth civilizations existing, or something like that.
In this case, the commitment wouldn’t be to let just one earth continue per unaligned ASI, but to create more, and not to cap them at a billion years.[1]
I think this is a case of humans having a deep intuition that there is only one instance of them, while also believing a theory that implies otherwise, and not updating that deep intuition when applying the theory, even as the theory updates other beliefs (like the possibility of aligned ASIs from some earths influencing unaligned ones from other earths).
(To be clear, I’m not arguing for (1) or (2), and of course these are not the only possible things one can value; please don’t clamp your values just because the only things humans seem to write about caring about are constrained.)
I actually think that you are probably right, and in the last year I have become more sympathetic to total utilitarianism because of coherence arguments like this. It’s just that the more common-sense factions still hold way more than one in a hundred million seats in my moral parliament, so it still feels like an obviously good deal to give up on some planets in the future to satisfy our deep intuitions about wanting Earth society to survive in the normal way. I agree it’s all confusing and probably incoherent, but I’m afraid every moral theory will end up somewhat incoherent in the end. (Infinite ethics, for example, is rough.)
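A toy way to see why even a tiny common-sense seat share can carry this vote (the symbols below are purely illustrative, not from the post): let $w_c$ be the seat share of the common-sense factions, $w_u = 1 - w_c$ the share of the total-utilitarian factions, $f$ the fraction of future resources given up in the deal, and $v \in (0, 1]$ how much the common-sense factions value Earth society surviving in the normal way, measured on the same scale as the total-utilitarian loss. The parliament favors the deal roughly when

$$ w_c \cdot v \;>\; w_u \cdot f, \qquad \text{i.e.} \qquad f \;<\; \frac{w_c}{w_u}\, v. $$

With $w_c$ well above $10^{-8}$ and $v$ near 1, any deal costing much less than about one part in a hundred million of the future’s resources comes out clearly positive.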