I’m slightly worried that even formally specifying an “idealized and unbounded computer” will turn out to be Oracle-AI-complete. We don’t need to worry about it converting something valuable into computronium, but we do need to ensure that it interacts with the simulated human(s) in a friendly way. We need to ensure that it doesn’t modify the human to simplify the process of explaining something. The simulated human needs to be able to control what kinds of minds the computer creates in the process of thinking (we may not care, but the human would). And the computer should certainly not hack its way out of the hypothetical via being thought about by the FAI.
We are trying to formally specify the input-output behavior of an idealized computer running some simple program. The mathematical definition of a Turing machine with an input tape would suffice, as would a formal specification of a version of Python running with unlimited memory.
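To make that concrete, here is a minimal sketch (my own illustration, nothing canonical) of how little machinery that kind of specification needs: a standard single-tape Turing machine whose tape is a dictionary that grows without bound in either direction, so its input-output behavior is fixed entirely by the transition table and the mathematics, with no physical resource limits anywhere in the definition.

```python
# Illustrative only: an "idealized and unbounded computer" as the standard
# Turing machine semantics, written directly as Python. The sparse dict tape
# is unbounded in both directions, so the only limits are mathematical.

def run_turing_machine(delta, start, accept, reject, blank, input_tape):
    """delta maps (state, symbol) -> (new_state, written_symbol, move),
    with move in {-1, +1}. Returns (halting_state, visited tape contents)."""
    tape = {i: s for i, s in enumerate(input_tape)}  # unbounded sparse tape
    state, head = start, 0
    while state not in (accept, reject):
        symbol = tape.get(head, blank)
        state, written, move = delta[(state, symbol)]
        tape[head] = written
        head += move
    if not tape:  # machine halted without ever writing (empty input edge case)
        return state, ""
    lo, hi = min(tape), max(tape)
    return state, "".join(tape.get(i, blank) for i in range(lo, hi + 1))


# Example: a one-state machine that flips every bit, then halts on the blank.
delta = {
    ("s", "0"): ("s", "1", +1),
    ("s", "1"): ("s", "0", +1),
    ("s", "_"): ("halt", "_", +1),
}
print(run_turing_machine(delta, "s", "halt", "reject", "_", "0110"))
# -> ('halt', '1001_')
```

The point is just that the hypothetical computer itself is a trivial object to pin down formally; all of the difficulty discussed above lives in what the simulated human does with it, not in the machine's definition.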
Okay, I see that that’s what you’re saying. The assumption then (which seems reasonable but needs to be proven?) is that the simulated humans, given infinite resources, would either solve Oracle AI [edit: without accidentally creating uFAI first, I mean] or just learn how to do stuff like create universes themselves.
There is still the issue that a hypothetical human with access to infinite computing power would not want to create or observe hellworlds. We here in the real world don’t care, but the hypothetical human would. So I don’t think your specific idea of creating an Earth simulation by brute force would work, because no moral human would do it.