Even if we can’t infer human preferences over very distant objects, we might be able to infer human preferences well enough to guide a process of deliberation (real or hypothetical). Using the human’s inferred preferences could help eliminate some of the errors that humans would otherwise make during deliberation.
This assumes (or depends on the claim) that human deliberation is good/safe because humans have good preferences about deliberation. But what if human deliberation is only good/safe because of the constraints we face? An example of what I mean: someone wants to self-modify to have 100% certainty that God exists before deliberating further, but can’t, and as a result eventually realizes through deliberation that having 100% certainty that God exists is actually not a good idea.
This can be considered an instance of the more general concern I have about humans not being safe, especially under distributional shift.
ETA: Here is another example from our own community:
Meanwhile, a few years ago when I first learned about the concept of updatelessness, I resolved to be updateless from that point onwards. I am now glad that I couldn’t actually commit to anything then.