We can imagine aliens building a superintelligent agent which helps them get what they want. This is a special case of aliens inventing tools. What kind of general process should these aliens use – how should they go about designing such an agent?
Assume that these aliens want things in the colloquial sense (not that they’re, e.g., nontrivially VNM expected-utility maximizers), and that a reasonable observer would say they’re closer to being rational than antirational. Then it seems[1] like these aliens eventually steer towards reflectively coherent rationality (provided they don’t blow themselves to hell before they get there): given time, they tend to act to get what they want, and to act to become more rational. But they aren’t fully “rational” yet, and they want to build a smart thing that helps them. What should they do?
In this situation, it seems like they should build an agent which empowers them and increases their flexible control over the future, since they don’t yet fully know what they want. Lots of flexible control means they can better error-correct and preserve value for whatever they end up concluding they actually want. It also protects them from catastrophe and from unaligned competitor agents.
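(As an aside: one candidate way to make “flexible control over the future” concrete — a sketch, not necessarily the notion I’m gesturing at above — is information-theoretic empowerment in the sense of Klyubin, Polani & Nehaniv (2005): the channel capacity between an agent’s next few actions and the state it ends up in,

$$\mathfrak{E}_n(s_t) \;=\; \max_{p(a_t,\dots,a_{t+n-1})} I\big(A_t,\dots,A_{t+n-1};\; S_{t+n} \,\big|\, s_t\big),$$

i.e., the maximum mutual information between the next $n$ actions and the state reached $n$ steps later. On this reading, an agent that increases this quantity for the aliens, rather than for itself, would be one way to cash out “empowers them.”)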
[1] I don’t know if this is formally and literally always true; I’m just trying to gesture at an intuition about what kind of agentic process these aliens are.