Vanessa Kosoy comments on Vanessa Kosoy’s Shortform

Vanessa Kosoy 21 Sep 2020 14:15 UTC
LW: 5 AF: 1
Some problems to work on regarding goal-directed intelligence. Conjecture 5 is especially important for deconfusing basic questions in alignment, as it stands in opposition to Stuart Armstrong’s thesis about the impossibility of deducing preferences from behavior alone.
1. Conjecture. Informally: Intelligence is unlikely to be produced by chance. Formally: Denote by $Π$ the space of deterministic policies, and consider some $μ \in ΔΠ$. Suppose $μ$ is equivalent to a stochastic policy $π^{*}$. Then, $E_{π \sim μ}[g(π)] = O(C(π^{*}))$.
2. Find an “intelligence hierarchy theorem”. That is, find an increasing sequence $\{g_{n}\}$ s.t. for every $n$, there is a policy with goal-directed intelligence in $(g_{n}, g_{n + 1})$ (no more and no less).
3. What is the computational complexity of evaluating $g$ given (i) oracle access to the policy or (ii) description of the policy as a program or automaton?
4. What is the computational complexity of producing a policy with given $g$?
5. Conjecture. Informally: Intelligent agents have well defined priors and utility functions. Formally: For every $(U, ζ)$ with $C(U) < \infty$ and $D_{\mathrm{KL}}(ζ_{0} \| ζ) < \infty$, and every $ϵ > 0$, there exists $g \in (0, \infty)$ s.t. for every policy $π$ with intelligence at least $g$ w.r.t. $(U, ζ)$, and every $(\tilde{U}, \tilde{ζ})$ s.t. $π$ has intelligence at least $g$ w.r.t. them, any optimal policies $π^{*}, \tilde{π}^{*}$ for $(U, ζ)$ and $(\tilde{U}, \tilde{ζ})$ respectively satisfy $E_{ζ \tilde{π}^{*}}[U] \geq E_{ζ π^{*}}[U] - ϵ$.
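
To make the closing inequality of item 5 concrete, here is a toy sketch in Python. It is only an illustration under strong simplifying assumptions: a one-shot decision problem with finitely many hypotheses and actions, where a "policy" is a single action and a "prior" is a probability vector. It does not model the intelligence measure $g$ or the complexity $C$ at all. All names (`expected_utility`, `optimal_action`, `regret`, and the example matrices) are hypothetical. The quantity `regret` is the gap between the best achievable expected utility under the first utility/prior pair and the expected utility, under that same pair, of the policy that is optimal for the second pair; the inequality in item 5 asks for this gap to be at most $ϵ$.

```python
def expected_utility(utility, prior, action):
    """Expected utility of `action`, averaging over hypotheses weighted by `prior`.

    utility[h][a] is the payoff of action a if hypothesis h is true (toy assumption).
    """
    return sum(p * utility[h][action] for h, p in enumerate(prior))


def optimal_action(utility, prior):
    """An action maximizing expected utility for the pair (utility, prior)."""
    n_actions = len(utility[0])
    return max(range(n_actions), key=lambda a: expected_utility(utility, prior, a))


def regret(utility, prior, alt_utility, alt_prior):
    """Loss, measured by (utility, prior), from acting optimally for the alternative pair."""
    best = optimal_action(utility, prior)
    alt_best = optimal_action(alt_utility, alt_prior)
    return (expected_utility(utility, prior, best)
            - expected_utility(utility, prior, alt_best))


# Hypothetical example: 2 hypotheses, 3 actions.
U = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
zeta = [0.75, 0.25]

# A positive affine transform of U is a different utility with the same optimal action.
U_affine = [[2.0 * u + 1.0 for u in row] for row in U]

# A utility that prefers the opposite action in each hypothesis.
U_other = [[0.0, 1.0, 0.5],
           [1.0, 0.0, 0.5]]

print(regret(U, zeta, U_affine, zeta))  # no loss: both pairs pick the same action
print(regret(U, zeta, U_other, zeta))   # positive loss: the pairs disagree
```

In this toy setting the two pairs `(U, zeta)` and `(U_affine, zeta)` are distinct yet interchangeable for the purpose of choosing an action, which is the kind of equivalence the inequality tolerates, while `(U_other, zeta)` is not.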
What links here?
- Vanessa Kosoy's comment on What to do with imitation humans, other than asking them what the right thing to do is? by Charlie Steiner (28 Sep 2020 15:58 UTC; 4 points)
- Davidmanheim 5 Jan 2021 7:58 UTC
  LW: 2 AF: 1
  re: #5, that doesn’t seem to claim that we can infer $U$ given their actions, which is what the impossibility of deducing preferences is actually claiming. That is, assuming 5, we still cannot show that there isn’t some $U_{1} \neq U_{2}$ such that $π^{*}(U_{1}, ζ) = π^{*}(U_{2}, ζ)$.
  
  (And as pointed out elsewhere, it isn’t Stuart’s thesis, it’s a well known and basic result in the decision theory / economics / philosophy literature.)
  - Vanessa Kosoy 11 Jan 2021 16:44 UTC
    LW: 2 AF: 1
    
    > re: #5, that doesn’t seem to claim that we can infer U given their actions, which is what the impossibility of deducing preferences is actually claiming.
    
    You misunderstand the intent. We’re talking about inverse reinforcement learning. The goal is not necessarily inferring the unknown $U$, but producing some behavior that optimizes the unknown $U$. Of course, if the policy you’re observing is optimal then it’s trivial to do so by following the same policy. But, using my approach we might be able to extend it into results like “the policy you’re observing is optimal w.r.t. a certain computational complexity, and your goal is to produce an optimal policy w.r.t. a higher computational complexity.”
    
    (Btw I think the formal statement I gave for 5 is false, but there might be an alternative version that works.)
    
    > (And as pointed out elsewhere, it isn’t Stuart’s thesis, it’s a well known and basic result in the decision theory / economics / philosophy literature.)
    
    I am referring to this and related work by Armstrong.