Actually, contrary to what I claimed before, we don’t need computational complexity bounds for this definition to make sense. This is because the Solomonoff prior is a mixture of computable hypotheses but is uncomputable itself.
Given $g>0$, we define that “$\pi$ has (unbounded) goal-directed intelligence (at least) $g$” when there is a prior $\zeta$ and utility function $U$ s.t. for any policy $\pi'$, if $E_{\zeta\pi'}[U] \geq E_{\zeta\pi}[U]$ then $K(\pi') \geq D_{\mathrm{KL}}(\zeta_0||\zeta) + K(U) + g$. Here, $\zeta_0$ is the Solomonoff prior and $K$ is Kolmogorov complexity. When $g=+\infty$ (i.e. no computable policy can match the expected utility of $\pi$; in particular, this implies $\pi$ is optimal, since any policy can be approximated by a computable policy), we say that $\pi$ is “perfectly (unbounded) goal-directed”.
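To see the definition in one line, the quantifiers can be packaged into a single quantity. This is just a restatement of the above, with the convention $K(\pi') = \infty$ for uncomputable $\pi'$, and modulo whether the supremum is attained:

$$g(\pi) \;=\; \sup_{(\zeta, U)} \left[\, \inf_{\pi':\, E_{\zeta\pi'}[U] \geq E_{\zeta\pi}[U]} K(\pi') \;-\; D_{\mathrm{KL}}(\zeta_0||\zeta) \;-\; K(U) \right]$$

Then “$\pi$ has intelligence at least $g$” corresponds to $g(\pi) \geq g$, and “perfectly (unbounded) goal-directed” corresponds to the inner infimum being $+\infty$ for some choice of $(\zeta, U)$.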
Compare this notion to the Legg-Hutter intelligence measure. The LH measure depends on the choice of UTM in radical ways. In fact, for some UTMs, AIXI (which maximizes the LH measure) becomes computable or even really stupid. For example, it can keep taking the same action forever for fear that any other action leads to an inescapable “hell” state. On the other hand, goal-directed intelligence differs only by $O(1)$ between UTMs, just like Kolmogorov complexity. A perfectly unbounded goal-directed policy has to be uncomputable, and which policies qualify as such doesn’t depend on the UTM at all.
I think it’s also possible to prove that intelligence is rare, in the sense that, for any computable stochastic policy, if we regard it as a probability measure over deterministic policies, then for any $\epsilon>0$ there is $g$ s.t. the probability of getting intelligence at least $g$ is smaller than $\epsilon$.
Also interesting is that, for bounded goal-directed intelligence, increasing the prices can only decrease intelligence by $O(1)$, and a policy that is perfectly goal-directed w.r.t. lower prices is also such w.r.t. higher prices (I think). In particular, a perfectly unbounded goal-directed policy is perfectly goal-directed for any price vector. Informally speaking, an agent that is very smart relative to a context with cheap computational resources is still very smart relative to a context where they are expensive, which makes intuitive sense.
If we choose just one computational resource, we can speak of the minimal price for which a given policy is perfectly goal-directed, which is another way to measure intelligence with a more restricted domain. Curiously, our bounded Solomonoff-like prior has the shape of a Maxwell-Boltzmann distribution in which the prices are thermodynamic parameters. Perhaps we can regard the minimal price as the point of a phase transition.
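To make the thermodynamic analogy a bit more explicit, the shape in question is roughly a Gibbs/Maxwell-Boltzmann weight. The following is only schematic (the notation $r_i(H)$ for the amount of the $i$-th resource consumed by hypothesis $H$, and $p_i$ for its price, is introduced here just for illustration):

$$\zeta_p(H) \;\propto\; 2^{-K(H) \,-\, \sum_i p_i\, r_i(H)}$$

Each price $p_i$ then acts as an inverse-temperature-like parameter conjugate to the resource $r_i$, and sending all prices to zero recovers the Solomonoff-style weight $2^{-K(H)}$.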
Some problems to work on regarding goal-directed intelligence. Conjecture 5 is especially important for deconfusing basic questions in alignment, as it stands in opposition to Stuart Armstrong’s thesis about the impossibility of deducing preferences from behavior alone.
1. Conjecture. Informally: it is unlikely to produce intelligence by chance. Formally: denote $\Pi$ the space of deterministic policies, and consider some $\mu \in \Delta\Pi$. Suppose $\mu$ is equivalent to a stochastic policy $\pi^*$. Then, $E_{\pi\sim\mu}[g(\pi)] = O(C(\pi^*))$. (See the note after this list.)
2. Find an “intelligence hierarchy theorem”. That is, find an increasing sequence $\{g_n\}$ s.t. for every $n$, there is a policy with goal-directed intelligence in $(g_n, g_{n+1})$ (no more and no less).
3. What is the computational complexity of evaluating $g$ given (i) oracle access to the policy or (ii) a description of the policy as a program or automaton?
4. What is the computational complexity of producing a policy with given $g$?
5. Conjecture. Informally: intelligent agents have well-defined priors and utility functions. Formally: for every $(U,\zeta)$ with $C(U) < \infty$ and $D_{\mathrm{KL}}(\zeta_0||\zeta) < \infty$, and every $\epsilon > 0$, there exists $g \in (0,\infty)$ s.t. for every policy $\pi$ with intelligence at least $g$ w.r.t. $(U,\zeta)$, and every $(\tilde{U},\tilde{\zeta})$ s.t. $\pi$ has intelligence at least $g$ w.r.t. them, any optimal policies $\pi^*, \tilde{\pi}^*$ for $(U,\zeta)$ and $(\tilde{U},\tilde{\zeta})$ respectively satisfy $E_{\zeta\tilde{\pi}^*}[U] \geq E_{\zeta\pi^*}[U] - \epsilon$.
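The note promised in problem 1: assuming $g(\pi) \geq 0$ (otherwise replace $g$ by $\max(g, 0)$), the conjectured bound would immediately give the “intelligence is rare” statement above via Markov’s inequality:

$$\Pr_{\pi \sim \mu}\big[g(\pi) \geq g_0\big] \;\leq\; \frac{E_{\pi\sim\mu}[g(\pi)]}{g_0} \;=\; \frac{O(C(\pi^*))}{g_0}$$

so for any $\epsilon > 0$, taking $g_0$ large enough makes the probability of sampling a policy with intelligence at least $g_0$ smaller than $\epsilon$.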
re: #5, that doesn’t seem to claim that we can infer U given their actions, which is what the impossibility of deducing preferences is actually claiming. That is, assuming 5, we still cannot show that there isn’t some $U_1 \neq U_2$ such that $\pi^*(U_1,\zeta) = \pi^*(U_2,\zeta)$.
(And as pointed out elsewhere, it isn’t Stuart’s thesis, it’s a well known and basic result in the decision theory / economics / philosophy literature.)
> re: #5, that doesn’t seem to claim that we can infer U given their actions, which is what the impossibility of deducing preferences is actually claiming.
You misunderstand the intent. We’re talking about inverse reinforcement learning. The goal is not necessarily inferring the unknown $U$, but producing some behavior that optimizes the unknown $U$. Of course, if the policy you’re observing is optimal, then it’s trivial to do so by following the same policy. But using my approach, we might be able to extend this into results like “the policy you’re observing is optimal w.r.t. a certain computational complexity, and your goal is to produce an optimal policy w.r.t. a higher computational complexity.”
(Btw I think the formal statement I gave for 5 is false, but there might be an alternative version that works.)
> (And as pointed out elsewhere, it isn’t Stuart’s thesis, it’s a well known and basic result in the decision theory / economics / philosophy literature.)
I am referring to this and related work by Armstrong.