The idea is an elaboration of a comment I made previously.
Motivation: Improving the theoretical understanding of AGI by facilitating synthesis between algorithmic information theory and statistical learning theory.
Topic: Fix some reasonable encoding of communicating MDPs, and use this encoding to define $\zeta_{CMDP}$: the Solomonoff-type prior over communicating MDPs. That is, the probability of a communicating MDP $H$ is proportional to $2^{-K(H)}$, where $K(H)$ is the length of the shortest program producing the encoding of $H$.
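Since $K$ is uncomputable, any concrete illustration has to substitute a proxy. The toy Python sketch below uses the raw encoding length of a few invented MDP encodings in place of $K(H)$ and normalizes over a finite hypothesis set; all names and bit strings are hypothetical, chosen only to show the $2^{-\text{length}}$ shape of the prior:

```python
# Toy sketch of a Solomonoff-style prior over a finite hypothesis set.
# K(H) is uncomputable, so we substitute the length of a fixed encoding
# enc(H) as a crude proxy; the hypotheses and encodings are invented.

enc = {  # hypothetical bit-string encodings of some communicating MDPs
    "2-state chain": "0110",
    "3-state cycle": "011010",
    "4-state grid":  "01101001",
}

weights = {H: 2.0 ** -len(code) for H, code in enc.items()}
Z = sum(weights.values())                       # normalizing constant
prior = {H: w / Z for H, w in weights.items()}  # zeta_CMDP-style prior

# Shorter encodings get exponentially more prior mass.
for H, p in sorted(prior.items(), key=lambda kv: -kv[1]):
    print(f"{H}: {p:.4f}")
```

Note that each extra encoding bit halves the weight, which is the "falls off very slowly" behavior only in the sense that infinitely many hypotheses share any given complexity level in the real construction.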
Consider CMDP-AIXI: the Bayes-optimal agent for $\zeta_{CMDP}$. Morally speaking, we would like to prove that CMDP-AIXI (or any other policy) has a frequentist (i.e., per-hypothesis-$H$) non-anytime regret bound of the form $O(n^{\alpha} \zeta_{CMDP}(H)^{-\beta} \tau(H)^{\gamma})$, where $n$ is the time horizon[1], $\tau(H)$ is a parameter such as MDP diameter, bias span or mixing time, $\alpha \in (0,1)$, and $\beta, \gamma \in (0,\infty)$ (this time $\gamma$ is just a constant, not a time discount). However, this precise result is probably impossible, because the Solomonoff prior falls off very slowly. Warm-up: Prove this!
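For concreteness, the regret appearing in this bound can be written out in the standard finite-horizon form (matching the footnote's choice of unnormalized regret with a step-function time discount; this spelled-out definition is my reading, not stated in the original comment):

```latex
\mathrm{Reg}_n(\pi, H) \;:=\; \max_{\pi^*} \, \mathbb{E}^{H, \pi^*}\!\left[\sum_{t=1}^{n} r_t\right] \;-\; \mathbb{E}^{H, \pi}\!\left[\sum_{t=1}^{n} r_t\right]
```

Here $\mathbb{E}^{H,\pi}$ denotes expectation over the interaction of policy $\pi$ with environment $H$, and $r_t$ is the reward at time $t$.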
Next, we need the concept of a “sophisticated core”, inspired by algorithmic statistics. Given a bit string $x$, we consider the Kolmogorov complexity $K(x)$ of $x$. Then we consider pairs $(Q, y)$ where $Q$ is a program that halts on all inputs, $y$ is a bit string, $Q(y) = x$, and $|Q| + |y| \leq K(x) + O(1)$. Finally, we minimize over $|Q|$. The minimal $|Q|$ is called the sophistication of $x$. For our problem, we are interested in the minimal $Q$ itself: I call it the “sophisticated core” of $x$ and denote it $SC(x)$.
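The definition can be illustrated with a deliberately finite stand-in for the universal machine. True $K$ and sophistication are uncomputable, so the program table, the assigned code lengths, and the slack constant below are all invented purely for illustration:

```python
# Toy illustration of "sophistication" and the "sophisticated core" SC(x).
# We replace the universal machine by a hypothetical finite table of
# total programs, each with an assigned code length |Q|.

from itertools import product

# Each "program" maps an input bit string to an output bit string (total).
PROGRAMS = {
    "id":      (2, lambda y: y),             # |Q| = 2, identity
    "zeros":   (3, lambda y: "0" * len(y)),  # |Q| = 3, all-zeros output
    "repeat2": (4, lambda y: y + y),         # |Q| = 4, doubling
}

def inputs_up_to(n):
    """All bit strings of length <= n (including the empty string)."""
    for k in range(n + 1):
        for bits in product("01", repeat=k):
            yield "".join(bits)

def K(x, max_input=8):
    """Toy Kolmogorov complexity: min |Q| + |y| over pairs with Q(y) = x."""
    best = None
    for name, (length, f) in PROGRAMS.items():
        for y in inputs_up_to(max_input):
            if f(y) == x:
                cost = length + len(y)
                best = cost if best is None else min(best, cost)
    return best

def sophisticated_core(x, slack=0, max_input=8):
    """Min-|Q| program Q with Q(y) = x and |Q| + |y| <= K(x) + slack."""
    k = K(x, max_input)
    best = None
    for name, (length, f) in PROGRAMS.items():
        for y in inputs_up_to(max_input):
            if f(y) == x and length + len(y) <= k + slack:
                if best is None or length < best[1]:
                    best = (name, length)
    return best

print(K("0101"))                  # -> 6 (e.g. id on "0101", or repeat2 on "01")
print(sophisticated_core("0101")) # -> ('id', 2)
```

In the real definition the $O(1)$ slack matters a great deal (here it is just the `slack` parameter), and sophistication is sensitive to how it is chosen.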
To any halting program $Q$ we can associate the environment $\mu_Q := \mathbb{E}_{H \sim \zeta_{CMDP}}[H \mid SC(H) = Q]$. We also define the prior $\xi$ by $\xi(Q) := \Pr_{H \sim \zeta_{CMDP}}[SC(H) = Q]$. $\zeta_{CMDP}$ and $\xi$ are “equivalent” in the sense that $\mathbb{E}_{Q \sim \xi}[\mu_Q] = \mathbb{E}_{H \sim \zeta_{CMDP}}[H]$. However, they are not equivalent for the purpose of regret bounds.
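This coarse-graining of the prior by the core can be sketched on a finite toy example; the hypothesis weights and core labels below are hypothetical stand-ins for $\zeta_{CMDP}$ and $SC$:

```python
# Toy illustration of coarse-graining a prior by sophisticated core.
# zeta[H] stands in for zeta_CMDP and core[H] for SC(H); both tables
# are invented for illustration.

from collections import defaultdict

zeta = {"H1": 0.5, "H2": 0.25, "H3": 0.125, "H4": 0.125}  # prior over hypotheses
core = {"H1": "Q1", "H2": "Q1", "H3": "Q2", "H4": "Q2"}   # SC(H) labels

# xi(Q) = Pr_{H ~ zeta}[SC(H) = Q]
xi = defaultdict(float)
for H, p in zeta.items():
    xi[core[H]] += p

# mu_Q = E_{H ~ zeta}[H | SC(H) = Q]: conditional weights over hypotheses
mu = {Q: {H: zeta[H] / xi[Q] for H in zeta if core[H] == Q} for Q in xi}

print(dict(xi))   # -> {'Q1': 0.75, 'Q2': 0.25}
print(mu["Q2"])   # -> {'H3': 0.5, 'H4': 0.5}
```

The mixture identity $\mathbb{E}_{Q \sim \xi}[\mu_Q] = \mathbb{E}_{H \sim \zeta}[H]$ holds here by construction: multiplying the conditional weights back by $\xi(Q)$ recovers `zeta` exactly, yet $\xi(Q)$ can be much larger than any individual $\zeta(H)$, which is the point for regret bounds.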
Challenge: Investigate the conjecture that there is an ($n$-dependent) policy satisfying the regret bound $O(n^{\alpha} \xi(Q)^{-\beta} \mathbb{E}[\tau(\mu_Q)]^{\gamma})$ for every $\mu_Q$, or something similar.
Strategy: See the theoretical research part of my other answer.
I am using unnormalized regret and step-function time discount here to make the notation more standard, even though usually I prefer normalized regret and geometric time discount.