In the anthropic trilemma, Yudkowsky writes about the thorny problem of understanding subjective probability in a setting where copying and modifying minds is possible. Here, I will argue that infra-Bayesianism (IB) leads to the solution.
Consider a population of robots, each of which is a regular RL agent. The environment produces the observations of the robots, but can also make copies of them or delete portions of their memories. If we consider a random robot sampled from the population, the history they observed will be biased compared to the “physical” baseline. Indeed, suppose that a particular observation c has the property that every time a robot makes it, 10 copies of them are created in the next moment. Then, a random robot will have c much more often in their history than the physical frequency with which c is encountered, due to the resulting “selection bias”. We call this setting “anthropic RL” (ARL).
The original motivation for IB was non-realizability. But, in ARL, Bayesianism runs into issues even when the environment is realizable from the “physical” perspective. For example, we can consider an “anthropic MDP” (AMDP). An AMDP has finite sets of actions (A) and states (S), and a transition kernel T:A×S→Δ(S∗). The output is a string of states instead of a single state, because many copies of the agent might be instantiated on the next round, each with their own state. In general, there will be no single Bayesian hypothesis that captures the distribution over histories that the average robot sees at any given moment of time (at any given moment of time we sample a robot out of the population and look at their history). This is because the distributions at different moments of time are mutually inconsistent.
[EDIT: Actually, given that we don’t care about the order of robots, the signature of the transition kernel should be T:A×S→ΔNS]
The consistency that is violated is exactly the causality property of environments. Luckily, we know how to deal with acausality: using the IB causal-acausal correspondence! The result can be described as follows: Murphy chooses a time moment n∈N and guesses the robot policy π until time n. Then, a simulation of the dynamics of (π,T) is performed until time n, and a single history is sampled from the resulting population. Finally, the observations of the chosen history unfold in reality. If the agent chooses an action different from what is prescribed, Nirvana results. Nirvana also happens after time n (we assume Nirvana reward 1 rather than ∞).
This IB hypothesis is consistent with what the average robot sees at any given moment of time. Therefore, the average robot will learn this hypothesis (assuming learnability). This means that for n ≫ 1/(1−γ) ≫ 0, the population of robots at time n has expected average utility with a lower bound close to the optimum for this hypothesis. I think that for an AMDP this should equal the optimum expected average utility you can possibly get, but it would be interesting to verify.
Curiously, the same conclusions should hold if we do a weighted average over the population, with any fixed method of weighting. Therefore, the posterior of the average robot behaves adaptively depending on which sense of “average” you use. So, your epistemology doesn’t have to fix a particular method of counting minds. Instead different counting methods are just different “frames of reference” through which to look, and you can be simultaneously rational in all of them.
Could you expand a little on why you say that no Bayesian hypothesis captures the distribution over robot-histories at different times? It seems like you can unroll an AMDP into a “memory MDP” that puts memory information of the robot into the state, thus allowing Bayesian calculation of the distribution over states in the memory MDP to capture history information in the AMDP.
I’m not sure what you mean by that “unrolling”. Can you write a mathematical definition?
Let’s consider a simple example. There are two states: s0 and s1. There is just one action so we can ignore it. s0 is the initial state. An s0 robot transitions into an s1 robot. An s1 robot transitions into an s0 robot and an s1 robot. What will our population look like?
0th step: all robots remember s0
1st step: all robots remember s0s1
2nd step: 1⁄2 of robots remember s0s1s0 and 1⁄2 of robots remember s0s1s1
3rd step: 1⁄3 of robots remember s0s1s0s1, 1⁄3 of robots remember s0s1s1s0 and 1⁄3 of robots remember s0s1s1s1
There is no Bayesian hypothesis a robot can have that gives correct predictions both for step 2 and step 3. Indeed, to be consistent with step 2 we must have Pr[s0s1s0]=1⁄2 and Pr[s0s1s1]=1⁄2. But, to be consistent with step 3 we must have Pr[s0s1s0]=1⁄3, Pr[s0s1s1]=2⁄3.
In other words, there is no Bayesian hypothesis s.t. we can guarantee that a randomly sampled robot on a sufficiently late time step will have learned this hypothesis with high probability. The apparent transition probabilities keep shifting s.t. it might always continue to seem that the world is complicated enough to prevent our robot from having learned it already.
Or, at least it’s not obvious there is such a hypothesis. In this example, Pr[s0s1s1]/Pr[s0s1s0] will converge to the golden ratio at late steps. But, do all probabilities converge fast enough for learning to happen, in general? I don’t know, maybe for finite state spaces it can work. Would definitely be interesting to check.
[EDIT: actually, in this example there is such a hypothesis but in general there isn’t, see below]
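As a sanity check of the golden-ratio claim above, here is a minimal Python sketch (my own illustration, not part of the original argument) that tracks the exact number of robots remembering each history in this example:

```python
from collections import Counter

# Exact robot counts per remembered history for the example above:
# an s0 robot becomes one s1 robot; an s1 robot becomes an s0 robot and an s1 robot.
pop = Counter({(0,): 1})                    # step 0: every robot remembers s0
for _ in range(11):
    nxt = Counter()
    for hist, count in pop.items():
        if hist[-1] == 0:
            nxt[hist + (1,)] += count       # s0 -> one s1 robot
        else:
            nxt[hist + (0,)] += count       # s1 -> one s0 robot...
            nxt[hist + (1,)] += count       # ...and one s1 robot
    pop = nxt

total = sum(pop.values())
p110 = sum(c for h, c in pop.items() if h[:3] == (0, 1, 0)) / total
p111 = sum(c for h, c in pop.items() if h[:3] == (0, 1, 1)) / total
print(p111 / p110)                          # ~1.618, approaching the golden ratio
```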
Great example. At least for the purposes of explaining what I mean :) The memory AMDP would just replace the states s0, s1 with the memory states [s0], [s1], [s0,s0], [s0,s1], etc. The action takes a robot in [s0] to memory state [s0,s1], and a robot in [s0,s1] to one robot in [s0,s1,s0] and another in [s0,s1,s1].
(Skip this paragraph unless the specifics of what’s going on aren’t obvious: given a transition distribution P(s′∗|s,π) (P being the distribution over sets of states s’* given starting state s and policy π), we can define the memory transition distribution P(s′∗m|sm,π) given policy π and starting “memory state” sm∈S∗ (Note that this star actually does mean finite sequences, sorry for notational ugliness). First we plug the last element of sm into the transition distribution as the current state. Then for each s′∗ in the domain, for each element in s′∗ we concatenate that element onto the end of sm and collect these s′m into a set s′∗m, which is assigned the same probability P(s′∗).)
So now at time t=2, if you sample a robot, the probability that its state begins with [s0,s1,s1] is 0.5. And at time t=3, if you sample a robot that probability changes to 0.66. This is the same result as for the regular MDP, it’s just that we’ve turned a question about the history of agents, which may be ill-defined, into a question about which states agents are in.
I’m still confused about what you mean by “Bayesian hypothesis” though. Do you mean a hypothesis that takes the form of a non-anthropic MDP?
I’m not quite sure what you are trying to say here; probably my explanation of the framework was lacking. The robots already remember the history, like in classical RL. The question about the histories is perfectly well-defined. In other words, we are already implicitly doing what you described. It’s like in classical RL theory, when you’re proving a regret bound or whatever, your probability space consists of histories.
I’m still confused about what you mean by “Bayesian hypothesis” though. Do you mean a hypothesis that takes the form of a non-anthropic MDP?
Yes, or a classical RL environment. Ofc if we allow infinite state spaces, then any environment can be regarded as an MDP (whose states are histories). That is, I’m talking about hypotheses which conform to the classical “cybernetic agent model”. If you wish, we can call it “Bayesian cybernetic hypothesis”.
Also, I want to clarify something I was myself confused about in the previous comment. For an anthropic Markov chain (when there is only one action) with a finite number of states, we can give a Bayesian cybernetic description, but for a general anthropic MDP we cannot even if the number of states is finite.
Indeed, consider some T:S→ΔNS. We can take its expected value to get ET:S→RS+. Assuming the chain is communicating, ET is an irreducible non-negative matrix, so by the Perron-Frobenius theorem it has a unique-up-to-scalar maximal eigenvector η∈RS+. We then get the subjective transition kernel:
ST(t∣s) := ET(t∣s)ηt / ∑t′∈S ET(t′∣s)ηt′
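For the two-state example from earlier in this thread (s0 → (s1), s1 → (s0, s1)), here is a short numerical sketch (my own illustration) of this construction: compute the Perron-Frobenius eigenvector of ET and the resulting subjective kernel ST.

```python
import numpy as np

# ET(t|s) = expected number of next-step robots in state t, per robot in state s,
# for the chain s0 -> (s1), s1 -> (s0, s1); rows are s, columns are t.
ET = np.array([[0.0, 1.0],
               [1.0, 1.0]])

eigvals, eigvecs = np.linalg.eig(ET)
eta = np.abs(eigvecs[:, np.argmax(eigvals.real)].real)   # Perron eigenvector, (1, phi) up to scale

ST = ET * eta[None, :]                  # reweight each target state t by eta_t
ST /= ST.sum(axis=1, keepdims=True)     # normalize rows to get the subjective kernel
print(ST)                               # row s1 is ~[0.382, 0.618] = [1/phi^2, 1/phi]
print(ST[1, 1] / ST[1, 0])              # ~1.618, consistent with the golden-ratio limit above
```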
Now, consider the following example of an AMDP. There are three actions A:={a,b,c} and two states S:={s0,s1}. When we apply a to an s0 robot, it creates two s0 robots, whereas when we apply a to an s1 robot, it leaves one s1 robot. When we apply b to an s1 robot, it creates two s1 robots, whereas when we apply b to an s0 robot, it leaves one s0 robot. When we apply c to any robot, it results in one robot whose state is s0 with probability 1⁄2 and s1 with probability 1⁄2.
Consider the following two policies. πa takes the sequence of actions cacaca… and πb takes the sequence of actions cbcbcb…. A population that follows πa would experience the subjective probability ST(s0∣s0,c)=2⁄3, whereas a population that follows πb would experience the subjective probability ST(s0∣s0,c)=1⁄3. Hence, subjective probabilities depend on future actions. So, effectively anthropics produces an acausal (Newcomb-like) environment. And, we already know such environments are learnable by infra-Bayesian RL agents and (most probably) not learnable by Bayesian RL agents.
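A minimal sketch (my own illustration) of this example: it propagates the exact expected population measure over histories and checks the subjective frequency of the first c-transition from s0, recovering 2⁄3 under πa and 1⁄3 under πb.

```python
from fractions import Fraction

# States: 0 = s0, 1 = s1. Expected population measure over histories (exact, via Fractions).
def step(pop, action):
    new = {}
    for hist, mass in pop.items():
        s = hist[-1]
        if action == 'a':
            kids = [(s, Fraction(2 if s == 0 else 1))]    # a doubles s0 robots
        elif action == 'b':
            kids = [(s, Fraction(2 if s == 1 else 1))]    # b doubles s1 robots
        else:                                             # c: one robot, s0 or s1 with prob 1/2
            kids = [(0, Fraction(1, 2)), (1, Fraction(1, 2))]
        for t, m in kids:
            new[hist + (t,)] = new.get(hist + (t,), Fraction(0)) + mass * m
    return new

def subjective_first_c(action_seq):
    pop = {(0,): Fraction(1)}                             # everyone starts in s0
    for act in action_seq:
        pop = step(pop, act)
    total = sum(pop.values())
    went_to_s0 = sum(m for h, m in pop.items() if h[1] == 0)
    return went_to_s0 / total      # fraction of late robots remembering s0 right after the first c

print(subjective_first_c('ca' * 8))   # 2/3 under pi_a
print(subjective_first_c('cb' * 8))   # 1/3 under pi_b
```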
Ah, okay, I see what you mean. Like how preferences are divisible into “selfish” and “worldly” components, where the selfish component is what’s impacted by a future simulation of you that is about to have good things happen to it.
(edit: The reward function in AMDPs can either be analogous to “worldly” and just sum the reward calculated at individual timesteps, or analogous to “selfish” and calculated by taking the limit of the subjective distribution over parts of the history, then applying a reward function to the expected histories.)
I brought up the histories->states thing because I didn’t understand what you were getting at, so I was concerned that something unrealistic was going on. For example, if you assume that the agent can remember its history, how can you possibly handle an environment with memory-wiping?
In fact, to me the example is still somewhat murky, because you’re talking about the subjective probability of a state given a policy and a timestep, but if the agents know their histories there is no actual agent in the information-state that corresponds to having those probabilities. In an MDP the agents just have probabilities over transitions—so maybe a clearer example is an agent that copies itself if it wins the lottery having a larger subjective transition probability of going from gambling to winning. (i.e. states are losing and winning, actions are gamble and copy, the policy is to gamble until you win and then copy).
Ah, okay, I see what you mean. Like how preferences are divisible into “selfish” and “worldly” components, where the selfish component is what’s impacted by a future simulation of you that is about to have good things happen to it.
...I brought up the histories->states thing because I didn’t understand what you were getting at, so I was concerned that something unrealistic was going on. For example, if you assume that the agent can remember its history, how can you possibly handle an environment with memory-wiping?
AMDP is only a toy model that distills the core difficulty into more or less the simplest non-trivial framework. The rewards are “selfish”: there is a reward function r:(S×A)∗→R which allows assigning utilities to histories by time discounted summation, and we consider the expected utility of a random robot sampled from a late population. And, there is no memory wiping. To describe memory wiping we indeed need to do the “unrolling” you suggested. (Notice that from the cybernetic model POV, the history is only the remembered history.)
For a more complete framework, we can use an ontology chain, but (i) instead of A×O labels use A×M labels, where M is the set of possible memory states (a policy is then described by π:M→A), to allow for agents that don’t fully trust their memory (ii) consider another chain with a bigger state space S′ plus a mapping p:S′→NS s.t. the transition kernels are compatible. Here, the semantics of p(s) is: the multiset of ontological states resulting from interpreting the physical state s by taking the viewpoints of the different agents that s contains.
In fact, to me the example is still somewhat murky, because you’re talking about the subjective probability of a state given a policy and a timestep, but if the agents know their histories there is no actual agent in the information-state that corresponds to having those probabilities.
I didn’t understand “no actual agent in the information-state that corresponds to having those probabilities”. What does it mean to have an agent in the information-state?
Is it possible to replace the maximin decision rule in infra-Bayesianism with a different decision rule? One surprisingly strong desideratum for such decision rules is the learnability of some natural hypothesis classes.
In the following, all infradistributions are crisp.
Fix finite action set A and finite observation set O. For any k∈N and γ∈(0,1), let
Mkγ:(A×O)ω→Δ(A×O)k
be defined by
M^k_γ(h∣d) := (1−γ) ∑_{n=0}^∞ γ^n [[h = d_{n:n+k}]]
In other words, this kernel samples a time step n out of the geometric distribution with parameter γ, and then produces the sequence of length k that appears in the destiny starting at n.
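A tiny sketch (my own illustration) of sampling from this kernel, given a sufficiently long finite prefix of a destiny:

```python
import numpy as np

def sample_Mk(destiny, k, gamma, rng):
    # P(n) = (1 - gamma) * gamma^n for n = 0, 1, 2, ...
    n = int(rng.geometric(1 - gamma)) - 1
    return destiny[n:n + k]          # assumes the given prefix of the destiny is long enough

rng = np.random.default_rng(0)
destiny = [('a', 'o1'), ('b', 'o2')] * 1000   # a long prefix of a periodic destiny
print(sample_Mk(destiny, k=3, gamma=0.9, rng=rng))
```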
For any continuous[1] function D:□(A×O)k→R, we get a decision rule. Namely, this rule says that, given infra-Bayesian law Λ and discount parameter γ, the optimal policy is
π^∗_{DΛ} := argmax_{π:O∗→A} D(M^k_{γ∗}Λ(π))
The usual maximin is recovered when we have some reward function r:(A×O)k→R and corresponding to it is
Dr(Θ):=minθ∈ΘEθ[r]
Given a set H of laws, it is said to be learnable w.r.t.D when there is a family of policies {πγ}γ∈(0,1) such that for any Λ∈H
lim_{γ→1} (max_π D(M^k_{γ∗}Λ(π)) − D(M^k_{γ∗}Λ(πγ))) = 0
For Dr we know that e.g. the set of all communicating[2] finite infra-RDPs is learnable. More generally, for any t∈[0,1] we have the learnable decision rule D^t_r(Θ) := t·min_{θ∈Θ}E_θ[r] + (1−t)·max_{θ∈Θ}E_θ[r] (maximin for t=1, maximax for t=0).
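For intuition, here is a minimal sketch (my own illustration) of evaluating this family of rules (t·min + (1−t)·max of the expected reward) on a crisp infradistribution given by finitely many extreme points; since Eθ[r] is linear in θ, the min and max over the credal set are attained at extreme points:

```python
import numpy as np

def D_t(vertices, r, t):
    """vertices: extreme points of the credal set Theta (distributions over (A x O)^k),
    r: reward vector, t in [0, 1]; returns t * min + (1 - t) * max of E_theta[r]."""
    exp_r = [float(np.dot(v, r)) for v in vertices]
    return t * min(exp_r) + (1 - t) * max(exp_r)

Theta = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]   # a hypothetical credal set on 2 outcomes
r = np.array([1.0, 0.0])
print(D_t(Theta, r, 1.0))   # 0.5: the usual maximin D_r
print(D_t(Theta, r, 0.0))   # 0.9: maximax
print(D_t(Theta, r, 0.5))   # 0.7: halfway in between
```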
Also, any monotonically increasing D seems to be learnable, i.e. any D s.t. for Θ1⊆Θ2 we have D(Θ1)≤D(Θ2). For such decision rules, you can essentially assume that “nature” (i.e. whatever resolves the ambiguity of the infradistributions) is collaborative with the agent. These rules are not very interesting.
On the other hand, decision rules of the form Dr1+Dr2 are not learnable in general, and neither are decision rules of the form Dr+D′ for D′ monotonically increasing.
Open Problem: Are there any learnable decision rules that are not mesomism or monotonically increasing?
A positive answer to the above would provide interesting generalizations of infra-Bayesianism. A negative answer to the above would provide an interesting novel justification of the maximin. Indeed, learnability is not a criterion that was ever used in axiomatic constructions of decision theory[3], AFAIK.
We can try considering discontinuous functions as well, but it seems natural to start with continuous. If we want the optimal policy to exist, we usually need D to be at least upper semicontinuous.
There are weaker conditions than “communicating” that are sufficient, e.g. “resettable” (meaning that the agent can always force returning to the initial state), and some even weaker conditions that I will not spell out here.
Consider a one-shot decision theory setting. There is a set of unobservable states S, a set of actions A and a reward function r:A×S→[0,1]. An IBDT agent has some belief β∈□S[1], and it chooses the action a∗:=argmaxa∈AEβ[λs.r(a,s)].
We can construct an equivalent scenario, by augmenting this one with a perfect predictor of the agent (Omega). To do so, define S′:=A×S, where the semantics of (p,s) is “the unobservable state is s and Omega predicts the agent will take action p”. We then define r′:A×S′→[0,1] by r′(a,p,s) := 1_{a=p}·r(a,s) + 1_{a≠p} and β′∈□S′ by Eβ′[f]:=minp∈AEβ[λs.f(p,s)] (β′ is what we call the pullback of β to S′, i.e. we have utter Knightian uncertainty about Omega). This is essentially the usual Nirvana construction.
The new setup produces the same optimal action as before. However, we can now give an alternative description of the decision rule.
For any p∈A, define Ωp∈□S′ by EΩp[f]:=mins∈Sf(p,s). That is, Ωp is an infra-Bayesian representation of the belief “Omega will make prediction p”. For any u∈[0,1], define Ru∈□S′ by E_{Ru}[f] := min_{μ∈ΔS′: Eμ[r(p,s)]≥u} Eμ[f(p,s)]. Ru can be interpreted as the belief “assuming Omega is accurate, the expected reward will be at least u”.
We will also need to use the order ⪯ on □X defined by: ϕ⪯ψ when ∀f∈[0,1]X:Eϕ[f]≥Eψ[f]. The reversal is needed to make the analogy to logic intuitive. Indeed, ϕ⪯ψ can be interpreted as ”ϕ implies ψ“[2], the meet operator ∧ can be interpreted as logical conjunction and the join operator ∨ can be interpreted as logical disjunction.
Claim:
a∗=argmaxa∈Amax{u∈[0,1]∣β′∧Ωa⪯Ru}
(Actually I only checked it when we restrict to crisp infradistributions, in which case ∧ is intersection of sets and ⪯ is set containment, but it’s probably true in general.)
Now, β′∧Ωa⪯Ru can be interpreted as “the conjunction of the belief β′ and Ωa implies Ru”. Roughly speaking, “according to β′, if the predicted action is a then the expected reward is at least u”. So, our decision rule says: choose the action that maximizes the value for which this logical implication holds (but “holds” is better thought of as “is provable”, since we’re talking about the agent’s belief). Which is exactly the decision rule of MUDT!
Technically it’s better to think of it as ”ψ is true in the context of ϕ”, since it’s not another infradistribution so it’s not a genuine implication operator.
I believe that all or most of the claims here are true, but I haven’t written all the proofs in detail, so take it with a grain of salt.
An ambidistribution is a mathematical object that simultaneously generalizes infradistributions and ultradistributions. It is useful for representing how much power an agent has over a particular system: which degrees of freedom it can control, which degrees of freedom obey a known probability distribution and which are completely unpredictable.
Definition 1: Let X be a compact Polish space. A (crisp) ambidistribution on X is a function Q:C(X)→R s.t.
(Monotonicity) For any f,g∈C(X), if f≤g then Q(f)≤Q(g).
(Homogeneity) For any f∈C(X) and λ≥0, Q(λf)=λQ(f).
(Constant-additivity) For any f∈C(X) and c∈R, Q(f+c)=Q(f)+c.
Conditions 1+3 imply that Q is 1-Lipschitz. We could introduce non-crisp ambidistributions by dropping conditions 2 and/or 3 (and e.g. requiring 1-Lipschitz instead), but we will stick to crisp ambidistributions in this post.
The space of all ambidistributions on X will be denoted ♡X.[1] Obviously, □X⊆♡X (where □X stands for (crisp) infradistributions), and likewise for ultradistributions.
Examples
Example 1: Consider compact Polish spaces X,Y,Z and a continuous mapping F:X×Y→Z. We can then define F♡∈♡Z by
F♡(u):=maxθ∈ΔXminη∈ΔYEθ×η[u∘F]
That is, F♡(u) is the value of the zero-sum two-player game with strategy spaces X and Y and utility function u∘F.
Notice that F in Example 1 can be regarded as a Cartesian frame: this seems like a natural connection to explore further.
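A minimal numerical sketch (my own illustration, using scipy) of Example 1 for finite X and Y: the ambidistribution evaluated at u is just the maximin value of the matrix game with payoffs u(F(x,y)), computable by linear programming.

```python
import numpy as np
from scipy.optimize import linprog

def ambi_value(payoff):
    """payoff[i, j] = u(F(x_i, y_j)); returns max_theta min_eta E_(theta x eta)[u o F]."""
    n, m = payoff.shape
    # Variables (v, theta_1..theta_n): maximize v subject to
    # sum_i theta_i * payoff[i, j] >= v for every column j, theta a probability vector.
    c = np.zeros(n + 1)
    c[0] = -1.0
    A_ub = np.hstack([np.ones((m, 1)), -payoff.T])
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.zeros((1, 1)), np.ones((1, n))])
    b_eq = np.array([1.0])
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=[(None, None)] + [(0, None)] * n)
    return -res.fun

u_of_F = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])    # matching pennies
print(ambi_value(u_of_F))           # 0.0
```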
Example 2: Let A and O be finite sets representing actions and observations respectively, and Λ:{O∗→A}→□(A×O)∗ be an infra-Bayesian law. Then, we can define Λ♡∈♡(A×O)∗ by
Λ♡(u):=maxπ:O∗→AEΛ(π)[u]
In fact, this is a faithful representation: Λ can be recovered from Λ♡.
Example 3: Consider an infra-MDP with finite state set S, initial state s0∈S and transition infrakernel T:S×A→□S. We can then define the “ambikernel” T♡:S→♡S by
T♡(s;u):=maxa∈AET(s,a)[u]
Thus, every infra-MDP induces an “ambichain”. Moreover:
Claim 1: ♡ is a monad. In particular, ambikernels can be composed.
This allows us to define
ϕ(γ) := (1−γ) ∑_{n=0}^∞ γ^n (T♡)^n(s0)
This object is the infra-Bayesian analogue of the convex polytope of accessible state occupancy measures in an MDP.
Claim 2: The following limit always exists:
ϕ∗:=limγ→1ϕ(γ)
Legendre-Fenchel Duality
Definition 3: Let D be a convex space and A1,A2…An,B⊆D. We say that B occludes (A1…An) when for any (a1…an)∈A1×…×An, we have
CH(a1…an)∩B≠∅
Here, CH stands for convex hull.
We denote this relation A1…An⊢B. The reason we call this “occlusion” is apparent for the n=2 case.
Here are some properties of occlusion:
For any 1≤i≤n, A1…An⊢Ai.
More generally, if c∈Δ{1…n} then A1…An⊢∑iciAi.
If Φ⊢A and Φ⊆Ψ then Ψ⊢A.
If Φ⊢A and A⊆B then Φ⊢B.
If A1…An⊢B and A′i⊆Ai for all 1≤i≤n, then A′1…A′n⊢B.
If Φ⊢Ai for all 1≤i≤n, and also A1…An⊢B, then Φ⊢B.
Notice that occlusion has similar algebraic properties to logical entailment, if we think of A⊆B as ”B is a weaker proposition than A”.
Definition 4: Let X be a compact Polish space. A cramble set[2] over X is Φ⊆□X s.t.
Φ is non-empty.
Φ is topologically closed.
For any finite Φ0⊆Φ and Θ∈□X, if Φ0⊢Θ then Θ∈Φ. (Here, we interpret elements of □X as credal sets.)
Question: If instead of condition 3, we only consider binary occlusion (i.e. require |Φ0|≤2), do we get the same concept?
Given a cramble set Φ, its Legendre-Fenchel dual ambidistribution is
^Φ(f):=maxΘ∈ΦEΘ[f]
Claim 3: Legendre-Fenchel duality is a bijection between cramble sets and ambidistributions.
Lattice Structure
Functionals
The space ♡X is equipped with the obvious partial order: Q≤P when for all f∈C(X),Q(f)≤P(f). This makes ♡X into a distributive lattice, with
(P∧Q)(f)=min(P(f),Q(f))
(P∨Q)(f)=max(P(f),Q(f))
This is in contrast to □X which is a non-distributive lattice.
The bottom and top elements are given by
⊥(f)=minx∈Xf(x)
⊤(f)=maxx∈Xf(x)
Ambidistributions are closed under pointwise suprema and infima, and hence ♡X is complete and satisfies both infinite distributive laws, making it a complete Heyting and co-Heyting algebra.
♡X is also a De Morgan algebra with the involution
¯Q(f):=−Q(−f)
For X≠∅, ♡X is not a Boolean algebra: ΔX⊆♡X and for any θ∈ΔX we have ¯θ=θ.
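A small sketch (my own illustration) of this structure over a finite set, representing ambidistributions directly as functionals on reward vectors:

```python
import numpy as np

def delta(i):                 # a point distribution, regarded as an ambidistribution
    return lambda f: f[i]
def bottom(f): return float(np.min(f))
def top(f): return float(np.max(f))
def meet(P, Q): return lambda f: min(P(f), Q(f))
def join(P, Q): return lambda f: max(P(f), Q(f))
def invol(Q): return lambda f: -Q(-np.asarray(f))    # the De Morgan involution

f = np.array([0.2, 0.9, 0.4])
print(invol(bottom)(f) == top(f))          # True: the involution swaps bottom and top
print(invol(delta(1))(f) == delta(1)(f))   # True: distributions are fixed points, so no Boolean complement
```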
One application of this partial order is formalizing the “no traps” condition for infra-MDP:
Definition 2: A finite infra-MDP is quasicommunicating when for any s∈S
Here is a modification of the IBP framework which removes the monotonicity principle, and seems to be more natural in other ways as well.
First, let our notion of “hypothesis” be Θ∈□c(Γ×2Γ). The previous framework can be interpreted in terms of hypotheses of this form satisfying the condition
prΓ×2ΓBr(Θ)=Θ
(See Proposition 2.8 in the original article.) In the new framework, we replace it by the weaker condition
Br(Θ)⊇(idΓ×diag2Γ)∗Θ
This can be roughly interpreted as requiring that (i) whenever the output of a program P determines whether some other program Q will run, program P has to run as well (ii) whenever programs P and Q are logically equivalent, program P runs iff program Q runs.
The new condition seems to be well-justified, and is also invariant under (i) mixing hypotheses (ii) taking joins/meets of hypotheses. The latter was not the case for the old condition. Moreover, it doesn’t imply that Θ is downward closed, and hence there is no longer a monotonicity principle[1].
The next question is, how do we construct hypotheses satisfying this condition? In the old framework, we could construct hypotheses of the form Ξ∈□c(Γ×Φ) and then apply the bridge transform. In particular, this allows a relatively straightforward translation of physics theories into IBP language (for example our treatment of quantum theory). Luckily, there is an analogous construction in the new framework as well.
First notice that our new condition on Θ can be reformulated as requiring that
suppΘ⊆elΓ
For any s:Γ→Γ define τs:ΔcelΓ→ΔcelΓ by τsθ := χelΓ(s×id2Γ)∗θ. Then, we require τsΘ⊆Θ.
For any Φ, we also define τΦs:Δc(elΓ×Φ)→Δc(elΓ×Φ) by
τΦsθ := χelΓ×Φ(s×id2Γ×Φ)∗θ
Now, for any Ξ∈□c(Γ×Φ), we define the “conservative bridge transform[2]” CBr(Ξ)∈□c(Γ×2Γ×Φ) as the closure of all τΦsθ where θ is a maximal element of Br(Ξ). It is then possible to see that Θ∈□c(Γ×2Γ) is a valid hypothesis if and only if it is of the form prΓ×2ΓCBr(Ξ) for some Φ and Ξ∈□c(Γ×Φ).
I still think the monotonicity principle is saying something about the learning theory of IBP which is still true in the new framework. Namely, it is possible to learn that a program is running but not possible to (confidently) learn that a program is not running, and this limits the sort of frequentist guarantees we can expect.
Intuitively, it can be interpreted as a version of the bridge transform where we postulate that a program doesn’t run unless Ξ contains a reason why it must run.
Quines are non-unique (there can be multiple fixed points). This means that, viewed as a prescriptive theory, IBP produces multi-valued prescriptions. It might be the case that this multi-valuedness can resolve problems with UDT such as Wei Dai’s 3-player Prisoner’s Dilemma and the anti-Newcomb problem[1]. In these cases, a particular UDT/IBP (corresponding to a particular quine) loses to CDT. But, a different UDT/IBP (corresponding to a different quine) might do as well as CDT.
What to do about agents that don’t know their own source-code? (Arguably humans are such.) Upon reflection, this is not really an issue! If we use IBP prescriptively, then we can always assume quining: IBP is just telling you to follow a procedure that uses quining to access its own (i.e. the procedure’s) source code. Effectively, you are instantiating an IBP agent inside yourself with your own prior and utility function. On the other hand, if we use IBP descriptively, then we don’t need quining: Any agent can be assigned “physicalist intelligence” (Definition 1.6 in the original post, can also be extended to not require a known utility function and prior, along the lines of ADAM) as long as the procedure doing the assigning knows its source code. The agent doesn’t need to know its own source code in any sense.
Physicalist agents see themselves as inhabiting an unprivileged position within the universe. However, it’s unclear whether humans should be regarded as such agents. Indeed, monotonicity is highly counterintuitive for humans. Moreover, historically human civilization struggled a lot with accepting the Copernican principle (and is still confused about issues such as free will, anthropics and quantum physics which physicalist agents shouldn’t be confused about). This presents a problem for superimitation.
What if humans are actually cartesian agents? Then, it makes sense to consider a variant of physicalist superimitation where instead of just seeing itself as unprivileged, the AI sees the user as a privileged agent. We call such agents “transcartesian”. Here is how this can be formalized as a modification of IBP.
In IBP, a hypothesis is specified by choosing the state space Φ and the belief Θ∈□(Γ×Φ). In the transcartesian framework, we require that a hypothesis is augmented by a mapping τ:Φ→(A0×O0)≤ω, where A0 is the action set of the reference agent (user) and O0 is the observation set of the reference agent. Given G0 the source code of the reference agent, we require that Θ is supported on the set
{(y,x)∈Γ×Φ ∣ ha⊑τ(x)⟹a=Gy0(h)}
That is, the actions of the reference agent are indeed computed by the source code of the reference agent.
Now, instead of using a loss function of the form L:elΓ→R, we can use a loss function of the form L:(A0×O0)≤ω→R which doesn’t have to satisfy any monotonicity constraint. (More generally, we can consider hybrid loss functions of the form L:(A0×O0)≤ω×elΓ→R monotonic in the second argument.) This can also be generalized to reference agents with hidden rewards.
As opposed to physicalist agents, transcartesian agents do suffer from penalties associated with the description complexity of bridge rules (for the reference agent). Such an agent can (for example) come to believe in a simulation hypothesis that is unlikely from a physicalist perspective. However, since such a simulation hypothesis would be compelling for the reference agent as well, this is not an alignment problem (epistemic alignment is maintained).
Up to light editing, the following was written by me during the “Finding the Right Abstractions for healthy systems” research workshop, hosted by Topos Institute in January 2023. However, I invented the idea before.
In order to allow R (the set of programs) to be infinite in IBP, we need to define the bridge transform for infinite Γ. At first, it might seem Γ can be allowed to be any compact Polish space, and the bridge transform should only depend on the topology on Γ, but that runs into problems. Instead, the right structure on Γ for defining the bridge transform seems to be that of a “profinite field space”: a category I came up with that I haven’t seen in the literature so far.
The category PFS of profinite field spaces is defined as follows. An object F of PFS is a set ind(F) and a family of finite sets {Fα}α∈ind(F). We denote Tot(F):=∏αFα. Given F and G objects of PFS, a morphism from F to G is a mapping f:Tot(F)→Tot(G) such that there exists R⊆ind(F)×ind(G) with the following properties:
For any α∈ind(F), the set R(α):={β∈ind(G)∣(α,β)∈R} is finite.
For any β∈ind(G), the set R−1(β):={α∈ind(F)∣(α,β)∈R} is finite.
For any β∈ind(G), there exists a mapping fβ:∏α∈R−1(β)Fα→Gβ s.t. for any x∈Tot(F), f(x)β:=fβ(prRβ(x)) where prRβ:Tot(F)→∏α∈R−1(β)Fα is the projection mapping.
The composition of PFS morphisms is just the composition of mappings.
It is easy to see that every PFS morphism is a continuous mapping in the product topology, but the converse is false. However, the converse is true for objects with finite ind (i.e. for such objects any mapping is a morphism). Hence, an object F in PFS can be thought of as Tot(F) equipped with additional structure that is stronger than the topology but weaker than the factorization into Fα.
The name “field space” is inspired by the following observation. Given F an object of PFS, there is a natural condition we can impose on a Borel probability distribution on Tot(F) which makes it a “Markov random field” (MRF). Specifically, μ∈ΔTot(F) is called an MRF if there is an undirected graph G whose vertices are ind(F) and in which every vertex is of finite degree, s.t.μ is an MRF on G in the obvious sense. The property of being an MRF is preserved under pushforwards w.r.t.PFS morphisms.
Infra-Bayesian physicalism is an interesting example in favor of the thesis that the more qualitatively capable an agent is, the less corrigible it is. (a.k.a. “corrigibility is anti-natural to consequentialist reasoning”). Specifically, alignment protocols that don’t rely on value learning become vastly less safe when combined with IBP:
Example 1: Using steep time discount to disincentivize dangerous long-term plans. For IBP, “steep time discount” just means predominantly caring about your source code running with particular short inputs. Such a goal strongly incentivizes the usual convergent instrumental goals: first take over the world, then run your source code with whatever inputs you want. IBP agents just don’t have time discount in the usual sense: a program running late in physical time is just as good as one running early in physical time.
Example 2: Debate. This protocol relies on a zero-sum game between two AIs. But, the monotonicity principle rules out the possibility of zero-sum! (If L and −L are both monotonic loss functions then L is a constant.) So, in a “debate” between IBP agents, they cooperate to take over the world and then run the source code of each debater with the input “I won the debate”.
Example 3: Forecasting/imitation (an IDA in particular). For an IBP agent, the incentivized strategy is: take over the world, then run yourself with inputs showing you making perfect forecasts.
The conclusion seems to be, it is counterproductive to use IBP to solve the acausal attack problem for most protocols. Instead, you need to do PreDCA or something similar. And, if acausal attack is a serious problem, then approaches that don’t do value learning might be doomed.
Infradistributions admit an information-theoretic quantity that doesn’t exist in classical theory. Namely, it’s a quantity that measures how many bits of Knightian uncertainty an infradistribution has. We define it as follows:
Let X be a finite set and Θ a crisp infradistribution (credal set) on X, i.e. a closed convex subset of ΔX. Then, imagine someone trying to communicate a message by choosing a distribution out of Θ. Formally, let Y be any other finite set (space of messages), θ∈ΔY (prior over messages) and K:Y→Θ (communication protocol). Consider the distribution η:=θ⋉K∈Δ(Y×X). Then, the information capacity of the protocol is the mutual information between the projection on Y and the projection on X according to η, i.e. Iη(prX;prY). The “Knightian entropy” of Θ is now defined to be the maximum of Iη(prX;prY) over all choices of Y, θ, K. For example, if Θ is Bayesian then it’s 0, whereas if Θ=⊤X, it is ln|X|.
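Here is a minimal computational sketch (my own illustration) of this quantity, under the assumption that Θ is a polytope given by its extreme points: since mutual information is convex in the channel for a fixed prior over messages, the maximum is attained by mapping each message to an extreme point, so the Knightian entropy reduces to the capacity of the channel whose rows are those extreme points (computed below by Blahut-Arimoto iteration).

```python
import numpy as np

def kl_rows(K, q):
    # KL(K(y) || q) for each message y, with the convention 0 * log(0) = 0
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.where(K > 0, K * np.log(K / q), 0.0)
    return terms.sum(axis=1)

def knightian_entropy(extreme_points, iters=1000):
    K = np.array(extreme_points, dtype=float)       # K[y, x] = probability of outcome x given message y
    theta = np.full(K.shape[0], 1.0 / K.shape[0])   # prior over messages
    for _ in range(iters):                          # Blahut-Arimoto updates
        D = kl_rows(K, theta @ K)
        theta = theta * np.exp(D)
        theta /= theta.sum()
    return float(theta @ kl_rows(K, theta @ K))     # mutual information at the optimum, in nats

print(knightian_entropy([[1.0, 0.0], [0.0, 1.0]]))  # ~0.693 = ln 2: total Knightian uncertainty on 2 outcomes
print(knightian_entropy([[0.5, 0.5]]))              # 0.0: a Bayesian (single-point) credal set
```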
Here is one application[1] of this concept, orthogonal to infra-Bayesianism itself. Suppose we model inner alignment by assuming that some portion ϵ of the prior ζ consists of malign hypotheses. And we want to design e.g. a prediction algorithm that will converge to good predictions without allowing the malign hypotheses to attack, using methods like confidence thresholds. Then we can analyze the following metric for how unsafe the algorithm is.
Let O be the set of observations and A the set of actions (which might be “just” predictions) of our AI, and for any environment τ and prior ξ, let Dξτ(n)∈Δ(A×O)n be the distribution over histories resulting from our algorithm starting with prior ξ and interacting with environment τ for n time steps. We have ζ=ϵμ+(1−ϵ)β, where μ is the malign part of the prior and β the benign part. For any μ′, consider Dϵμ′+(1−ϵ)βτ(n). The closure of the convex hull of these distributions for all choices of μ′ (“attacker policy”) is some Θβτ(n)∈□(A×O)n. The maximal Knightian entropy of Θβτ(n) over all admissible τ and β is called the malign capacity of the algorithm. Essentially, this is a bound on how much information the malign hypotheses can transmit into the world via the AI during a period of n time steps. The goal then becomes finding algorithms with simultaneously good regret bounds and good (in particular, at most polylogarithmic in n) malign capacity bounds.
Infra-Bayesianism can be naturally understood as semantics for a certain non-classical logic. This promises an elegant synthesis between deductive/symbolic reasoning and inductive/intuitive reasoning, with several possible applications. Specifically, here we will explain how this can work for higher-order logic. There might be holes and/or redundancies in the precise definitions given here, but I’m quite confident the overall idea is sound.
We will work with homogenous ultracontributions (HUCs). □X will denote the space of HUCs over X. Given μ∈□X, S(μ)⊆ΔcX will denote the corresponding convex set. Given p∈ΔX and μ∈□X, p:μ will mean p∈S(μ). Given μ,ν∈□X, μ⪯ν will mean S(μ)⊆S(ν).
Syntax
Let Tι denote a set which we interpret as the types of individuals (we allow more than one). We then recursively define the full set of types T by:
0∈T (intended meaning: the uninhabited type)
1∈T (intended meaning: the one element type)
If α∈Tι then α∈T
If α,β∈T then α+β∈T (intended meaning: disjoint union)
If α,β∈T then α×β∈T (intended meaning: Cartesian product)
If α∈T then (α)∈T (intended meaning: predicates with argument of type α)
For each α,β∈T, there is a set F0α→β which we interpret as atomic terms of type α→β. We will denote V0α:=F01→α. Among those we distinguish the logical atomic terms:
prαβ∈F0α×β→α
iαβ∈F0α→α+β
Symbols we will not list explicitly, that correspond to the algebraic properties of + and × (commutativity, associativity, distributivity and the neutrality of 0 and 1). For example, given α,β∈T there is a “commutator” of type α×β→β×α.
∧α∈F0(α)×(α)→(α) [EDIT: Actually this doesn’t work because, except for finite sets, the resulting mapping (see semantics section) is discontinuous. There are probably ways to fix this.]
∃αβ∈F0(α×β)→(β)
∀αβ∈F0(α×β)→(β) [EDIT: Actually this doesn’t work because, except for finite sets, the resulting mapping (see semantics section) is discontinuous. There are probably ways to fix this.]
Assume that for each n∈N there is some Dn⊆□[n]: the set of “describable” ultracontributions [EDIT: it is probably sufficient to only have the fair coin distribution in D2 in order for it to be possible to approximate all ultracontributions on finite sets]. If μ∈Dn then ┌μ┐∈V(∑ni=11)
We recursively define the set of all terms Fα→β. We denote Vα:=F1→α.
If f∈F0α→β then f∈Fα→β
If f1∈Fα1→β1 and f2∈Fα2→β2 then f1×f2∈Fα1×α2→β1×β2
If f1∈Fα1→β1 and f2∈Fα2→β2 then f1+f2∈Fα1+α2→β1+β2
If f∈Fα→β then f−1∈F(β)→(α)
If f∈Fα→β and g∈Fβ→γ then g∘f∈Fα→γ
Elements of V(α) are called formulae. Elements of V(1) are called sentences. A subset of V(1) is called a theory.
Semantics
Given T⊆V(1), a model M of T is the following data. To each α∈T, there must correspond some compact Polish space M(α) s.t.:
M(0)=∅
M(1)=pt (the one point space)
M(α+β)=M(α)⊔M(β)
M(α×β)=M(α)×M(β)
M((α))=□M(α)
To each f∈Fα→β, there must correspond a continuous mapping M(f):M(α)→M(β), under the following constraints:
pr, i, diag and the “algebrators” have to correspond to the obvious mappings.
M(=α)=⊤diagM(α). Here, diagX⊆X×X is the diagonal and ⊤C∈□X is the sharp ultradistribution corresponding to the closed set C⊆X.
Consider α∈T and denote X:=M(α). Then, M(()α)=⊤□X⋉id□X. Here, we use the observation that the identity mapping id□X can be regarded as an infrakernel from □X to X.
M(⊥)=⊥pt
M(⊤)=⊤pt
S(M(∨)(μ,ν)) is the convex hull of S(μ)∪S(ν)
S(M(∧)(μ,ν)) is the intersection of S(μ) and S(ν)
Consider α,β∈T and denote X:=M(α), Y:=M(β) and pr:X×Y→Y the projection mapping. Then, M(∃αβ)(μ)=pr∗μ.
Consider α,β∈T and denote X:=M(α), Y:=M(β) and pr:X×Y→Y the projection mapping. Then, p:M(∀αβ)(μ) iff for all q∈Δc(X×Y), if pr∗q=p then q:μ.
M(f1×f2)=M(f1)×M(f2)
M(f1+f2)=M(f1)⊔M(f2)
M(f−1)(μ)=M(f)∗(μ).
M(g∘f)=M(g)∘M(f)
M(┌μ┐)=μ
Finally, for each ϕ∈T, we require M(ϕ)=⊤pt.
Semantic Consequence
Given ϕ∈V(1), we say M⊨ϕ when M(ϕ)=⊤pt. We say T⊨ϕ when for any model M of T, M⊨ϕ. It is now interesting to ask what is the computational complexity of deciding T⊨ϕ. [EDIT: My current best guess is co-RE]
Applications
As usual, let A be a finite set of actions and O be a finite set of observations. Require that for each o∈O there is σo∈Tι which we interpret as the type of states producing observation o. Denote σ∗:=∑o∈Oσo (the type of all states). Moreover, require that our language has the nonlogical symbols s0∈V0(σ∗) (the initial state) and, for each a∈A, Ka∈F0σ∗→(σ∗) (the transition kernel). Then, every model defines a (pseudocausal) infra-POMDP. This way we can use symbolic expressions to define infra-Bayesian RL hypotheses. It is then tempting to study the control theoretic and learning theoretic properties of those hypotheses. Moreover, it is natural to introduce a prior which weights those hypotheses by length, analogously to the Solomonoff prior. This leads to some sort of bounded infra-Bayesian algorithmic information theory and a bounded infra-Bayesian analogue of AIXI.
Let’s also explicitly describe 0th order and 1st order infra-Bayesian logic (although they should be segments of higher-order).
0-th order
Syntax
Let A be the set of propositional variables. We define the language L:
Any a∈A is also in L
⊥∈L
⊤∈L
Given ϕ,ψ∈L, ϕ∧ψ∈L
Given ϕ,ψ∈L, ϕ∨ψ∈L
Notice there’s no negation or implication. We define the set of judgements J:=L×L. We write judgements as ϕ⊢ψ (”ψ in the context of ϕ”). A theory is a subset of J.
Semantics
Given T⊆J, a model of T consists of a compact Polish space X and a mapping M:L→□X. The latter is required to satisfy:
M(⊥)=⊥X
M(⊤)=⊤X
M(ϕ∧ψ)=M(ϕ)∧M(ψ). Here, we define ∧ of infradistributions as intersection of the corresponding sets
M(ϕ∨ψ)=M(ϕ)∨M(ψ). Here, we define ∨ of infradistributions as convex hull of the corresponding sets
For any ϕ⊢ψ∈T, M(ϕ)⪯M(ψ)
1-st order
Syntax
We define the language using the usual syntax of 1-st order logic, where the allowed operators are ∧, ∨ and the quantifiers ∀ and ∃. Variables are labeled by types from some set T. For simplicity, we assume no constants, but it is easy to introduce them. For any sequence of variables (v1…vn), we denote Lv the set of formulae whose free variables are a subset of v1…vn. We define the set of judgements J:=⋃vLv×Lv.
Semantics
Given T⊆J, a model of T consists of
For every t∈T, a compact Polish space M(t)
For every ϕ∈Lv where v1…vn have types t1…tn, an element Mv(ϕ) of □Xv, where Xv:=(∏ni=1M(ti))
It must satisfy the following:
Mv(⊥)=⊥Xv
Mv(⊤)=⊤Xv
Mv(ϕ∧ψ)=Mv(ϕ)∧Mv(ψ)
Mv(ϕ∨ψ)=Mv(ϕ)∨Mv(ψ)
Consider variables u1…un of types t1…tn and variables v1…vm of types s1…sm. Consider also some σ:{1…m}→{1…n} s.t. si=tσi. Given ϕ∈Lv, we can form the substitution ψ:=ϕ[vi=uσ(i)]∈Lu. We also have a mapping fσ:Xu→Xv given by fσ(x1…xn)=(xσ(1)…xσ(m)). We require Mu(ψ)=f∗(Mv(ϕ))
Consider variables v1…vn and i∈{1…n}. Denote pr:Xv→Xv∖vi the projection mapping. We require Mv∖vi(∃vi:ϕ)=pr∗(Mv(ϕ))
Consider variables v1…vn and i∈{1…n}. Denote pr:Xv→Xv∖vi the projection mapping. We require that p:Mv∖vi(∀vi:ϕ) if and only if, for all q∈ΔXv s.t. pr∗q=p, q:Mv(ϕ)
There is a special type of crisp infradistributions that I call “affine infradistributions”: those that, represented as sets, are closed not only under convex linear combinations but also under affine linear combinations. In other words, they are intersections between the space of distributions and some closed affine subspace of the space of signed measures. Conjecture: in 0-th order logic of affine infradistributions, consistency is polynomial-time decidable (whereas for classical logic it is ofc NP-hard).
To produce some evidence for the conjecture, let’s consider a slightly different problem. Specifically, introduce a new semantics in which □X is replaced by the set of linear subspaces of some finite dimensional vector space V. A model M is required to satisfy:
M(⊥)=0
M(⊤)=V
M(ϕ∧ψ)=M(ϕ)∩M(ψ)
M(ϕ∨ψ)=M(ϕ)+M(ψ)
For any ϕ⊢ψ∈T, M(ϕ)⊆M(ψ)
If you wish, this is “non-unitary quantum logic”. In this setting, I have a candidate polynomial-time algorithm for deciding consistency. First, we transform T into an equivalent theory s.t. all judgments are of the following forms:
a=⊥
a=⊤
a⊢b
Pairs of the form c=a∧b, d=a∨b.
Here, a,b,c,d∈A are propositional variables and “ϕ=ψ” is a shorthand for the pair of judgments ϕ⊢ψ and ψ⊢ϕ.
Second, we make sure that our T also satisfies the following “closure” properties:
If a⊢b and b⊢c are in T then so is a⊢c
If c=a∧b is in T then so are c⊢a and c⊢b
If c=a∨b is in T then so are a⊢c and b⊢c
If c=a∧b, d⊢a and d⊢b are in T then so is d⊢c
If c=a∨b, a⊢d and b⊢d are in T then so is c⊢d
Third, we assign to each a∈A a real-valued variable xa. Then we construct a linear program for these variables consisting of the following inequalities:
For any a∈A: 0≤xa≤1
For any a⊢b in T: xa≤xb
For any pair c=a∧b and d=a∨b in T: xc+xd=xa+xb
For any a=⊥: xa=0
For any a=⊤: xa=1
Conjecture: the theory is consistent if and only if the linear program has a solution. To see why it might be so, notice that for any model M we can construct a solution by setting
xa := dimM(a) / dimM(⊤)
I don’t have a full proof for the converse but here are some arguments. If a solution exists, then it can be chosen to be rational. We can then rescale it to get integers which are candidate dimensions of our subspaces. Consider the space of all ways to choose subspaces of these dimensions s.t. the constraints coming from judgments of the form a⊢b are satisfied. This is a moduli space of poset representations. It is easy to see it’s non-empty (just let the subspaces be spans of vectors taken from a fixed basis). By Proposition A.2 in Futorny and Iusenko it is an irreducible algebraic variety. Therefore, to show that we can also satisfy the remaining constraints, it is enough to check that (i) the remaining constraints are open (ii) each of the remaining constraints (considered separately) holds at some point of the variety. The first is highly likely and the second is at least plausible.
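A minimal sketch (my own illustration) of the consistency test described above: build the linear program from a normalized theory and check feasibility with scipy. The helper name and the input format (lists of judgement tuples) are mine.

```python
import numpy as np
from scipy.optimize import linprog

def lp_consistent(variables, bottoms, tops, entails, meets_joins):
    """variables: propositional variables; bottoms/tops: variables equal to bot/top;
    entails: pairs (a, b) for a |- b; meets_joins: tuples (c, d, a, b) for the paired
    judgements c = a AND b, d = a OR b. Returns whether the linear program is feasible."""
    idx = {v: i for i, v in enumerate(variables)}
    n = len(variables)
    A_ub, b_ub, A_eq, b_eq = [], [], [], []
    for a, b in entails:                  # x_a <= x_b
        row = np.zeros(n); row[idx[a]] = 1; row[idx[b]] = -1
        A_ub.append(row); b_ub.append(0.0)
    for c, d, a, b in meets_joins:        # x_c + x_d = x_a + x_b
        row = np.zeros(n)
        row[idx[c]] += 1; row[idx[d]] += 1; row[idx[a]] -= 1; row[idx[b]] -= 1
        A_eq.append(row); b_eq.append(0.0)
    for v in bottoms:                     # x_v = 0
        row = np.zeros(n); row[idx[v]] = 1
        A_eq.append(row); b_eq.append(0.0)
    for v in tops:                        # x_v = 1
        row = np.zeros(n); row[idx[v]] = 1
        A_eq.append(row); b_eq.append(1.0)
    res = linprog(np.zeros(n), A_ub or None, b_ub or None, A_eq or None, b_eq or None,
                  bounds=[(0, 1)] * n)    # 0 <= x_v <= 1
    return res.success

# Toy theory: c = a AND b, d = a OR b, a = top, d = bottom (inconsistent).
print(lp_consistent(['a', 'b', 'c', 'd'], bottoms=['d'], tops=['a'],
                    entails=[('c', 'a'), ('c', 'b'), ('a', 'd'), ('b', 'd')],
                    meets_joins=[('c', 'd', 'a', 'b')]))   # False
```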
The algorithm also seems to have a natural extension to the original infra-Bayesian setting.
When using infra-Bayesian logic to define a simplicity prior, it is natural to use “axiom circuits” rather than plain formulae. That is, when we write the axioms defining our hypothesis, we are allowed to introduce “shorthand” symbols for repeating terms. This doesn’t affect the expressiveness, but it does affect the description length. Indeed, eliminating all the shorthand symbols can increase the length exponentially.
Instead of introducing all the “algebrator” logical symbols, we can define T as the quotient by the equivalence relation defined by the algebraic laws. We then need only two extra logical atomic terms:
For any n∈N and σ∈Sn (permutation), denote n:=∑_{i=1}^n 1 and require σ+∈Fn→n
For any n∈N and σ∈Sn, σ×α∈Fαn→αn
However, if we do this then it’s not clear whether deciding that an expression is a well-formed term can be done in polynomial time. Because, to check that the types match, we need to test the identity of algebraic expressions and opening all parentheses might result in something exponentially long.
Actually the Schwartz–Zippel algorithm can easily be adapted to this case (just imagine that types are variables over Q, and start from testing the identity of the types appearing inside parentheses), so we can validate expressions in randomized polynomial time (and, given standard conjectures, in deterministic polynomial time as well).
Sort of obvious but good to keep in mind: Metacognitive regret bounds are not easily reducible to “plain” IBRL regret bounds when we consider the core and the envelope as the “inside” of the agent.
Assume that the action and observation sets factor as A=A0×A1 and O=O0×O1, where (A0,O0) is the interface with the external environment and (A1,O1) is the interface with the envelope.
Let Λ:Π→□(Γ×(A×O)ω) be a metalaw. Then, there are two natural ways to reduce it to an ordinary law:
Marginalizing over Γ. That is, let pr−Γ:Γ×(A×O)ω→(A×O)ω and pr0:(A×O)ω→(A0×O0)ω be the projections. Then, we have the law Λ?:=(pr0pr−Γ)∗∘Λ.
Assuming “logical omniscience”. That is, let τ∗∈Γ be the ground truth. Then, we have the law Λ!:=pr0∗(Λ∣τ∗). Here, we use the conditional defined by Θ∣A := {θ∣A ∣ θ∈argmax_{θ∈Θ}Pr_θ[A]}. It’s easy to see this indeed defines a law.
However, requiring low regret w.r.t. either of these is not equivalent to requiring low regret w.r.t. Λ:
Learning Λ? is typically no less feasible than learning Λ, however it is a much weaker condition. This is because the metacognitive agents can use policies that query the envelope to get higher guaranteed expected utility.
Learning Λ! is a much stronger condition than learning Λ, however it is typically infeasible. Requiring it leads to AIXI-like agents.
Therefore, metacognitive regret bounds hit a “sweet spot” of strength vs. feasibility which produces genuinely more powerful agents than IBRL[1].
More precisely, more powerful than IBRL with the usual sort of hypothesis classes (e.g. nicely structured crisp infra-RDP). In principle, we can reduce metacognitive regret bounds to IBRL regret bounds using non-crisp laws, since there’s a very general theorem for representing desiderata as laws. But, these laws would have a very peculiar form that seems impossible to guess without starting with metacognitive agents.
Intuitively, it feels that there is something special about mathematical knowledge from a learning-theoretic perspective. Mathematics seems infinitely rich: no matter how much we learn, there is always more interesting structure to be discovered. Impossibility results like the halting problem and Gödel incompleteness lend some credence to this intuition, but are insufficient to fully formalize it.
Here is my proposal for how to formulate a theorem that would make this idea rigorous.
(Wrong) First Attempt
Fix some natural hypothesis class for mathematical knowledge, such as some variety of tree automata. Each such hypothesis Θ represents an infradistribution over Γ: the “space of counterpossible computational universes”. We can say that Θ is a “true hypothesis” when there is some θ in the credal set Θ (a distribution over Γ) s.t. the ground truth Υ∗∈Γ “looks” as if it’s sampled from θ. The latter should be formalizable via something like a computationally bounded version of Martin-Löf randomness.
We can now try to say that Υ∗ is “rich” if for any true hypothesis Θ, there is a refinement Ξ⊆Θ which is also a true hypothesis and “knows” at least one bit of information that Θ doesn’t, in some sense. This is clearly true, since there can be no automaton or even any computable hypothesis which fully describes Υ∗. But, it’s also completely boring: the required Ξ can be constructed by “hardcoding” an additional fact into Θ. This doesn’t look like “discovering interesting structure”, but rather just like brute-force memorization.
(Wrong) Second Attempt
What if instead we require that Ξ knows infinitely many bits of information that Θ doesn’t? This is already more interesting. Imagine that instead of metacognition / mathematics, we would be talking about ordinary sequence prediction. In this case it is indeed an interesting non-trivial condition that the sequence contains infinitely many regularities, s.t. each of them can be expressed by a finite automaton but their conjunction cannot. For example, maybe the n-th bit in the sequence depends only on the largest k s.t. 2^k divides n, but the dependence on k is already uncomputable (or at least inexpressible by a finite automaton).
However, for our original application, this is entirely insufficient. This is because the formal language we use to define Γ (e.g. combinator calculus) has some “easy” equivalence relations. For example, consider the family of programs of the form “if 2+2=4 then output 0, otherwise...”. All of those programs would output 0, which is obvious once you know that 2+2=4. Therefore, once your automaton is able to check some such easy equivalence relations, hardcoding a single new fact (in the example, 2+2=4) generates infinitely many “new” bits of information. Once again, we are left with brute-force memorization.
(Less Wrong) Third Attempt
Here’s the improved condition: For any true hypothesis Θ, there is a true refinement Ξ⊆Θ s.t. conditioning Θ on any finite set of observations cannot produce a refinement of Ξ.
There is a technicality here, because we’re talking about infradistributions, so what is “conditioning” exactly? For credal sets, I think it is sufficient to allow two types of “conditioning”:
For any given observation A and p∈(0,1], we can form {θ∈Θ∣θ(A)≥p}.
For any given observation A s.t. minθ∈Θθ(A)>0, we can form {(θ∣A)∣θ∈Θ}.
This rules-out the counterexample from before: the easy equivalence relation can be represented inside Θ, and then the entire sequence of “novel” bits can be generated by a conditioning.
Alright, so does Υ∗ actually satisfy this condition? I think it’s very probable, but I haven’t proved it yet.
Here is the sketch of a simplified model for how a metacognitive agent deals with traps.
Consider some (unlearnable) prior ζ over environments, s.t. we can efficiently compute the distribution ζ(h) over observations given any history h. For example, any prior over a small set of MDP hypotheses would qualify. Now, for each h, we regard ζ(h) as a “program” that the agent can execute and form beliefs about. In particular, we have a “metaprior” ξ consisting of metahypotheses: hypotheses-about-programs.
For example, if we let every metahypothesis be a small infra-RDP satisfying appropriate assumptions, we probably have an efficient “metalearning” algorithm. More generally, we can allow a metahypothesis to be a learnable mixture of infra-RDPs: for instance, there is a finite state machine for specifying “safe” actions, and the infra-RDPs in the mixture guarantee no long-term loss upon taking safe actions.
In this setting, there are two levels of learning algorithms:
The metalearning algorithm, which learns the correct infra-RDP mixture. The flavor of this algorithm is RL in a setting where we have a simulator of the environment (since we can evaluate ζ(h) for any h). In particular, here we don’t worry about exploitation/exploration tradeoffs.
The “metacontrol” algorithm, which given an infra-RDP mixture, approximates the optimal policy. The flavor of this algorithm is “standard” RL with exploitation/exploration tradeoffs.
In the simplest toy model, we can imagine that metalearning happens entirely in advance of actual interaction with the environment. More realistically, the two need to happen in parallel. It is then natural to apply metalearning to the current environmental posterior rather than the prior (i.e. the histories starting from the history that already occurred). Such an agent satisfies “opportunistic” guarantees: if at any point of time, the posterior admits a useful metahypothesis, the agent can exploit this metahypothesis. Thus, we address both parts of the problem of traps:
The complexity-theoretic part (subproblem 1.2) is addressed by approximating the intractable Bayes-optimality problem by the metacontrol problem of the (coarser) metahypothesis.
The statistical part (subproblem 2.1) is addressed by opportunism: if at some point, we can easily learn something about the physical environment, then we do.
Jobst Heitzig asked me whether infra-Bayesianism has something to say about the absent-minded driver (AMD) problem. Good question! Here is what I wrote in response:
Philosophically, I believe that it is only meaningful to talk about a decision problem when there is also some mechanism for learning the rules of the decision problem. In ordinary Newcombian problems, you can achieve this by e.g. making the problem iterated. In AMD, iteration doesn’t really help because the driver doesn’t remember anything that happened before. We can consider a version of iterated AMD where the driver has a probability 0<ϵ≪1 to remember every intersection, but they always remember whether they arrived at the right destination. Then, it is equivalent to the following Newcombian problem:
With probability 1−2ϵ, counterfactual A happens, in which Omega decides about both intersections via simulating the driver in counterfactuals B and C.
With probability ϵ, counterfactual B happens, in which the driver decides about the first intersection, and Omega decides about the second intersection via simulating the driver in counterfactual C.
With probability ϵ, counterfactual C happens, in which the driver decides about the second intersection, and Omega decides about the first intersection via simulating the driver in counterfactual B.
For this, an IB agent indeed learns the updateless optimal policy (although the learning rate carries an ϵ−1 penalty).
The following was written by me during the “Finding the Right Abstractions for healthy systems” research workshop, hosted by Topos Institute in January 2023. However, I invented the idea before.
Here’s an elegant diagrammatic notation for constructing new infrakernels out of given infrakernels. There is probably some natural category-theoretic way to think about it, but at present I don’t know what it is.
By “infrakernel” we will mean a continuous mapping of the form X→□Y, where X and Y are compact Polish spaces and □Y is the space of credal sets (i.e. closed convex sets of probability distributions) over Y.
Syntax
The diagram consists of child vertices, parent vertices, squiggly lines, arrows, dashed arrows and slashes.
There can be solid arrows incoming into the diagram. Each such arrow a is labeled by a compact Polish space D(a) and ends on a parent vertex t(a). And, s(a)=⊥ (i.e. the arrow has no source vertex).
There can be dashed and solid arrows between vertices. Each such arrow a starts from a child vertex s(a) and ends on a parent vertex t(a). We require that P(s(a))≠t(a) (i.e. they should not be also connected by a squiggly line).
There are two types of vertices: parent vertices (denoted by a letter) and child vertices (denoted by a letter or number in a circle).
Each child vertex v is labeled by a compact Polish space D(v) and connected (by a squiggly line) to a unique parent vertex P(v). It may or may not be crossed-out by a slash.
Each parent vertex p is labeled by an infrakernel Kp with source S1×…×Sk and target T1×…×Tl where each Si corresponds to a solid arrow a with t(a)=p and each Tj is D(v) for some child vertex v with P(v)=p. We can also add squares with numbers where solid arrows end to keep track of the correspondence between the arguments of Kp and the arrows.
If s(a)=⊥ then the corresponding Si is D(a).
If s(a)=v≠⊥ then the corresponding Si is D(v).
Semantics
Every diagram D represents an infrakernel KD.
The source space of KD is a product X1×…×Xn, where each Xi is D(a) for some solid arrow a with s(a)=⊥.
The target space of KD is a product Y1×…×Ym, where each Yj is D(v) for some non-crossed-out child vertex.
The value of KD at a given point x is defined as follows. Let ~Y:=∏vD(v) (a product that includes the crossed-out vertices). Then, KD(x) is the set of all marginal distributions (onto the non-crossed-out coordinates) of distributions μ∈Δ~Y satisfying the following condition. Consider any parent vertex p. Let a1,a2…ak be the (dashed or solid) arrows s.t. s(ai)≠⊥ and t(ai)=p. For each such i, choose any yi∈D(s(ai)). We require that Kp(x,y) contains the marginal distribution of μ∣y. Here, the notation Kp(x,y) means we are using the components of x and y corresponding to solid arrows a with t(a)=p.
Two deterministic toy models for regret bounds of infra-Bayesian bandits. The lesson seems to be that equalities are much easier to learn than inequalities.
Model 1: Let A be the space of arms, O the space of outcomes, r:A×O→R the reward function, X and Y vector spaces, H⊆X the hypothesis space and F:A×O×H→Y a function s.t. for any fixed a∈A and o∈O, F(a,o):H→Y extends to some linear operator Ta,o:X→Y. The semantics of hypothesis h∈H is defined by the equation F(a,o,h)=0 (i.e. an outcome o of action a is consistent with hypothesis h iff this equation holds).
For any h∈H denote by V(h) the reward promised by h:
V(h) := max_{a∈A} min_{o∈O: F(a,o,h)=0} r(a,o)
Then, there is an algorithm with mistake bound dimX, as follows. On round n∈N, let Gn⊆H be the set of unfalsified hypotheses. Choose hn∈Gn optimistically, i.e.
hn:=argmaxh∈GnV(h)
Choose the arm an recommended by hypothesis hn. Let on∈O be the outcome we observed, rn:=r(an,on) the reward we received and h∗∈H the (unknown) true hypothesis.
If rn≥V(hn) then also rn≥V(h∗) (since h∗∈Gn and hence V(h∗)≤V(hn)) and therefore an wasn’t a mistake.
If rn<V(hn) then F(an,on,hn)≠0 (if we had F(an,on,hn)=0 then the minimization in the definition of V(hn) would include r(an,on)). Hence, hn∉Gn+1=Gn∩kerTan,on. This implies dimspan(Gn+1)<dimspan(Gn). Obviously this can happen at most dimX times.
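Here is a minimal concrete instantiation of Model 1 in code, with my own illustrative choices: a finite list of candidate hypotheses standing in for H, deterministic linear outcomes o=⟨a,h⟩ (so F(a,o,h)=⟨a,h⟩−o), and a toy reward function. Each falsification imposes the linear constraint ⟨a_n,h⟩=o_n, which is the G_{n+1}=G_n∩ker T_{a_n,o_n} step above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
arms = [rng.normal(size=d) for _ in range(8)]           # finite arm set A ⊂ R^d
hypotheses = [rng.normal(size=d) for _ in range(50)]    # finite stand-in for H ⊆ X = R^d
h_star = hypotheses[17]                                 # unknown true hypothesis

def outcome(a, h):
    # semantics F(a, o, h) = <a, h> - o = 0: the unique outcome of arm a under h
    return float(a @ h)

def reward(a, o):
    return -abs(o - 1.0)        # toy reward: we want an arm whose outcome is close to 1

def V(h):
    # reward promised by h; outcomes consistent with h are deterministic here,
    # so the inner minimization over consistent outcomes is trivial
    return max(reward(a, outcome(a, h)) for a in arms)

unfalsified = list(hypotheses)
mistakes = 0
for n in range(100):
    h_n = max(unfalsified, key=V)                                   # optimism
    a_n = max(arms, key=lambda a: reward(a, outcome(a, h_n)))       # recommended arm
    o_n = outcome(a_n, h_star)
    if reward(a_n, o_n) < V(h_n):
        mistakes += 1
    # keep only hypotheses consistent with the observation (the kernel constraint)
    unfalsified = [h for h in unfalsified if abs(outcome(a_n, h) - o_n) < 1e-9]
print("mistakes:", mistakes, "(bound: dim X =", d, ")")
```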
Model 2: Let the spaces of arms and hypotheses be
A:=H:=Sd:={x∈Rd+1∣∥x∥=1}
Let the reward r∈R be the only observable outcome, and the semantics of hypothesis h∈Sd be r≥h⋅a. Then, the sample complexity cannot be bound by a polynomial of degree that doesn’t depend on d. This is because Murphy can choose the strategy of producing reward 1−ϵ whenever h⋅a≤1−ϵ. In this case, whatever arm you sample, in each round you can only exclude ball of radius ≈√2ϵ around the sampled arm. The number of such balls that fit into the unit sphere is Ω(ϵ−12d). So, normalized regret below ϵ cannot be guaranteed in less than that many rounds.
For t=1 we get the usual maximin (“pessimism”), for t=0 we get maximax (“optimism”) and for other values of t we get something in the middle (we can call “t-mism”).
It turns out that, in some sense, this new decision rule is actually reducible to ordinary maximin! Indeed, set
μ∗t:=argmaxμEμ[U(a∗t)]
Θt:=tΘ+(1−t)μ∗t
Then we get
a∗(Θt)=a∗t(Θ)
More precisely, any pessimistically optimal action for Θt is t-mistically optimal for Θ (the converse need not be true in general, thanks to the arbitrary choice involved in μ∗t).
To first approximation it means we don’t need to consider t-mistic agents since they are just special cases of “pessimistic” agents. To second approximation, we need to look at what the transformation of Θ to Θt does to the prior. If we start with a simplicity prior then the result is still a simplicity prior. If U has low description complexity and t is not too small then essentially we get full equivalence between “pessimism” and t-mism. If t is small then we get a strictly “narrower” prior (for t=0 we are back at ordinary Bayesianism). However, if U has high description complexity then we get a rather biased simplicity prior. Maybe the latter sort of prior is worth considering.
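A quick numerical check of the reduction (my own sketch; the credal set Θ is represented by finitely many extreme points, which suffices because E_μ[U(a)] is linear in μ, and the utility matrix is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, t = 5, 4, 0.7
U = rng.uniform(size=(n_actions, n_states))             # utility U(a, s)
extreme_pts = rng.dirichlet(np.ones(n_states), size=6)  # Θ = convex hull of these points

def t_mism_value(a, pts):
    ev = pts @ U[a]                                     # E_μ[U(a)] for each extreme point μ
    return t * ev.min() + (1 - t) * ev.max()            # t=1: maximin, t=0: maximax

a_star = max(range(n_actions), key=lambda a: t_mism_value(a, extreme_pts))

# Build Θ_t = t·Θ + (1−t)·μ*_t and check that a_star is maximin-optimal for it
mu_star = extreme_pts[np.argmax(extreme_pts @ U[a_star])]
theta_t = t * extreme_pts + (1 - t) * mu_star
maximin = lambda a: (theta_t @ U[a]).min()
assert max(range(n_actions), key=maximin) == a_star
print("t-mistically optimal action", a_star, "is also maximin-optimal for Θ_t")
```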
Master post for ideas about infra-Bayesianism.
In the anthropic trilemma, Yudkowsky writes about the thorny problem of understanding subjective probability in a setting where copying and modifying minds is possible. Here, I will argue that infra-Bayesianism (IB) leads to the solution.
Consider a population of robots, each of which in a regular RL agent. The environment produces the observations of the robots, but can also make copies or delete portions of their memories. If we consider a random robot sampled from the population, the history they observed will be biased compared to the “physical” baseline. Indeed, suppose that a particular observation c has the property that every time a robot makes it, 10 copies of them are created in the next moment. Then, a random robot will have c much more often in their history than the physical frequency with which c is encountered, due to the resulting “selection bias”. We call this setting “anthropic RL” (ARL).
The original motivation for IB was non-realizability. But, in ARL, Bayesianism runs into issues even when the environment is realizable from the “physical” perspective. For example, we can consider an “anthropic MDP” (AMDP). An AMDP has finite sets of actions (A) and states (S), and a transition kernel T:A×S→Δ(S∗). The output is a string of states instead of a single state, because many copies of the agent might be instantiated on the next round, each with their own state. In general, there will be no single Bayesian hypothesis that captures the distribution over histories that the average robot sees at any given moment of time (at any given moment of time we sample a robot out of the population and look at their history). This is because the distributions at different moments of time are mutually inconsistent.
[EDIT: Actually, given that we don’t care about the order of robots, the signature of the transition kernel should be T:A×S→ΔNS]
The consistency that is violated is exactly the causality property of environments. Luckily, we know how to deal with acausality: using the IB causal-acausal correspondence! The result can be described as follows: Murphy chooses a time moment n∈N and guesses the robot policy π until time n. Then, a simulation of the dynamics of (π,T) is performed until time n, and a single history is sampled from the resulting population. Finally, the observations of the chosen history unfold in reality. If the agent chooses an action different from what is prescribed, Nirvana results. Nirvana also happens after time n (we assume Nirvana reward 1 rather than ∞).
This IB hypothesis is consistent with what the average robot sees at any given moment of time. Therefore, the average robot will learn this hypothesis (assuming learnability). This means that for n≫11−γ≫0, the population of robots at time n has expected average utility with a lower bound close to the optimum for this hypothesis. I think that for an AMDP this should equal the optimum expected average utility you can possibly get, but it would be interesting to verify.
Curiously, the same conclusions should hold if we do a weighted average over the population, with any fixed method of weighting. Therefore, the posterior of the average robot behaves adaptively depending on which sense of “average” you use. So, your epistemology doesn’t have to fix a particular method of counting minds. Instead different counting methods are just different “frames of reference” through which to look, and you can be simultaneously rational in all of them.
Could you expand a little on why you say that no Bayesian hypothesis captures the distribution over robot-histories at different times? It seems like you can unroll an AMDP into a “memory MDP” that puts memory information of the robot into the state, thus allowing Bayesian calculation of the distribution over states in the memory MDP to capture history information in the AMDP.
I’m not sure what you mean by that “unrolling”. Can you write a mathematical definition?
Let’s consider a simple example. There are two states: s0 and s1. There is just one action so we can ignore it. s0 is the initial state. An s0 robot transitions into an s1 robot. An s1 robot transitions into an s0 robot and an s1 robot. What will our population look like?
0th step: all robots remember s0
1st step: all robots remember s0s1
2nd step: 1⁄2 of robots remember s0s1s0 and 1⁄2 of robots remember s0s1s1
3rd step: 1⁄3 of robots remembers s0s1s0s1, 1⁄3 of robots remember s0s1s1s0 and 1⁄3 of robots remember s0s1s1s1
There is no Bayesian hypothesis a robot can have that gives correct predictions both for step 2 and step 3. Indeed, to be consistent with step 2 we must have Pr[s0s1s0]=1/2 and Pr[s0s1s1]=1/2. But, to be consistent with step 3 we must have Pr[s0s1s0]=1/3, Pr[s0s1s1]=2/3.
In other words, there is no Bayesian hypothesis s.t. we can guarantee that a randomly sampled robot on a sufficiently late time step will have learned this hypothesis with high probability. The apparent transition probabilities keep shifting s.t. it might always continue to seem that the world is complicated enough to prevent our robot from having learned it already.
Or, at least it’s not obvious there is such a hypothesis. In this example, Pr[s0s1s1]/Pr[s0s1s0] will converge to the golden ratio at late steps. But, do all probabilities converge fast enough for learning to happen, in general? I don’t know, maybe for finite state spaces it can work. Would definitely be interesting to check.
[EDIT: actually, in this example there is such a hypothesis but in general there isn’t, see below]
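For concreteness, here is a small exact-population computation of this example (illustrative code, not from the original discussion). It reproduces the step-2 and step-3 fractions above and shows the ratio Pr[s0s1s1]/Pr[s0s1s0] approaching the golden ratio:

```python
from collections import Counter

def step(pop):
    """pop maps (3-state history prefix, current state) to robot counts."""
    new = Counter()
    for (prefix, state), count in pop.items():
        children = ['s1'] if state == 's0' else ['s0', 's1']   # s0 -> s1 ; s1 -> s0, s1
        for child in children:
            new_prefix = prefix + (child,) if len(prefix) < 3 else prefix
            new[(new_prefix, child)] += count
    return new

pop = Counter({(('s0',), 's0'): 1})                            # one robot remembering s0
for n in range(1, 25):
    pop = step(pop)
    total = sum(pop.values())
    frac = lambda p: sum(c for (pre, _), c in pop.items() if pre == p) / total
    if n in (2, 3, 24):
        a, b = frac(('s0', 's1', 's0')), frac(('s0', 's1', 's1'))
        print(f"step {n}: Pr[s0s1s0]={a:.3f}  Pr[s0s1s1]={b:.3f}  ratio={b/a:.4f}")
```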
Great example. At least for the purposes of explaining what I mean :) The memory AMDP would just replace the states s0, s1 with the memory states [s0], [s1], [s0,s0], [s0,s1], etc. The action takes a robot in [s0] to memory state [s0,s1], and a robot in [s0,s1] to one robot in [s0,s1,s0] and another in [s0,s1,s1].
(Skip this paragraph unless the specifics of what’s going on aren’t obvious: given a transition distribution P(s′∗|s,π) (P being the distribution over sets of states s’* given starting state s and policy π), we can define the memory transition distribution P(s′∗m|sm,π) given policy π and starting “memory state” sm∈S∗ (Note that this star actually does mean finite sequences, sorry for notational ugliness). First we plug the last element of sm into the transition distribution as the current state. Then for each s′∗ in the domain, for each element in s′∗ we concatenate that element onto the end of sm and collect these s′m into a set s′∗m, which is assigned the same probability P(s′∗).)
So now at time t=2, if you sample a robot, the probability that its state begins with [s0,s1,s1] is 0.5. And at time t=3, if you sample a robot that probability changes to 0.66. This is the same result as for the regular MDP, it’s just that we’ve turned a question about the history of agents, which may be ill-defined, into a question about which states agents are in.
I’m still confused about what you mean by “Bayesian hypothesis” though. Do you mean a hypothesis that takes the form of a non-anthropic MDP?
I’m not quite sure what you are trying to say here; probably my explanation of the framework was lacking. The robots already remember the history, like in classical RL. The question about the histories is perfectly well-defined. In other words, we are already implicitly doing what you described. It’s like in classical RL theory, when you’re proving a regret bound or whatever, your probability space consists of histories.
Yes, or a classical RL environment. Ofc if we allow infinite state spaces, then any environment can be regarded as an MDP (whose states are histories). That is, I’m talking about hypotheses which conform to the classical “cybernetic agent model”. If you wish, we can call it “Bayesian cybernetic hypothesis”.
Also, I want to clarify something I was myself confused about in the previous comment. For an anthropic Markov chain (when there is only one action) with a finite number of states, we can give a Bayesian cybernetic description, but for a general anthropic MDP we cannot even if the number of states is finite.
Indeed, consider some T:S→ΔNS. We can take its expected value to get ET:S→R^S_+. Assuming the chain is communicating, ET is an irreducible non-negative matrix, so by the Perron-Frobenius theorem it has a unique-up-to-scalar maximal eigenvector η∈R^S_+. We then get the subjective transition kernel:
ST(t∣s) = ET(t∣s)η_t / ∑_{t′∈S} ET(t′∣s)η_{t′}
Now, consider the following example of an AMDP. There are three actions A:={a,b,c} and two states S:={s0,s1}. When we apply a to an s0 robot, it creates two s0 robots, whereas when we apply a to an s1 robot, it leaves one s1 robot. When we apply b to an s1 robot, it creates two s1 robots, whereas when we apply b to an s0 robot, it leaves one s0 robot. When we apply c to any robot, it results in one robot whose state is s0 with probability 1/2 and s1 with probability 1/2.
Consider the following two policies. πa takes the sequence of actions cacaca… and πb takes the sequence of actions cbcbcb…. A population that follows πa would experience the subjective probability ST(s0∣s0,c)=2/3, whereas a population that follows πb would experience the subjective probability ST(s0∣s0,c)=1/3. Hence, subjective probabilities depend on future actions. So, effectively anthropics produces an acausal (Newcomb-like) environment. And, we already know such environments are learnable by infra-Bayesian RL agents and (most probably) not learnable by Bayesian RL agents.
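Here is a quick sanity check of these two numbers by direct computation of expected population counts (my own illustrative code). It conditions on the time-k state being s0, where the time-k action is c under both policies, and looks at the state one step later among robots sampled at a later time T:

```python
from collections import defaultdict
from itertools import cycle, islice

def step(pop, action):
    """Exact expected counts: pop maps a history (tuple of states) to the
    expected number of robots remembering it."""
    new = defaultdict(float)
    for hist, w in pop.items():
        s = hist[-1]
        if action == 'c':          # one successor, state s0 or s1 with prob 1/2 each
            new[hist + ('s0',)] += w / 2
            new[hist + ('s1',)] += w / 2
        elif action == 'a':        # a: an s0 robot becomes two s0 robots, s1 stays one s1
            new[hist + (s,)] += 2 * w if s == 's0' else w
        else:                      # b: an s1 robot becomes two s1 robots, s0 stays one s0
            new[hist + (s,)] += 2 * w if s == 's1' else w
    return new

def subjective(policy, T=14, k=4):
    """Among robots alive at time T whose time-k state was s0 (the time-k action
    is 'c' for both policies), the fraction whose time-(k+1) state is s0."""
    pop = {('s0',): 1.0}
    for action in islice(cycle(policy), T):
        pop = step(pop, action)
    num = sum(w for h, w in pop.items() if h[k] == 's0' and h[k + 1] == 's0')
    den = sum(w for h, w in pop.items() if h[k] == 's0')
    return num / den

print("pi_a = cacaca... :", subjective('ca'))   # ≈ 2/3
print("pi_b = cbcbcb... :", subjective('cb'))   # ≈ 1/3
```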
Ah, okay, I see what you mean. Like how preferences are divisible into “selfish” and “worldly” components, where the selfish component is what’s impacted by a future simulation of you that is about to have good things happen to it.
(edit: The reward function in AMDPs can either be analogous to “worldly” and just sum the reward calculated at individual timesteps, or analogous to “selfish” and calculated by taking the limit of the subjective distribution over parts of the history, then applying a reward function to the expected histories.)
I brought up the histories->states thing because I didn’t understand what you were getting at, so I was concerned that something unrealistic was going on. For example, if you assume that the agent can remember its history, how can you possibly handle an environment with memory-wiping?
In fact, to me the example is still somewhat murky, because you’re talking about the subjective probability of a state given a policy and a timestep, but if the agents know their histories there is no actual agent in the information-state that corresponds to having those probabilities. In an MDP the agents just have probabilities over transitions—so maybe a clearer example is an agent that copies itself if it wins the lottery having a larger subjective transition probability of going from gambling to winning. (i.e. states are losing and winning, actions are gamble and copy, the policy is to gamble until you win and then copy).
AMDP is only a toy model that distills the core difficulty into more or less the simplest non-trivial framework. The rewards are “selfish”: there is a reward function r:(S×A)∗→R which allows assigning utilities to histories by time discounted summation, and we consider the expected utility of a random robot sampled from a late population. And, there is no memory wiping. To describe memory wiping we indeed need to do the “unrolling” you suggested. (Notice that from the cybernetic model POV, the history is only the remembered history.)
For a more complete framework, we can use an ontology chain, but (i) instead of A×O labels use A×M labels, where M is the set of possible memory states (a policy is then described by π:M→A), to allow for agents that don’t fully trust their memory (ii) consider another chain with a bigger state space S′ plus a mapping p:S′→NS s.t. the transition kernels are compatible. Here, the semantics of p(s) is: the multiset of ontological states resulting from interpreting the physical state s by taking the viewpoints of the different agents that s contains.
I didn’t understand “no actual agent in the information-state that corresponds to having those probabilities”. What does it mean to have an agent in the information-state?
Nevermind, I think I was just looking at it with the wrong class of reward function in mind.
Is it possible to replace the maximin decision rule in infra-Bayesianism with a different decision rule? One surprisingly strong desideratum for such decision rules is the learnability of some natural hypothesis classes.
In the following, all infradistributions are crisp.
Fix finite action set A and finite observation set O. For any k∈N and γ∈(0,1), let
M^k_γ:(A×O)^ω→Δ((A×O)^k) be defined by
M^k_γ(h|d) := (1−γ)∑_{n=0}^∞ γ^n [[h = d_{n:n+k}]]
In other words, this kernel samples a time step n out of the geometric distribution with parameter γ, and then produces the sequence of length k that appears in the destiny starting at n.
For any continuous[1] function D:□(A×O)k→R, we get a decision rule. Namely, this rule says that, given infra-Bayesian law Λ and discount parameter γ, the optimal policy is
π^∗_{DΛ} := argmax_{π:O^∗→A} D((M^k_γ)_∗Λ(π))
The usual maximin is recovered when we have some reward function r:(A×O)^k→R and corresponding to it is
D_r(Θ) := min_{θ∈Θ} E_θ[r]
Given a set H of laws, it is said to be learnable w.r.t. D when there is a family of policies {π_γ}_{γ∈(0,1)} such that for any Λ∈H
lim_{γ→1}(max_π D((M^k_γ)_∗Λ(π)) − D((M^k_γ)_∗Λ(π_γ))) = 0
For D_r we know that e.g. the set of all communicating[2] finite infra-RDPs is learnable. More generally, for any t∈[0,1] we have the learnable decision rule
D^t_r := t·max_{θ∈Θ} E_θ[r] + (1−t)·min_{θ∈Θ} E_θ[r]
This is the “mesomism” I talked about before.
Also, any monotonically increasing D seems to be learnable, i.e. any D s.t. for Θ1⊆Θ2 we have D(Θ1)≤D(Θ2). For such decision rules, you can essentially assume that “nature” (i.e. whatever resolves the ambiguity of the infradistributions) is collaborative with the agent. These rules are not very interesting.
On the other hand, decision rules of the form Dr1+Dr2 are not learnable in general, and so are decision rules of the form Dr+D′ for D′ monotonically increasing.
Open Problem: Are there any learnable decision rules that are not mesomism or monotonically increasing?
A positive answer to the above would provide interesting generalizations of infra-Bayesianism. A negative answer to the above would provide an interesting novel justification of the maximin. Indeed, learnability is not a criterion that was ever used in axiomatic constructions of decision theory[3], AFAIK.
We can try considering discontinuous functions as well, but it seems natural to start with continuous. If we want the optimal policy to exist, we usually need D to be at least upper semicontinuous.
There are weaker conditions than “communicating” that are sufficient, e.g. “resettable” (meaning that the agent can always force returning to the initial state), and some even weaker conditions that I will not spell out here.
I mean theorems like VNM, Savage etc.
There is a formal analogy between infra-Bayesian decision theory (IBDT) and modal updateless decision theory (MUDT).
Consider a one-shot decision theory setting. There is a set of unobservable states S, a set of actions A and a reward function r:A×S→[0,1]. An IBDT agent has some belief β∈□S[1], and it chooses the action a∗:=argmaxa∈AEβ[λs.r(a,s)].
We can construct an equivalent scenario, by augmenting this one with a perfect predictor of the agent (Omega). To do so, define S′:=A×S, where the semantics of (p,s) is “the unobservable state is s and Omega predicts the agent will take action p”. We then define r′:A×S′→[0,1] by r′(a,p,s):=1_{a=p}·r(a,s)+1_{a≠p} and β′∈□S′ by Eβ′[f]:=minp∈AEβ[λs.f(p,s)] (β′ is what we call the pullback of β to S′, i.e. we have utter Knightian uncertainty about Omega). This is essentially the usual Nirvana construction.
The new setup produces the same optimal action as before. However, we can now give an alternative description of the decision rule.
For any p∈A, define Ωp∈□S′ by EΩp[f]:=mins∈Sf(p,s). That is, Ωp is an infra-Bayesian representation of the belief “Omega will make prediction p”. For any u∈[0,1], define Ru∈□S′ by ERu[f]:=minμ∈ΔS′:Eμ[r(p,s)]≥uEμ[f(p,s)]. Ru can be interpreted as the belief “assuming Omega is accurate, the expected reward will be at least u”.
We will also need to use the order ⪯ on □X defined by: ϕ⪯ψ when ∀f∈[0,1]X:Eϕ[f]≥Eψ[f]. The reversal is needed to make the analogy to logic intuitive. Indeed, ϕ⪯ψ can be interpreted as ”ϕ implies ψ“[2], the meet operator ∧ can be interpreted as logical conjunction and the join operator ∨ can be interpreted as logical disjunction.
Claim:
a∗=argmaxa∈Amax{u∈[0,1]∣β′∧Ωa⪯Ru}
(Actually I only checked it when we restrict to crisp infradistributions, in which case ∧ is intersection of sets and ⪯ is set containment, but it’s probably true in general.)
Now, β′∧Ωa⪯Ru can be interpreted as “the conjunction of the belief β′ and Ωa implies Ru”. Roughly speaking, “according to β′, if the predicted action is a then the expected reward is at least u”. So, our decision rule says: choose the action that maximizes the value for which this logical implication holds (but “holds” is better thought of as “is provable”, since we’re talking about the agent’s belief). Which is exactly the decision rule of MUDT!
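To spell out the crisp case of the claim (my own verification, so take it with a grain of salt): β′ is the convex hull of {δp×ν ∣ p∈A, ν∈β}, so β′∧Ωa consists exactly of the distributions of the form δa×ν with ν∈β. Containment in Ru then says E_ν[r(a,s)]≥u for every ν∈β, hence max{u∈[0,1] ∣ β′∧Ωa⪯Ru} = min_{ν∈β}E_ν[λs.r(a,s)] = Eβ[λs.r(a,s)], and maximizing this over a recovers exactly the original a∗.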
Apologies for the potential confusion between □ as “space of infradistributions” and the □ of modal logic (not used in this post).
Technically it’s better to think of it as ”ψ is true in the context of ϕ”, since it’s not another infradistribution so it’s not a genuine implication operator.
Ambidistributions
I believe that all or most of the claims here are true, but I haven’t written all the proofs in detail, so take it with a grain of salt.
Ambidistributions are a mathematical object that simultaneously generalizes infradistributions and ultradistributions. It is useful to represent how much power an agent has over a particular system: which degrees of freedom it can control, which degrees of freedom obey a known probability distribution and which are completely unpredictable.
Definition 1: Let X be a compact Polish space. A (crisp) ambidistribution on X is a function Q:C(X)→R s.t.
(Monotonicity) For any f,g∈C(X), if f≤g then Q(f)≤Q(g).
(Homogeneity) For any f∈C(X) and λ≥0, Q(λf)=λQ(f).
(Constant-additivity) For any f∈C(X) and c∈R, Q(f+c)=Q(f)+c.
Conditions 1+3 imply that Q is 1-Lipschitz. We could introduce non-crisp ambidistributions by dropping conditions 2 and/or 3 (and e.g. requiring 1-Lipschitz instead), but we will stick to crisp ambidistributions in this post.
The space of all ambidistributions on X will be denoted ♡X.[1] Obviously, □X⊆♡X (where □X stands for (crisp) infradistributions), and likewise for ultradistributions.
Examples
Example 1: Consider compact Polish spaces X,Y,Z and a continuous mapping F:X×Y→Z. We can then define F♡∈♡Z by
F♡(u) := max_{θ∈ΔX} min_{η∈ΔY} E_{θ×η}[u∘F]
That is, F♡(u) is the value of the zero-sum two-player game with strategy spaces X and Y and utility function u∘F.
Notice that F in Example 1 can be regarded as a Cartesian frame: this seems like a natural connection to explore further.
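A small numerical illustration of Example 1 (my own code, using a random toy F and scipy's LP solver): F♡(u) is computed as the value of the matrix game with payoffs u(F(x,y)), and the three defining conditions of an ambidistribution are then checked on random utility functions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
nX, nY, nZ = 3, 4, 5
F = rng.integers(0, nZ, size=(nX, nY))        # a toy F : X x Y -> Z

def F_heart(u):
    """max over theta in Delta(X) of min over eta in Delta(Y) of E[u(F(x,y))],
    i.e. the value of the zero-sum game with payoff matrix u(F(x,y))."""
    M = u[F]
    c = np.concatenate([np.zeros(nX), [-1.0]])           # variables (theta, v); maximize v
    A_ub = np.hstack([-M.T, np.ones((nY, 1))])           # v <= sum_x theta_x M[x,y] for all y
    A_eq = np.concatenate([np.ones(nX), [0.0]])[None]    # sum_x theta_x = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(nY), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * nX + [(None, None)])
    return -res.fun

u, g = rng.normal(size=nZ), rng.normal(size=nZ)
assert abs(F_heart(u + 3.0) - (F_heart(u) + 3.0)) < 1e-7   # constant-additivity
assert abs(F_heart(2.5 * u) - 2.5 * F_heart(u)) < 1e-7     # homogeneity
assert F_heart(np.maximum(u, g)) >= F_heart(u) - 1e-9      # monotonicity
print("F_heart(u) =", F_heart(u))
```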
Example 2: Let A and O be finite sets representing actions and observations respectively, and Λ:{O∗→A}→□(A×O)∗ be an infra-Bayesian law. Then, we can define Λ♡∈♡(A×O)∗ by
Λ♡(u) := max_{π:O^∗→A} E_{Λ(π)}[u]
In fact, this is a faithful representation: Λ can be recovered from Λ♡.
Example 3: Consider an infra-MDP with finite state set S, initial state s0∈S and transition infrakernel T:S×A→□S. We can then define the “ambikernel” T♡:S→♡S by
T♡(s;u) := max_{a∈A} E_{T(s,a)}[u]
Thus, every infra-MDP induces an “ambichain”. Moreover:
Claim 1: ♡ is a monad. In particular, ambikernels can be composed.
This allows us to define
ϕ(γ) := (1−γ)∑_{n=0}^∞ γ^n (T♡)^n(s_0)
This object is the infra-Bayesian analogue of the convex polytope of accessible state occupancy measures in an MDP.
Claim 2: The following limit always exists:
ϕ^∗ := lim_{γ→1} ϕ(γ)
Legendre-Fenchel Duality
Definition 3: Let D be a convex space and A1,A2…An,B⊆D. We say that B occludes (A1…An) when for any (a1…an)∈A1×…×An, we have
CH(a_1…a_n) ∩ B ≠ ∅
Here, CH stands for convex hull.
We denote this relation A1…An⊢B. The reason we call this “occlusion” is apparent for the n=2 case.
Here are some properties of occlusion:
For any 1≤i≤n, A1…An⊢Ai.
More generally, if c∈Δ{1…n} then A1…An⊢∑iciAi.
If Φ⊢A and Φ⊆Ψ then Ψ⊢A.
If Φ⊢A and A⊆B then Φ⊢B.
If A1…An⊢B and A′i⊆Ai for all 1≤i≤n, then A′1…A′n⊢B.
If Φ⊢Ai for all 1≤i≤n, and also A1…An⊢B, then Φ⊢B.
Notice that occlusion has similar algebraic properties to logical entailment, if we think of A⊆B as ”B is a weaker proposition than A”.
Definition 4: Let X be a compact Polish space. A cramble set[2] over X is Φ⊆□X s.t.
Φ is non-empty.
Φ is topologically closed.
For any finite Φ0⊆Φ and Θ∈□X, if Φ0⊢Θ then Θ∈Φ. (Here, we interpret elements of □X as credal sets.)
Question: If instead of condition 3, we only consider binary occlusion (i.e. require |Φ0|≤2), do we get the same concept?
Given a cramble set Φ, its Legendre-Fenchel dual ambidistribution is
^Φ(f) := max_{Θ∈Φ} E_Θ[f]
Claim 3: Legendre-Fenchel duality is a bijection between cramble sets and ambidistributions.
Lattice Structure
Functionals
The space ♡X is equipped with the obvious partial order: Q≤P when for all f∈C(X), Q(f)≤P(f). This makes ♡X into a distributive lattice, with
(P∧Q)(f) = min(P(f),Q(f))
(P∨Q)(f) = max(P(f),Q(f))
This is in contrast to □X which is a non-distributive lattice.
The bottom and top elements are given by
⊥(f) = min_{x∈X} f(x)
⊤(f) = max_{x∈X} f(x)
Ambidistributions are closed under pointwise suprema and infima, and hence ♡X is complete and satisfies both infinite distributive laws, making it a complete Heyting and co-Heyting algebra.
♡X is also a De Morgan algebra with the involution
¯Q(f) := −Q(−f)
For X≠∅, ♡X is not a Boolean algebra: ΔX⊆♡X and for any θ∈ΔX we have ¯θ=θ.
One application of this partial order is formalizing the “no traps” condition for infra-MDP:
Definition 2: A finite infra-MDP is quasicommunicating when for any s∈S
lim_{γ→1}(1−γ)∑_{n=0}^∞ γ^n (T♡)^n(s_0) ≤ lim_{γ→1}(1−γ)∑_{n=0}^∞ γ^n (T♡)^n(s)
Claim 4: The set of quasicommunicating finite infra-MDPs (or even infra-RDPs) is learnable.
Cramble Sets
Going to the cramble set representation, ^Φ≤^Ψ iff Φ⊆Ψ.
Φ∧Ψ is just Φ∩Ψ, whereas Φ∨Ψ is the “occlusion hall” of Φ and Ψ.
The bottom and the top cramble sets are
⊥ = {⊤_□}
⊤ = □X
Here, ⊤_□ is the top element of □X (corresponding to the credal set ΔX).
The De Morgan involution is
¯Φ = {Θ∈□X ∣ ∀Ξ∈Φ: Θ∩Ξ≠∅}
Operations
Definition 5: Given X,Y compact Polish spaces and a continuous mapping h:X→Y, we define the pushforward h∗:♡X→♡Y by
h_∗(Q;f) := Q(f∘h)
When h is surjective, there are both a left adjoint and a right adjoint to h_∗, yielding two pullback operators h^∗_min, h^∗_max: ♡Y→♡X:
h^∗_min(Q;f) := min_{g∈C(Y): g∘h≥f} Q(g)
h^∗_max(Q;f) := max_{g∈C(Y): g∘h≤f} Q(g)
Given Q∈♡X and P∈♡Y we can define the semidirect product Q⋉P∈♡(X×Y) by
(Q⋉P)(f) := Q(λx.P(λy.f(x,y)))
There are probably more natural products, but I’ll stop here for now.
Polytopic Ambidistributions
Definition 6: The polytopic ambidistributions ♡polX are the (incomplete) sublattice of ♡X generated by ΔX.
Some conjectures about this:
For finite X, an ambidistribution Q is polytopic iff there is a finite polytope complex C on R^X s.t. for any cell A of C, Q|A is affine.
For finite X, a cramble set Φ is polytopic iff it is the occlusion hall of a finite set of polytopes in ΔX.
ϕ(γ) and ϕ∗ from Example 3 are polytopic.
The non-convex shape ♡ reminds us that ambidistributions need not be convex or concave.
The expression “cramble set” is meant to suggest a combination of “credal set” with “ambi”.
Master post for ideas about infra-Bayesian physicalism.
Other relevant posts:
Incorrigibility in IBP
PreDCA alignment protocol
Here is a modification of the IBP framework which removes the monotonicity principle, and seems to be more natural in other ways as well.
First, let our notion of “hypothesis” be Θ∈□c(Γ×2Γ). The previous framework can be interpreted in terms of hypotheses of this form satisfying the condition
pr_{Γ×2^Γ} Br(Θ) = Θ
(See Proposition 2.8 in the original article.) In the new framework, we replace it by the weaker condition
Br(Θ) ⊇ (id_Γ × diag_{2^Γ})_∗ Θ
This can be roughly interpreted as requiring that (i) whenever the output of a program P determines whether some other program Q will run, program P has to run as well (ii) whenever programs P and Q are logically equivalent, program P runs iff program Q runs.
The new condition seems to be well-justified, and is also invariant under (i) mixing hypotheses (ii) taking joins/meets of hypotheses. The latter was not the case for the old condition. Moreover, it doesn’t imply that Θ is downward closed, and hence there is no longer a monotonicity principle[1].
The next question is, how do we construct hypotheses satisfying this condition? In the old framework, we could construct hypotheses of the form Ξ∈□c(Γ×Φ) and then apply the bridge transform. In particular, this allows a relatively straightforward translation of physics theories into IBP language (for example our treatment of quantum theory). Luckily, there is an analogous construction in the new framework as well.
First notice that our new condition on Θ can be reformulated as requiring that
suppΘ⊆elΓ
For any s:Γ→Γ define τ_s: Δ_c elΓ → Δ_c elΓ by τ_s θ := χ_{elΓ}(s×id_{2^Γ})_∗θ. Then, we require τ_s Θ ⊆ Θ.
For any Φ, we also define τΦs:Δc(elΓ×Φ)→Δc(elΓ×Φ) by
τ^Φ_s θ := χ_{elΓ×Φ}(s×id_{2^Γ×Φ})_∗θ
Now, for any Ξ∈□c(Γ×Φ), we define the “conservative bridge transform[2]” CBr(Ξ)∈□c(Γ×2^Γ×Φ) as the closure of all τ^Φ_s θ where θ is a maximal element of Br(Ξ). It is then possible to see that Θ∈□c(Γ×2^Γ) is a valid hypothesis if and only if it is of the form pr_{Γ×2^Γ} CBr(Ξ) for some Φ and Ξ∈□c(Γ×Φ).
I still think the monotonicity principle is saying something about the learning theory of IBP which is still true in the new framework. Namely, it is possible to learn that a program is running but not possible to (confidently) learn that a program is not running, and this limits the sort of frequentist guarantees we can expect.
Intuitively, it can be interpreted as a version of the bridge transform where we postulate that a program doesn’t run unless Ξ contains a reason why it must run.
Two thoughts about the role of quining in IBP:
Quines are non-unique (there can be multiple fixed points). This means that, viewed as a prescriptive theory, IBP produces multi-valued prescriptions. It might be the case that this multi-valuedness can resolve problems with UDT such as Wei Dai’s 3-player Prisoner’s Dilemma and the anti-Newcomb problem[1]. In these cases, a particular UDT/IBP (corresponding to a particular quine) loses to CDT. But, a different UDT/IBP (corresponding to a different quine) might do as well as CDT.
What to do about agents that don’t know their own source-code? (Arguably humans are such.) Upon reflection, this is not really an issue! If we use IBP prescriptively, then we can always assume quining: IBP is just telling you to follow a procedure that uses quining to access its own (i.e. the procedure’s) source code. Effectively, you are instantiating an IBP agent inside yourself with your own prior and utility function. On the other hand, if we use IBP descriptively, then we don’t need quining: Any agent can be assigned “physicalist intelligence” (Definition 1.6 in the original post, can also be extended to not require a known utility function and prior, along the lines of ADAM) as long as the procedure doing the assigning knows its source code. The agent doesn’t need to know its own source code in any sense.
@Squark is my own old LessWrong account.
Physicalist agents see themselves as inhabiting an unprivileged position within the universe. However, it’s unclear whether humans should be regarded as such agents. Indeed, monotonicity is highly counterintuitive for humans. Moreover, historically human civilization struggled a lot with accepting the Copernican principle (and is still confused about issues such as free will, anthropics and quantum physics which physicalist agents shouldn’t be confused about). This presents a problem for superimitation.
What if humans are actually cartesian agents? Then, it makes sense to consider a variant of physicalist superimitation where instead of just seeing itself as unprivileged, the AI sees the user as a privileged agent. We call such agents “transcartesian”. Here is how this can be formalized as a modification of IBP.
In IBP, a hypothesis is specified by choosing the state space Φ and the belief Θ∈□(Γ×Φ). In the transcartesian framework, we require that a hypothesis is augmented by a mapping τ:Φ→(A0×O0)≤ω, where A0 is the action set of the reference agent (user) and O0 is the observation set of the reference agent. Given G0 the source code of the reference agent, we require that Θ is supported on the set
{(y,x)∈Γ×Φ ∣ ha⊑τ(x) ⟹ a=G^y_0(h)}
That is, the actions of the reference agent are indeed computed by the source code of the reference agent.
Now, instead of using a loss function of the form L:elΓ→R, we can use a loss function of the form L:(A0×O0)≤ω→R which doesn’t have to satisfy any monotonicity constraint. (More generally, we can consider hybrid loss functions of the form L:(A0×O0)≤ω×elΓ→R monotonic in the second argument.) This can also be generalized to reference agents with hidden rewards.
As opposed to physicalist agents, transcartesian agents do suffer from penalties associated with the description complexity of bridge rules (for the reference agent). Such an agent can (for example) come to believe in a simulation hypothesis that is unlikely from a physicalist perspective. However, since such a simulation hypothesis would be compelling for the reference agent as well, this is not an alignment problem (epistemic alignment is maintained).
Up to light editing, the following was written by me during the “Finding the Right Abstractions for healthy systems” research workshop, hosted by Topos Institute in January 2023. However, I invented the idea before.
In order to allow R (the set of programs) to be infinite in IBP, we need to define the bridge transform for infinite Γ. At first, it might seem Γ can be allowed to be any compact Polish space, and the bridge transform should only depend on the topology on Γ, but that runs into problems. Instead, the right structure on Γ for defining the bridge transform seems to be that of a “profinite field space”: a category I came up with that I haven’t seen in the literature so far.
The category PFS of profinite field spaces is defined as follows. An object F of PFS is a set ind(F) and a family of finite sets {F_α}_{α∈ind(F)}. We denote Tot(F):=∏αFα. Given F and G objects of PFS, a morphism from F to G is a mapping f:Tot(F)→Tot(G) such that there exists R⊆ind(F)×ind(G) with the following properties:
For any α∈ind(F), the set R(α) := {β∈ind(G) ∣ (α,β)∈R} is finite.
For any β∈ind(G), the set R^{−1}(β) := {α∈ind(F) ∣ (α,β)∈R} is finite.
For any β∈ind(G), there exists a mapping fβ:∏α∈R−1(β)Fα→Gβ s.t. for any x∈Tot(F), f(x)β:=fβ(prRβ(x)) where prRβ:Tot(F)→∏α∈R−1(β)Fα is the projection mapping.
The composition of PFS morphisms is just the composition of mappings.
It is easy to see that every PFS morphism is a continuous mapping in the product topology, but the converse is false. However, the converse is true for objects with finite ind (i.e. for such objects any mapping is a morphism). Hence, an object F in PFS can be thought of as Tot(F) equipped with additional structure that is stronger than the topology but weaker than the factorization into Fα.
The name “field space” is inspired by the following observation. Given F an object of PFS, there is a natural condition we can impose on a Borel probability distribution on Tot(F) which makes it a “Markov random field” (MRF). Specifically, μ∈ΔTot(F) is called an MRF if there is an undirected graph G whose vertices are ind(F) and in which every vertex is of finite degree, s.t.μ is an MRF on G in the obvious sense. The property of being an MRF is preserved under pushforwards w.r.t.PFS morphisms.
Infra-Bayesian physicalism is an interesting example in favor of the thesis that the more qualitatively capable an agent is, the less corrigible it is. (a.k.a. “corrigibility is anti-natural to consequentialist reasoning”). Specifically, alignment protocols that don’t rely on value learning become vastly less safe when combined with IBP:
Example 1: Using steep time discount to disincentivize dangerous long-term plans. For IBP, “steep time discount” just means predominantly caring about your source code running with particular short inputs. Such a goal strongly incentivizes the usual convergent instrumental goals: first take over the world, then run your source code with whatever inputs you want. IBP agents just don’t have time discount in the usual sense: a program running late in physical time is just as good as one running early in physical time.
Example 2: Debate. This protocol relies on a zero-sum game between two AIs. But, the monotonicity principle rules out the possibility of zero-sum! (If L and −L are both monotonic loss functions then L is a constant). So, in a “debate” between IBP agents, they cooperate to take over the world and then run the source code of each debater with the input “I won the debate”.
Example 3: Forecasting/imitation (an IDA in particular). For an IBP agent, the incentivized strategy is: take over the world, then run yourself with inputs showing you making perfect forecasts.
The conclusion seems to be, it is counterproductive to use IBP to solve the acausal attack problem for most protocols. Instead, you need to do PreDCA or something similar. And, if acausal attack is a serious problem, then approaches that don’t do value learning might be doomed.
Infradistributions admit an information-theoretic quantity that doesn’t exist in classical theory. Namely, it’s a quantity that measures how many bits of Knightian uncertainty an infradistribution has. We define it as follows:
Let X be a finite set and Θ a crisp infradistribution (credal set) on X, i.e. a closed convex subset of ΔX. Then, imagine someone trying to communicate a message by choosing a distribution out of Θ. Formally, let Y be any other finite set (space of messages), θ∈ΔY (prior over messages) and K:Y→Θ (communication protocol). Consider the distribution η:=θ⋉K∈Δ(Y×X). Then, the information capacity of the protocol is the mutual information between the projection on Y and the projection on X according to η, i.e. Iη(prX;prY). The “Knightian entropy” of Θ is now defined to be the maximum of Iη(prX;prY) over all choices of Y, θ, K. For example, if Θ is Bayesian then it’s 0, whereas if Θ=⊤X, it is ln|X|.
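Here is a sketch of how one might compute this quantity for a finitely-generated credal set (my own code and my own reduction, so treat it as tentative): since mutual information is convex in the channel for a fixed prior over messages, the maximum should be attained with each K(y) an extreme point of Θ, in which case the Knightian entropy is the Shannon capacity of the channel whose rows are the generators, computable by Blahut–Arimoto.

```python
import numpy as np

def knightian_entropy(generators, iters=3000):
    """Knightian entropy (in nats) of the credal set generated by the given
    distributions, assuming the maximum in the definition can be taken with
    each K(y) an extreme point; then it is the capacity of the channel whose
    rows are the generators, computed here by Blahut-Arimoto."""
    W = np.asarray(generators, dtype=float)          # shape (|Y|, |X|)
    theta = np.full(W.shape[0], 1.0 / W.shape[0])    # prior over messages y
    for _ in range(iters):
        q = theta @ W                                # induced marginal on X
        logratio = np.where(W > 0, np.log(np.maximum(W, 1e-300) / np.maximum(q, 1e-300)), 0.0)
        D = (W * logratio).sum(axis=1)               # KL(W_y || q) for each y
        theta *= np.exp(D)
        theta /= theta.sum()
    q = theta @ W
    logratio = np.where(W > 0, np.log(np.maximum(W, 1e-300) / np.maximum(q, 1e-300)), 0.0)
    return float(theta @ (W * logratio).sum(axis=1))

n = 3
print(knightian_entropy(np.eye(n)), "=?", np.log(n))              # top element: ln|X|
print(knightian_entropy([[0.2, 0.5, 0.3]]))                       # Bayesian case: 0
print(knightian_entropy([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]))      # strictly in between
```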
Here is one application[1] of this concept, orthogonal to infra-Bayesianism itself. Suppose we model inner alignment by assuming that some portion ϵ of the prior ζ consists of malign hypotheses. And we want to design e.g. a prediction algorithm that will converge to good predictions without allowing the malign hypotheses to attack, using methods like confidence thresholds. Then we can analyze the following metric for how unsafe the algorithm is.
Let O be the set of observations and A the set of actions (which might be “just” predictions) of our AI, and for any environment τ and prior ξ, let Dξτ(n)∈Δ(A×O)n be the distribution over histories resulting from our algorithm starting with prior ξ and interacting with environment τ for n time steps. We have ζ=ϵμ+(1−ϵ)β, where μ is the malign part of the prior and β the benign part. For any μ′, consider Dϵμ′+(1−ϵ)βτ(n). The closure of the convex hull of these distributions for all choices of μ′ (“attacker policy”) is some Θβτ(n)∈Δ(A×O)n. The maximal Knightian entropy of Θβτ(n) over all admissible τ and β is called the malign capacity of the algorithm. Essentially, this is a bound on how much information the malign hypotheses can transmit into the world via the AI during a period of n. The goal then becomes finding algorithms with simultaneously good regret bounds and good (in particular, at most polylogarithmic in n) malign capacity bounds.
This is an idea I’m collaborating on with Johannes Treutlein.
Infra-Bayesianism can be naturally understood as semantics for a certain non-classical logic. This promises an elegant synthesis between deductive/symbolic reasoning and inductive/intuitive reasoning, with several possible applications. Specifically, here we will explain how this can work for higher-order logic. There might be holes and/or redundancies in the precise definitions given here, but I’m quite confident the overall idea is sound.
We will work with homogenous ultracontributions (HUCs). □X will denote the space of HUCs over X. Given μ∈□X, S(μ)⊆ΔcX will denote the corresponding convex set. Given p∈ΔX and μ∈□X, p:μ will mean p∈S(μ). Given μ,ν∈□X, μ⪯ν will mean S(μ)⊆S(ν).
Syntax
Let Tι denote a set which we interpret as the types of individuals (we allow more than one). We then recursively define the full set of types T by:
0∈T (intended meaning: the uninhabited type)
1∈T (intended meaning: the one element type)
If α∈Tι then α∈T
If α,β∈T then α+β∈T (intended meaning: disjoint union)
If α,β∈T then α×β∈T (intended meaning: Cartesian product)
If α∈T then (α)∈T (intended meaning: predicates with argument of type α)
For each α,β∈T, there is a set F0α→β which we interpret as atomic terms of type α→β. We will denote V0α:=F01→α. Among those we distinguish the logical atomic terms:
prαβ∈F0α×β→α
iαβ∈F0α→α+β
Symbols we will not list explicitly, that correspond to the algebraic properties of + and × (commutativity, associativity, distributivity and the neutrality of 0 and 1). For example, given α,β∈T there is a “commutator” of type α×β→β×α.
=α∈V0(α×α)
diagα∈F0α→α×α
()α∈V0((α)×α) (intended meaning: predicate evaluation)
⊥∈V0(1)
⊤∈V0(1)
∨α∈F0(α)×(α)→(α)
∧α∈F0(α)×(α)→(α) [EDIT: Actually this doesn’t work because, except for finite sets, the resulting mapping (see semantics section) is discontinuous. There are probably ways to fix this.]
∃αβ∈F0(α×β)→(β)
∀αβ∈F0(α×β)→(β) [EDIT: Actually this doesn’t work because, except for finite sets, the resulting mapping (see semantics section) is discontinuous. There are probably ways to fix this.]
Assume that for each n∈N there is some Dn⊆□[n]: the set of “describable” ultracontributions [EDIT: it is probably sufficient to only have the fair coin distribution in D2 in order for it to be possible to approximate all ultracontributions on finite sets]. If μ∈Dn then ┌μ┐∈V(∑ni=11)
We recursively define the set of all terms Fα→β. We denote Vα:=F1→α.
If f∈F0α→β then f∈Fα→β
If f1∈Fα1→β1 and f2∈Fα2→β2 then f1×f2∈Fα1×α2→β1×β2
If f1∈Fα1→β1 and f2∈Fα2→β2 then f1+f2∈Fα1+α2→β1+β2
If f∈Fα→β then f−1∈F(β)→(α)
If f∈Fα→β and g∈Fβ→γ then g∘f∈Fα→γ
Elements of V(α) are called formulae. Elements of V(1) are called sentences. A subset of V(1) is called a theory.
Semantics
Given T⊆V(1), a model M of T is the following data. To each α∈T, there must correspond some compact Polish space M(α) s.t.:
M(0)=∅
M(1)=pt (the one point space)
M(α+β)=M(α)⊔M(β)
M(α×β)=M(α)×M(β)
M((α))=□M(α)
To each f∈Fα→β, there must correspond a continuous mapping M(f):M(α)→M(β), under the following constraints:
pr, i, diag and the “algebrators” have to correspond to the obvious mappings.
M(=α)=⊤diagM(α). Here, diagX⊆X×X is the diagonal and ⊤C∈□X is the sharp ultradistribution corresponding to the closed set C⊆X.
Consider α∈T and denote X:=M(α). Then, M(()α)=⊤□X⋉id□X. Here, we use the observation that the identity mapping id□X can be regarded as an infrakernel from □X to X.
M(⊥)=⊥pt
M(⊤)=⊤pt
S(M(∨)(μ,ν)) is the convex hull of S(μ)∪S(ν)
S(M(∧)(μ,ν)) is the intersection of S(μ) and S(ν)
Consider α,β∈T and denote X:=M(α), Y:=M(β) and pr:X×Y→Y the projection mapping. Then, M(∃αβ)(μ)=pr∗μ.
Consider α,β∈T and denote X:=M(α), Y:=M(β) and pr:X×Y→Y the projection mapping. Then, p:M(∀αβ)(μ) iff for all q∈Δc(X×Y), if pr∗q=p then q:μ.
M(f1×f2)=M(f1)×M(f2)
M(f1+f2)=M(f1)⊔M(f2)
M(f−1)(μ)=M(f)∗(μ).
M(g∘f)=M(g)∘M(f)
M(┌μ┐)=μ
Finally, for each ϕ∈T, we require M(ϕ)=⊤pt.
Semantic Consequence
Given ϕ∈V(1), we say M⊨ϕ when M(ϕ)=⊤pt. We say T⊨ϕ when for any model M of T, M⊨ϕ. It is now interesting to ask what is the computational complexity of deciding T⊨ϕ. [EDIT: My current best guess is co-RE]
Applications
As usual, let A be a finite set of actions and O be a finite set of observations. Require that for each o∈O there is σo∈Tι which we interpret as the type of states producing observation o. Denote σ∗:=∑o∈Oσo (the type of all states). Moreover, require that our language has the nonlogical symbols s0∈V0(σ∗) (the initial state) and, for each a∈A, Ka∈F0σ∗→(σ∗) (the transition kernel). Then, every model defines a (pseudocausal) infra-POMDP. This way we can use symbolic expressions to define infra-Bayesian RL hypotheses. It is then tempting to study the control theoretic and learning theoretic properties of those hypotheses. Moreover, it is natural to introduce a prior which weights those hypotheses by length, analogous to the Solomonoff prior. This leads to some sort of bounded infra-Bayesian algorithmic information theory and a bounded infra-Bayesian analogue of AIXI.
Let’s also explicitly describe 0th order and 1st order infra-Bayesian logic (although they should be segments of the higher-order version).
0-th order
Syntax
Let A be the set of propositional variables. We define the language L:
Any a∈A is also in L
⊥∈L
⊤∈L
Given ϕ,ψ∈L, ϕ∧ψ∈L
Given ϕ,ψ∈L, ϕ∨ψ∈L
Notice there’s no negation or implication. We define the set of judgements J:=L×L. We write judgements as ϕ⊢ψ (”ψ in the context of ϕ”). A theory is a subset of J.
Semantics
Given T⊆J, a model of T consists of a compact Polish space X and a mapping M:L→□X. The latter is required to satisfy:
M(⊥)=⊥X
M(⊤)=⊤X
M(ϕ∧ψ)=M(ϕ)∧M(ψ). Here, we define ∧ of infradistributions as intersection of the corresponding sets
M(ϕ∨ψ)=M(ϕ)∨M(ψ). Here, we define ∨ of infradistributions as convex hull of the corresponding sets
For any ϕ⊢ψ∈T, M(ϕ)⪯M(ψ)
1-st order
Syntax
We define the language using the usual syntax of 1-st order logic, where the allowed operators are ∧, ∨ and the quantifiers ∀ and ∃. Variables are labeled by types from some set T. For simplicity, we assume no constants, but it is easy to introduce them. For any sequence of variables (v1…vn), we denote Lv the set of formulae whose free variables are a subset of v1…vn. We define the set of judgements J:=⋃vLv×Lv.
Semantics
Given T⊆J, a model of T consists of
For every t∈T, a compact Polish space M(t)
For every ϕ∈Lv where v1…vn have types t1…tn, an element Mv(ϕ) of □Xv, where Xv:=(∏ni=1M(ti))
It must satisfy the following:
Mv(⊥)=⊥Xv
Mv(⊤)=⊤Xv
Mv(ϕ∧ψ)=Mv(ϕ)∧Mv(ψ)
Mv(ϕ∨ψ)=Mv(ϕ)∨Mv(ψ)
Consider variables u1…un of types t1…tn and variables v1…vm of types s1…sm. Consider also some σ:{1…m}→{1…n} s.t. si=tσi. Given ϕ∈Lv, we can form the substitution ψ:=ϕ[vi=uσ(i)]∈Lu. We also have a mapping fσ:Xu→Xv given by fσ(x1…xn)=(xσ(1)…xσ(m)). We require Mu(ψ)=f∗(Mv(ϕ))
Consider variables v1…vn and i∈{1…n}. Denote pr:Xv→Xv∖vi the projection mapping. We require Mv∖vi(∃vi:ϕ)=pr∗(Mv(ϕ))
Consider variables v1…vn and i∈{1…n}. Denote pr:Xv→Xv∖vi the projection mapping. We require that p:Mv∖vi(∀vi:ϕ) if and only if, for all q∈ΔXv s.t. pr∗q=p, q:Mv(ϕ)
For any ϕ⊢ψ∈T, Mv(ϕ)⪯Mv(ψ)
There is a special type of crisp infradistributions that I call “affine infradistributions”: those that, represented as sets, are closed not only under convex linear combinations but also under affine linear combinations. In other words, they are intersections between the space of distributions and some closed affine subspace of the space of signed measures. Conjecture: in 0-th order logic of affine infradistributions, consistency is polynomial-time decidable (whereas for classical logic it is ofc NP-hard).
To produce some evidence for the conjecture, let’s consider a slightly different problem. Specifically, introduce a new semantics in which □X is replaced by the set of linear subspaces of some finite dimensional vector space V. A model M is required to satisfy:
M(⊥)=0
M(⊤)=V
M(ϕ∧ψ)=M(ϕ)∩M(ψ)
M(ϕ∨ψ)=M(ϕ)+M(ψ)
For any ϕ⊢ψ∈T, M(ϕ)⊆M(ψ)
If you wish, this is “non-unitary quantum logic”. In this setting, I have a candidate polynomial-time algorithm for deciding consistency. First, we transform T into an equivalent theory s.t. all judgments are of the following forms:
a=⊥
a=⊤
a⊢b
Pairs of the form c=a∧b, d=a∨b.
Here, a,b,c,d∈A are propositional variables and “ϕ=ψ” is a shorthand for the pair of judgments ϕ⊢ψ and ψ⊢ϕ.
Second, we make sure that our T also satisfies the following “closure” properties:
If a⊢b and b⊢c are in T then so is a⊢c
If c=a∧b is in T then so are c⊢a and c⊢b
If c=a∨b is in T then so are a⊢c and b⊢c
If c=a∧b, d⊢a and d⊢b are in T then so is d⊢c
If c=a∨b, a⊢d and b⊢d are in T then so is c⊢d
Third, we assign to each a∈A a real-valued variable xa. Then we construct a linear program for these variables consisting of the following inequalities:
For any a∈A: 0≤xa≤1
For any a⊢b in T: xa≤xb
For any pair c=a∧b and d=a∨b in T: xc+xd=xa+xb
For any a=⊥: xa=0
For any a=⊤: xa=1
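Here is a small sketch of the resulting linear program (my own illustrative code; it takes judgments already in the normal form above as given and skips the closure step):

```python
import numpy as np
from scipy.optimize import linprog

def lp_feasible(variables, entails, meets_joins, bottoms, tops):
    """Feasibility of the linear program described above.
    entails: list of (a, b) for judgments a |- b
    meets_joins: list of (c, d, a, b) for paired judgments c = a AND b, d = a OR b
    bottoms / tops: variables judged equal to bottom / top."""
    idx = {v: i for i, v in enumerate(variables)}
    n = len(variables)
    A_ub, b_ub, A_eq, b_eq = [], [], [], []
    for a, b in entails:                       # x_a <= x_b
        row = np.zeros(n); row[idx[a]], row[idx[b]] = 1, -1
        A_ub.append(row); b_ub.append(0.0)
    for c, d, a, b in meets_joins:             # x_c + x_d = x_a + x_b
        row = np.zeros(n)
        row[idx[c]] += 1; row[idx[d]] += 1; row[idx[a]] -= 1; row[idx[b]] -= 1
        A_eq.append(row); b_eq.append(0.0)
    for v in bottoms:                          # x_v = 0
        row = np.zeros(n); row[idx[v]] = 1
        A_eq.append(row); b_eq.append(0.0)
    for v in tops:                             # x_v = 1
        row = np.zeros(n); row[idx[v]] = 1
        A_eq.append(row); b_eq.append(1.0)
    res = linprog(np.zeros(n), A_ub=np.array(A_ub) if A_ub else None,
                  b_ub=b_ub or None, A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=[(0, 1)] * n)
    return res.status == 0                     # 0 means a feasible point was found

base = dict(variables=list("abcd"),
            entails=[("c", "a"), ("c", "b"), ("a", "d"), ("b", "d")],
            meets_joins=[("c", "d", "a", "b")])
print(lp_feasible(**base, bottoms=[], tops=[]))            # True: LP is feasible
print(lp_feasible(**base, bottoms=["c"], tops=["a", "b"])) # False: LP is infeasible
```

In the second theory the constraints force xc+xd=2 with xc=0 and xd≤1, so the LP is infeasible; under the conjecture this would mean the theory is inconsistent.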
Conjecture: the theory is consistent if and only if the linear program has a solution. To see why it might be so, notice that for any model M we can construct a solution by setting
xa := dim M(a) / dim M(⊤)
I don’t have a full proof for the converse but here are some arguments. If a solution exists, then it can be chosen to be rational. We can then rescale it to get integers which are candidate dimensions of our subspaces. Consider the space of all ways to choose subspaces of these dimensions s.t. the constraints coming from judgments of the form a⊢b are satisfied. This is a moduli space of poset representations. It is easy to see it’s non-empty (just let the subspaces be spans of vectors taken from a fixed basis). By Proposition A.2 in Futorny and Iusenko it is an irreducible algebraic variety. Therefore, to show that we can also satisfy the remaining constraints, it is enough to check that (i) the remaining constraints are open (ii) each of the remaining constraints (considered separately) holds at some point of the variety. The first is highly likely and the second is at least plausible.
The algorithm also seems to have a natural extension to the original infra-Bayesian setting.
When using infra-Bayesian logic to define a simplicity prior, it is natural to use “axiom circuits” rather than plain formulae. That is, when we write the axioms defining our hypothesis, we are allowed to introduce “shorthand” symbols for repeating terms. This doesn’t affect the expressiveness, but it does affect the description length. Indeed, eliminating all the shorthand symbols can increase the length exponentially.
Instead of introducing all the “algebrator” logical symbols, we can define T as the quotient by the equivalence relation defined by the algebraic laws. We then need only two extra logical atomic terms:
For any n∈N and σ∈Sn (permutation), denote n:=∑ni=11 and require σ+∈Fn→n
For any n∈N and σ∈Sn, σ×α∈Fαn→αn
However, if we do this then it’s not clear whether deciding that an expression is a well-formed term can be done in polynomial time. Because, to check that the types match, we need to test the identity of algebraic expressions and opening all parentheses might result in something exponentially long.
Actually the Schwartz–Zippel algorithm can easily be adapted to this case (just imagine that types are variables over Q, and start from testing the identity of the types appearing inside parentheses), so we can validate expressions in randomized polynomial time (and, given standard conjectures, in deterministic polynomial time as well).
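For illustration, here is a minimal sketch of that randomized identity test (my own code; in the actual validator one would run it on the type expressions appearing inside parentheses, as described). Type expressions built from +, × and base types are evaluated as polynomials at random points modulo a large prime; by Schwartz–Zippel, distinct polynomials rarely agree at a random point.

```python
import random

# Type expressions: 0, 1, a base-type name (str), ("+", s, t) or ("*", s, t).
def evaluate(ty, assignment, p):
    if ty == 0 or ty == 1:
        return ty % p
    if isinstance(ty, str):
        return assignment[ty]
    op, s, t = ty
    a, b = evaluate(s, assignment, p), evaluate(t, assignment, p)
    return (a + b) % p if op == "+" else (a * b) % p

def probably_equal(ty1, ty2, names, trials=20, p=(1 << 61) - 1):
    """Schwartz-Zippel style identity test: expressions equal modulo the
    algebraic laws evaluate identically as polynomials, while distinct
    polynomials agree at a random point only with small probability."""
    for _ in range(trials):
        assignment = {x: random.randrange(p) for x in names}
        if evaluate(ty1, assignment, p) != evaluate(ty2, assignment, p):
            return False
    return True

# alpha * (beta + 1)  vs  (1 * beta) * alpha + alpha  -- equal by the algebraic laws
t1 = ("*", "alpha", ("+", "beta", 1))
t2 = ("+", ("*", ("*", 1, "beta"), "alpha"), "alpha")
print(probably_equal(t1, t2, ["alpha", "beta"]))                  # True
print(probably_equal(t1, ("+", t2, "beta"), ["alpha", "beta"]))   # False (w.h.p.)
```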
Master post for ideas about metacognitive agents.
Sort of obvious but good to keep in mind: Metacognitive regret bounds are not easily reducible to “plain” IBRL regret bounds when we consider the core and the envelope as the “inside” of the agent.
Assume that the action and observation sets factor as A=A0×A1 and O=O0×O1, where (A0,O0) is the interface with the external environment and (A1,O1) is the interface with the envelope.
Let Λ:Π→□(Γ×(A×O)ω) be a metalaw. Then, there are two natural ways to reduce it to an ordinary law:
Marginalizing over Γ. That is, let pr−Γ:Γ×(A×O)ω→(A×O)ω and pr0:(A×O)ω→(A0×O0)ω be the projections. Then, we have the law Λ?:=(pr0pr−Γ)∗∘Λ.
Assuming “logical omniscience”. That is, let τ∗∈Γ be the ground truth. Then, we have the law Λ!:=pr0∗(Λ∣τ∗). Here, we use the conditional defined by Θ∣A:={θ∣A∣θ∈argmaxΘPr[A]}. It’s easy to see this indeed defines a law.
However, requiring low regret w.r.t. either of these is not equivalent to requiring low regret w.r.t. Λ:
Learning Λ? is typically no less feasible than learning Λ, however it is a much weaker condition. This is because the metacognitive agents can use policies that query the envelope to get higher guaranteed expected utility.
Learning Λ! is a much stronger condition than learning Λ, however it is typically infeasible. Requiring it leads to AIXI-like agents.
Therefore, metacognitive regret bounds hit a “sweet spot” of strength vs. feasibility which produces genuinely more powerful agents than IBRL[1].
More precisely, more powerful than IBRL with the usual sort of hypothesis classes (e.g. nicely structured crisp infra-RDPs). In principle, we can reduce metacognitive regret bounds to IBRL regret bounds using non-crisp laws, since there’s a very general theorem for representing desiderata as laws. But, these laws would have a very peculiar form that seems impossible to guess without starting with metacognitive agents.
Formalizing the richness of mathematics
Intuitively, it feels that there is something special about mathematical knowledge from a learning-theoretic perspective. Mathematics seems infinitely rich: no matter how much we learn, there is always more interesting structure to be discovered. Impossibility results like the halting problem and Godel incompleteness lend some credence to this intuition, but are insufficient to fully formalize it.
Here is my proposal for how to formulate a theorem that would make this idea rigorous.
(Wrong) First Attempt
Fix some natural hypothesis class for mathematical knowledge, such as some variety of tree automata. Each such hypothesis Θ represents an infradistribution over Γ: the “space of counterpossible computational universes”. We can say that Θ is a “true hypothesis” when there is some θ in the credal set Θ (a distribution over Γ) s.t. the ground truth Υ∗∈Γ “looks” as if it’s sampled from θ. The latter should be formalizable via something like a computationally bounded version of Martin-Löf randomness.
We can now try to say that Υ∗ is “rich” if for any true hypothesis Θ, there is a refinement Ξ⊆Θ which is also a true hypothesis and “knows” at least one bit of information that Θ doesn’t, in some sense. This is clearly true, since there can be no automaton or even any computable hypothesis which fully describes Υ∗. But, it’s also completely boring: the required Ξ can be constructed by “hardcoding” an additional fact into Θ. This doesn’t look like “discovering interesting structure”, but rather just like brute-force memorization.
(Wrong) Second Attempt
What if instead we require that Ξ knows infinitely many bits of information that Θ doesn’t? This is already more interesting. Imagine that instead of metacognition / mathematics, we would be talking about ordinary sequence prediction. In this case it is indeed an interesting non-trivial condition that the sequence contains infinitely many regularities, s.t. each of them can be expressed by a finite automaton but their conjunction cannot. For example, maybe the n-th bit in the sequence depends only on the largest k s.t. 2^k divides n, but the dependence on k is already uncomputable (or at least inexpressible by a finite automaton).
However, for our original application, this is entirely insufficient. This is because the formal language we use to define Γ (e.g. combinator calculus) has some “easy” equivalence relations. For example, consider the family of programs of the form “if 2+2=4 then output 0, otherwise...”. All of those programs would output 0, which is obvious once you know that 2+2=4. Therefore, once your automaton is able to check some such easy equivalence relations, hardcoding a single new fact (in the example, 2+2=4) generates infinitely many “new” bits of information. Once again, we are left with brute-force memorization.
(Less Wrong) Third Attempt
Here’s the improved condition: For any true hypothesis Θ, there is a true refinement Ξ⊆Θ s.t. conditioning Θ on any finite set of observations cannot produce a refinement of Ξ.
There is a technicality here, because we’re talking about infradistributions, so what is “conditioning” exactly? For credal sets, I think it is sufficient to allow two types of “conditioning”:
For any given observation A and p∈(0,1], we can form {θ∈Θ∣θ(A)≥p}.
For any given observation A s.t. minθ∈Θθ(A)>0, we can form {(θ∣A)∣θ∈Θ}.
This rules-out the counterexample from before: the easy equivalence relation can be represented inside Θ, and then the entire sequence of “novel” bits can be generated by a conditioning.
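To make the two operations concrete, here is a toy finite-support sketch (my own representation: a credal set is stood in for by a finite list of distributions over a small outcome space; a genuine credal set would be closed and convex, which is glossed over here):

```python
import numpy as np

outcomes = ["00", "01", "10", "11"]
A = {"10", "11"}                      # the observation/event we condition on
Theta = [np.array([0.4, 0.1, 0.3, 0.2]),
         np.array([0.1, 0.2, 0.4, 0.3]),
         np.array([0.25, 0.25, 0.25, 0.25])]

def prob(theta, event):
    return sum(theta[i] for i, o in enumerate(outcomes) if o in event)

# First operation: keep only the distributions assigning probability at least p to A.
def restrict(Theta, event, p):
    return [theta for theta in Theta if prob(theta, event) >= p]

# Second operation: Bayes-condition every member on A (needs min_theta theta(A) > 0).
def condition(Theta, event):
    assert min(prob(theta, event) for theta in Theta) > 0
    mask = np.array([float(o in event) for o in outcomes])
    return [theta * mask / prob(theta, event) for theta in Theta]

print(restrict(Theta, A, 0.6))   # only the middle distribution survives
print(condition(Theta, A))       # each member updated on A
```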
Alright, so does Υ∗ actually satisfy this condition? I think it’s very probable, but I haven’t proved it yet.
Recording of a talk I gave in VAISU 2023.
Here is the sketch of a simplified model for how a metacognitive agent deals with traps.
Consider some (unlearnable) prior ζ over environments, s.t. we can efficiently compute the distribution ζ(h) over observations given any history h. For example, any prior over a small set of MDP hypotheses would qualify. Now, for each h, we regard ζ(h) as a “program” that the agent can execute and form beliefs about. In particular, we have a “metaprior” ξ consisting of metahypotheses: hypotheses-about-programs.
For example, if we let every metahypothesis be a small infra-RDP satisfying appropriate assumptions, we probably have an efficient “metalearning” algorithm. More generally, we can allow a metahypothesis to be a learnable mixture of infra-RDPs: for instance, there is a finite state machine for specifying “safe” actions, and the infra-RDPs in the mixture guarantee no long-term loss upon taking safe actions.
In this setting, there are two levels of learning algorithms:
The metalearning algorithm, which learns the correct infra-RDP mixture. The flavor of this algorithm is RL in a setting where we have a simulator of the environment (since we can evaluate ζ(h) for any h). In particular, here we don’t worry about exploitation/exploration tradeoffs.
The “metacontrol” algorithm, which given an infra-RDP mixture, approximates the optimal policy. The flavor of this algorithm is “standard” RL with exploitation/exploration tradeoffs.
In the simplest toy model, we can imagine that metalearning happens entirely in advance of actual interaction with the environment. More realistically, the two need to happen in parallel. It is then natural to apply metalearning to the current environmental posterior rather than the prior (i.e. to the histories starting from the history that has already occurred). Such an agent satisfies “opportunistic” guarantees: if at any point in time the posterior admits a useful metahypothesis, the agent can exploit this metahypothesis. Thus, we address both parts of the problem of traps:
The complexity-theoretic part (subproblem 1.2) is addressed by approximating the intractable Bayes-optimality problem by the metacontrol problem of the (coarser) metahypothesis.
The statistical part (subproblem 2.1) is addressed by opportunism: if at some point, we can easily learn something about the physical environment, then we do.
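Here is a small runnable caricature of this two-level structure (entirely my own toy construction: the “environments” are two biased coins, the metahypotheses are just boolean predicates about the program ζ rather than infra-RDP mixtures, and all names are made up for illustration):

```python
import numpy as np
rng = np.random.default_rng(0)

# zeta is a uniform prior over two biased coins; zeta_predict(h) is the posterior
# predictive probability of the next bit being 1, computable for any history h.
# This map h |-> zeta_predict(h) is the "program" the metalevel forms beliefs about.
biases = np.array([0.9, 0.1])

def zeta_predict(h):
    ones = sum(h)
    lik = biases**ones * (1 - biases)**(len(h) - ones)
    return float((lik / lik.sum()) @ biases)

def simulate_history(length):
    """Roll out the 'simulator': sample an environment from the prior, then observations."""
    b = rng.choice(biases)
    return [int(rng.random() < b) for _ in range(length)]

# Two toy metahypotheses about the program: after 20 observations its output is either
# close to 0.9 or 0.1 ("sharp"), or close to 0.5 ("flat").
def meta_sharp(h):
    return min(abs(zeta_predict(h) - 0.9), abs(zeta_predict(h) - 0.1)) < 0.05
def meta_flat(h):
    return abs(zeta_predict(h) - 0.5) < 0.05

# Level 1, metalearning: score metahypotheses on simulated rollouts. No exploration
# problem here, since we can query the simulator freely.
scores = {"sharp": 0, "flat": 0}
for _ in range(200):
    h = simulate_history(20)
    scores["sharp"] += meta_sharp(h)
    scores["flat"] += meta_flat(h)
print("learned metahypothesis:", max(scores, key=scores.get), scores)

# Level 2, metacontrol: exploit the learned metahypothesis with an ordinary policy,
# e.g. "observe 20 bits, then always bet on the side zeta_predict favors".
def policy_reward():
    b = rng.choice(biases)                    # now the actual environment
    h = [int(rng.random() < b) for _ in range(20)]
    guess = int(zeta_predict(h) > 0.5)
    return sum(int((rng.random() < b) == guess) for _ in range(100)) / 100

print("average reward of the metacontrol policy:",
      round(np.mean([policy_reward() for _ in range(200)]), 2))
```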
Jobst Heitzig asked me whether infra-Bayesianism has something to say about the absent-minded driver (AMD) problem. Good question! Here is what I wrote in response:
The following was written by me during the “Finding the Right Abstractions for healthy systems” research workshop, hosted by Topos Institute in January 2023. However, I came up with the idea earlier.
Here’s an elegant diagrammatic notation for constructing new infrakernels out of given infrakernels. There is probably some natural category-theoretic way to think about it, but at present I don’t know what it is.
By “infrakernel” we will mean a continuous mapping of the form X→□Y, where X and Y are compact Polish spaces and □Y is the space of credal sets (i.e. closed convex sets of probability distributions) over Y.
Syntax
The diagram consists of child vertices, parent vertices, squiggly lines, solid arrows, dashed arrows and slashes.
There can be solid arrows incoming into the diagram. Each such arrow a is labeled by a compact Polish space D(a) and ends on a parent vertex t(a). And, s(a)=⊥ (i.e. the arrow has no source vertex).
There can be dashed and solid arrows between vertices. Each such arrow a starts from a child vertex s(a) and ends on a parent vertex t(a). We require that P(s(a))≠t(a) (i.e. they should not also be connected by a squiggly line).
There are two types of vertices: parent vertices (denoted by a letter) and child vertices (denoted by a letter or number in a circle).
Each child vertex v is labeled by a compact Polish space D(v) and connected (by a squiggly line) to a unique parent vertex P(v). It may or may not be crossed-out by a slash.
Each parent vertex p is labeled by an infrakernel Kp with source S1×…×Sk and target T1×…×Tl, where each Si corresponds to a solid arrow a with t(a)=p and each Tj is D(v) for some child vertex v with P(v)=p. We can also add squares with numbers where solid arrows end, to keep track of the correspondence between the arguments of Kp and the arrows.
If s(a)=⊥ then the corresponding Si is D(a).
If s(a)=v≠⊥ then the corresponding Si is D(v).
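One possible way to encode this syntax in code (purely my own representational choice; spaces are tracked only by name and the infrakernels themselves are left abstract):

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ParentVertex:
    name: str
    kernel: Callable              # K_p : S_1 x ... x S_k -> credal sets over T_1 x ... x T_l

@dataclass
class ChildVertex:
    name: str
    space: str                    # D(v), the space labelling v
    parent: str                   # P(v), the unique parent joined to v by a squiggly line
    crossed_out: bool = False     # crossed-out children are marginalized away in the output

@dataclass
class Arrow:
    target: str                   # t(a), always a parent vertex
    source: Optional[str] = None  # s(a), a child vertex, or None when s(a) = ⊥
    space: Optional[str] = None   # D(a), only for arrows entering the diagram
    dashed: bool = False

@dataclass
class Diagram:
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    arrows: list = field(default_factory=list)

    def validate(self):
        parent_of = {c.name: c.parent for c in self.children}
        for a in self.arrows:
            if a.source is not None:
                # the constraint P(s(a)) != t(a) from the syntax above
                assert parent_of[a.source] != a.target

# Example: one parent p with a single incoming arrow and one non-crossed-out child.
diagram = Diagram(parents=[ParentVertex("p", kernel=lambda x: ...)],  # placeholder kernel
                  children=[ChildVertex("v", space="Y", parent="p")],
                  arrows=[Arrow(target="p", space="X")])
diagram.validate()
```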
Semantics
Every diagram D represents an infrakernel KD.
The source space of KD is a product X1×…×Xn, where each Xi is D(a) for some solid arrow a with s(a)=⊥.
The target space of KD is a product Y1×…×Ym, where each Yj is D(v) for some non-crossed-out child vertex.
The value of KD at a given point x is defined as follows. Let ~Y:=∏vD(v) (a product that includes the crossed-out vertices). Then, KD(x) is the set of all marginal distributions (over the non-crossed-out vertices) of distributions μ∈Δ~Y satisfying the following condition. Consider any parent vertex p. Let a1,a2…ak be the (dashed or solid) arrows s.t. s(ai)≠⊥ and t(ai)=p. For each such i, choose any yi∈D(s(ai)). We require that Kp(x,y) contains the marginal distribution of μ∣y. Here, the notation Kp(x,y) means we are using the components of x and y corresponding to solid arrows a with t(a)=p.
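As a sanity check of the kind of objects involved, here is the simplest special case of combining infrakernels, namely sequential composition over finite spaces (my own illustration, not the general diagram semantics above; credal sets are represented by finite lists of their extreme points, and all the numbers are toy data):

```python
import itertools
import numpy as np

Y, Z = 3, 2   # sizes of the intermediate and final spaces

def K1(x):    # infrakernel X -> credal sets over Y (toy data; ignores x)
    return [np.array([0.5, 0.5, 0.0]),
            np.array([0.2, 0.3, 0.5])]

def K2(y):    # infrakernel Y -> credal sets over Z (toy data)
    return ([np.array([1.0, 0.0]), np.array([0.6, 0.4])] if y == 0
            else [np.array([0.3, 0.7])])

def compose(K1, K2, x, n_mid):
    """Generating points of (K2 after K1)(x): all mixtures E_{y~mu}[nu_y], where mu is an
    extreme point of K1(x) and nu_y is an extreme point of K2(y), chosen independently for
    each y; the credal set of the composition is the convex hull of these points."""
    points = []
    for mu in K1(x):
        for selection in itertools.product(*(K2(y) for y in range(n_mid))):
            points.append(sum(mu[y] * selection[y] for y in range(n_mid)))
    return points

for p in compose(K1, K2, x=0, n_mid=Y):
    print(np.round(p, 3))
```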
Two deterministic toy models for regret bounds of infra-Bayesian bandits. The lesson seems to be that equalities are much easier to learn than inequalities.
Model 1: Let A be the space of arms, O the space of outcomes, r:A×O→R the reward function, X and Y finite-dimensional vector spaces, H⊆X the hypothesis space and F:A×O×H→Y a function s.t. for any fixed a∈A and o∈O, F(a,o):H→Y extends to some linear operator T_{a,o}:X→Y. The semantics of hypothesis h∈H is defined by the equation F(a,o,h)=0 (i.e. an outcome o of action a is consistent with hypothesis h iff this equation holds).
For any h∈H denote by V(h) the reward promised by h:
V(h) := max_{a∈A} min_{o∈O: F(a,o,h)=0} r(a,o)
Then, there is an algorithm with mistake bound dim X, as follows. On round n∈N, let Gn⊆H be the set of unfalsified hypotheses. Choose hn∈Gn optimistically, i.e.
hn := argmax_{h∈Gn} V(h)
Choose the arm an recommended by hypothesis hn. Let on∈O be the outcome we observed, rn:=r(an,on) the reward we received and h∗∈H the (unknown) true hypothesis.
If rn≥V(hn) then also rn≥V(h∗) (since h∗∈Gn and hence V(h∗)≤V(hn)) and therefore an wasn’t a mistake.
If rn<V(hn) then F(an,on,hn)≠0 (if we had F(an,on,hn)=0 then the minimization in the definition of V(hn) would include r(an,on), giving V(hn)≤rn, a contradiction). Hence, hn∉Gn+1 = Gn∩ker T_{an,on}. This implies dim span(Gn+1) < dim span(Gn). Obviously this can happen at most dim X times.
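Here is a small runnable instantiation of Model 1 and of the algorithm above (the concrete data, the homogeneous encoding and the environment’s tie-breaking are my own choices, made just to get a self-contained example):

```python
import itertools
import numpy as np

# Homogeneous encoding: a hypothesis is h = (w, 1) in X = R^{d+1}, and
# F(a, o, h) = a.w - o*h[-1], which is linear in h; so outcome o is consistent with h
# under arm a iff o = a.w. The reward is just the observed outcome.
d = 2
arms = [np.array(v) for v in [(1, 0), (0, 1), (1, 1)]]
outcomes = list(range(5))   # contains every possible value of a.w below
hypotheses = [np.array((w1, w2, 1)) for w1, w2 in itertools.product(range(3), repeat=2)]

def F(a, o, h):
    return np.dot(a, h[:d]) - o * h[d]

def reward(o):
    return float(o)

def V_and_arm(h):
    """V(h) = max_a min_{o: F(a,o,h)=0} r(a,o), together with the recommended arm."""
    best = (-np.inf, None)
    for i, a in enumerate(arms):
        consistent = [o for o in outcomes if F(a, o, h) == 0]
        if consistent:
            best = max(best, (min(reward(o) for o in consistent), i))
    return best

h_star = hypotheses[7]     # the (unknown) true hypothesis, here w = (2, 1)
G = list(hypotheses)       # G_n: unfalsified hypotheses
mistakes = 0
for n in range(6):
    V_n, a_n = max(V_and_arm(h) for h in G)          # optimism over G_n
    o_n = int(np.dot(arms[a_n], h_star[:d]))         # the outcome consistent with h_star
    if reward(o_n) < V_n:
        mistakes += 1
    G = [h for h in G if F(arms[a_n], o_n, h) == 0]  # falsification update
    print(f"round {n}: arm {a_n}, reward {reward(o_n)}, promised {V_n}, |G| = {len(G)}")
print("mistakes:", mistakes, "<= dim X =", d + 1)
```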
Model 2: Let the spaces of arms and hypotheses be
A := H := S^d := {x∈R^{d+1} ∣ ∥x∥=1}
Let the reward r∈R be the only observable outcome, and let the semantics of hypothesis h∈S^d be r≥h⋅a. Then, the sample complexity cannot be bounded by a polynomial whose degree doesn’t depend on d. This is because Murphy can choose the strategy of producing reward 1−ϵ whenever h⋅a≤1−ϵ. In this case, whatever arm you sample, in each round you can only exclude a ball of radius ≈√(2ϵ) around the sampled arm. The number of such balls that fit into the unit sphere is Ω(ϵ^{−d/2}). So, normalized regret below ϵ cannot be guaranteed in fewer than that many rounds.
One of the postulates of infra-Bayesianism is the maximin decision rule. Given a crisp infradistribution Θ, it defines the optimal action to be:
a∗(Θ) := argmax_a min_{μ∈Θ} Eμ[U(a)]
Here U is the utility function.
What if we use a different decision rule? Let t∈[0,1] and consider the decision rule
a∗t(Θ) := argmax_a (t⋅min_{μ∈Θ} Eμ[U(a)] + (1−t)⋅max_{μ∈Θ} Eμ[U(a)])
For t=1 we get the usual maximin (“pessimism”), for t=0 we get maximax (“optimism”), and for other values of t we get something in the middle (which we can call “t-mism”).
It turns out that, in some sense, this new decision rule is actually reducible to ordinary maximin! Indeed, set
μ∗t := argmax_{μ∈Θ} Eμ[U(a∗t)]
Θt := t⋅Θ + (1−t)⋅μ∗t
Then we get
a∗(Θt) = a∗t(Θ)
More precisely, any pessimistically optimal action for Θt is t-mistically optimal for Θ (the converse need not be true in general, thanks to the arbitrary choice involved in μ∗t).
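A quick numerical sanity check of this reduction (toy data of my own; the credal set is given by finitely many extreme points, so the min/max of a linear functional over it can be taken over that finite list):

```python
import numpy as np

U = np.array([[1.0, 0.0, 0.5],        # U[a, outcome]: utilities of 3 actions over 3 outcomes
              [0.4, 0.9, 0.2],
              [0.6, 0.6, 0.3]])
Theta = [np.array([0.7, 0.2, 0.1]),   # extreme points of the credal set
         np.array([0.1, 0.6, 0.3]),
         np.array([0.3, 0.3, 0.4])]
t = 0.3
actions = range(len(U))

def EU(mu, a):
    return float(U[a] @ mu)

def t_mistic_value(a):                # the t-mism objective applied to Theta
    vals = [EU(mu, a) for mu in Theta]
    return t * min(vals) + (1 - t) * max(vals)

a_t = max(actions, key=t_mistic_value)

# Build Theta_t = t*Theta + (1-t)*mu_t, where mu_t maximizes E_mu[U(a_t)] over Theta.
mu_t = max(Theta, key=lambda mu: EU(mu, a_t))
Theta_t = [t * mu + (1 - t) * mu_t for mu in Theta]

def maximin_value(a):                 # the ordinary pessimistic objective applied to Theta_t
    return min(EU(mu, a) for mu in Theta_t)

a_pess = max(actions, key=maximin_value)
print("t-mistically optimal for Theta:", a_t, " pessimistically optimal for Theta_t:", a_pess)
# The pessimistic optimum of Theta_t is t-mistically optimal for Theta:
assert abs(t_mistic_value(a_pess) - t_mistic_value(a_t)) < 1e-9
```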
To first approximation it means we don’t need to consider t-mistic agents since they are just special cases of “pessimistic” agents. To second approximation, we need to look at what the transformation of Θ to Θt does to the prior. If we start with a simplicity prior then the result is still a simplicity prior. If U has low description complexity and t is not too small then essentially we get full equivalence between “pessimism” and t-mism. If t is small then we get a strictly “narrower” prior (for t=0 we are back at ordinary Bayesianism). However, if U has high description complexity then we get a rather biased simplicity prior. Maybe the latter sort of prior is worth considering.