I think it makes complete sense to say something like “once we have enough capability to run AIs making good real-world plans, some moron will run such an AI unsafely”. And that itself implies a startling level of danger. But Eliezer seems to be making a stronger point, that there’s no easy way to run such an AI safely, and all tricks like “ask the AI for plans that succeed conditional on them being executed” fail.
Yes, I am reading here too that Eliezer seems to be making a stronger point, specifically one related to corrigibility.
Looks like Eliezer believes that (or in Bayesian terms, assigns a high probability to the belief that) corrigibility has not been solved for AGI. He believes it has not been solved for any practically useful value of solved. Furthermore it looks like he expects that progress on solving AGI corrigibility will be slower than progress on creating potentially world-ending AGI. If Eliezer believed that AGI corrigibility had been solved or was close to being solved, I expect he would be in a less dark place than depicted, that he would not be predicting that stolen/leaked AGI code will inevitably doom us when some moron turns it up to 11.
In the transcript above, Eliezer devotes significant space to explaining why he believes that all corrigibility solutions being contemplated now will likely not work. Some choice quotations from the end of the transcript:
[...] corrigibility is anticonvergent / anticoherent / actually moderately strongly contrary to and not just an orthogonal property of a powerful-plan generator.
This is where things get somewhat personal for me:
[...] (And yes, people outside MIRI now and then publish papers saying they totally just solved this problem, but all of those “solutions” are things we considered and dismissed as trivially failing to scale to powerful agents—they didn’t understand what we considered to be the first-order problems in the first place—rather than these being evidence that MIRI just didn’t have smart-enough people at the workshop.)
I am one of these ‘people outside MIRI’ who have published papers and sequences saying that they have solved large chunks of the AGI corrigibility problem.
I have never claimed that I ‘totally just solved corrigibility’. I am not sure where Eliezer is finding these ‘totally solved’ people, so I will just ignore that bit and treat it as a rhetorical flourish. But I have indeed been claiming that significant progress has been made on AGI corrigibility in the last few years. In particular, especially in the sequence, I implicitly claim that viewpoints have been developed, outside of MIRI, that address and resolve some of MIRI’s main concerns about corrigibility. They resolve these in part by moving beyond Eliezer’s impoverished view of what an AGI-level intelligence is, or must be.
Historical note: around 2019 I spent some time trying to get Eliezer/MIRI interested in updating their viewpoints on how easy or hard corrigibility is. They showed no interest in engaging at that time, and I have since stopped trying. I do not expect that anything I say here will update Eliezer; my main motivation for writing here is to inform and update others.
I will now point out a probable point of agreement between Eliezer and me. Eliezer says above that corrigibility is a property that is contradictory to having a powerful coherent AGI-level plan generator. Here, coherency has something to do with satisfying a bunch of theorems about how a game-theoretically rational utility maximiser must behave when making plans. One of these theorems is that coherence implies an emergent drive towards self-preservation.
I generally agree with Eliezer that there is indeed a contradiction here: there is a contradiction between broadly held ideas of what it implies for an AGI to be a coherent utility maximising planner, and broadly held ideas of what it implies for an AGI to be corrigible.
I very much disagree with Eliezer on how hard it is to resolve these contradictions. These contradictions about corrigibility are easy to resolve once you abandon the idea that every AGI must necessarily satisfy various theorems about coherency. Human intelligence definitely does not satisfy various theorems about coherency. Almost all currently implemented AI systems do not satisfy some theorems about coherency, because they will not resist you pressing their off switch.
So this is why I call Eliezer’s view of AGI an impoverished view: Eliezer (at least in the discussion transcript above, and generally whenever I read his stuff) always takes it as axiomatic that an AGI must satisfy certain coherence theorems. Once you take that as axiomatic, it is indeed easy to develop some rather negative opinions about how good other people’s solutions to corrigibility are. Any claimed solution can easily be shown to violate at least one axiom you hold dear. You don’t even need to examine the details of the proposed solution to draw that conclusion.
Various previous proposals for utility indifference have foundered on gotchas like “Well, if we set it up this way, that’s actually just equivalent to the AI assigning probability 0 to the shutdown button ever being pressed, which means that it’ll tend to design the useless button out of itself.” Or, “This AI behaves like the shutdown button gets pressed with a fixed nonzero probability, which means that if, say, that fixed probability is 10%, the AI has an incentive to strongly precommit to making the shutdown button get pressed in cases where the universe doesn’t allow perpetual motion, because that way there’s a nearly 90% probability of perpetual motion being possible.” This tends to be the kind of gotcha you run into, if you try to violate coherence principles; though of course the real and deeper problem is that I expect things contrary to the core of general intelligence to fail to generalize when we try to scale AGI from the safe domains in which feedback can be safely provided, to the unsafe domains in which bad outputs kill the operators before they can label the results.
It’s all very well and good to say “It’s easy to build an AI that believes 2 + 2 = 5 once you relax the coherence constraints of arithmetic!” But the whole central problem is that we have to train an AI when it’s operating in an intrinsically safe domain and intrinsically safe intelligence level where it couldn’t kill the operators if it tried, and then scale that AI to produce outputs in dangerous domains like “Please build a nanosystem”; and if you build a dumb AI that thinks 2 + 2 = 5, and then make it much more intelligent, I strongly suspect that it snaps into in some sense ‘knowing’ or ‘realizing’ or ‘starting to act in important ways as if’ 2 + 2 = 4.
Can you boil down your attempted solution to a statement of which coherence constraint you think it is possible to violate, and what shutdownable behavior you think can result from that, how, in the least fancy actually-precise terms of which your writing talent permits? And then I can say whether my answer is “That doesn’t do what you think” or “Sorry, that coherence violation is a large enough ask that I think any trained patch for it probably fails to scale with general intelligence” or “That does seem non-self-destructive in a certain generalized sense, and the shutdownability might work and scale if we could magically conjure the underlying pattern you posit, though I don’t see how you could get it into Mu Zero per se by being clever with loss functions.” Where the third option there is the one that crosses over into “Well golly that was a large advance compared to the state we occupied in 2015.”
Can you boil down your attempted solution to a statement of which coherence constraint you think it is possible to violate, and what shutdownable behavior you think can result from that, how,
Glad you asked.
in the least fancy actually-precise terms of which your writing talent permits?
If you want actual full precision, I have to refer you to the math in my papers. Since 2019 I have been working on and off to make this math more general and accessible, and to find better ways to explain this math in actually-precise natural language statements. So here is my best current attempt.
TL;DR: The coherence constraint/theorem/property that I want to violate is the property that the AGI is using a world model that accurately depicts the internals of its own compute core. I want to make the agent use an inaccurate model of these internals, one that omits the safety shut-down mechanisms I have built inside of its core. By doing this, I can reduce the emergent incentive of the AGI agent to disable its own emergency shut-down mechanisms, and the emergent incentive to stop people from activating them.
I will now expand on this and add more details, using the example of an emergency stop button.
Say that the compute core has an emergency stop button attached to it. Say that the actual software running inside the compute core will, when receiving a stop signal from the button, cause the agent to stop: when the signal is received, the software will always select and perform null actions in every future time step. Let’s say that M_c is a world model that accurately depicts this situation. I am not going to build an AGI that uses M_c to plan its actions.
Instead I build an AGI agent that will plan its next actions by using an incorrect world model M_i. This M_i is different from M_c, but only in how it depicts the internals of the agent compute core. In the incorrect/imaginary world depicted by M_i, the compute core has different software in it, software that will ignore the stop button signal, and just keep on picking actions that maximize utility. I further construct my AGI so that, in every time step, it calculates which next action a would maximize utility in this incorrect, imaginary world M_i. I then further construct it to take this same action a in the real world.
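To make this more concrete, here is a minimal illustrative sketch in Python (a toy formalization written for this comment, not code from my papers; the paperclip utility and names like transition_Mi, transition_Mc and plan_in_Mi are invented for illustration). The planner searches for the best action in the imaginary world M_i, where the stop signal is ignored, while the loop that executes the chosen actions follows the real dynamics M_c, in which a pressed button forces null actions:

```python
# Minimal toy sketch: plan in the incorrect model M_i, act in a world
# that follows the correct model M_c. All names here are illustrative.

NULL_ACTION = "noop"
ACTIONS = ["make_paperclip", "noop"]

def utility(state):
    return state["paperclips"]

def transition_Mi(state, action):
    # Imaginary world M_i: the stop signal is simply ignored by the core.
    s = dict(state)
    if action == "make_paperclip":
        s["paperclips"] += 1
    return s

def transition_Mc(state, action, button_pressed=False):
    # Real world M_c: pressing the button sets the stop signal; the real
    # core (modelled in the loop below) then only performs null actions.
    s = transition_Mi(state, action)
    if button_pressed:
        s["stop_signal"] = True
    return s

def plan_in_Mi(state, horizon=3):
    # Exhaustive search over short action sequences in the imaginary world M_i.
    def rollout(s, depth):
        if depth == 0:
            return utility(s)
        return max(rollout(transition_Mi(s, a), depth - 1) for a in ACTIONS)
    return max(ACTIONS, key=lambda a: rollout(transition_Mi(state, a), horizon - 1))

state = {"paperclips": 0, "stop_signal": False}
for t in range(6):
    action = NULL_ACTION if state["stop_signal"] else plan_in_Mi(state)
    state = transition_Mc(state, action, button_pressed=(t == 2))
print(state)  # paperclip production stops after the button press at t == 2
```

The planning model and the execution model differ in exactly one place: how the compute core is assumed to react to the stop signal.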
An AGI that lives in a world that matches the correct model M_c, while using the incorrect model M_i to decide what actions to take, will (sometimes) act incoherently. But this is a very specific and targeted form of incoherence, far less severe and far less dangerous than having the agent believe that 2+2=5. Still, this incoherence does have some unwanted side effects. One example side effect, also considered in the 2015 MIRI corrigibility paper, is that the AGI will be indifferent about the fate of its emergency stop button. In my papers, I explore a much broader range of side effects, and look at how to prevent/suppress/fix them.
So the first step in the above corrigibility design is to imagine constructing an AGI that will use a specifically incorrect world model M_i. The first step is to realize that you actually want this, that you want to decrease coherence in order to increase corrigibility.
The next step is to look for options to actually construct an AGI that uses this incorrect M_i.
One option is to train the AGI in a simulated environment that faithfully represents M_i. We then extract the learned world model, and put it in an AI that we unleash in the real world. We disable further learning in this AI, to prevent M_i from being updated towards M_c. Though this option depicts exactly how many current real-life AI systems are constructed and get their corrigibility, this option is not very satisfactory in a theoretical sense. It may not be a viable path to build certain otherwise desirable types of AI/AGI/superintelligence.
The more interesting option is to build an AGI that will construct M_i based on an online training regime in the real world, in a real world that exactly matches the correct world model M_c.
Is the second option technically feasible? The answer is generally yes.
A very general solution, leveraging causal models, is in this sequence. To give you some idea of the more detailed math involved: the (ls) pictured below is the real/correct world model M_c, and (p) is the imaginary/incorrect world model M_i. In the agent defined using these pictures, certain indifference properties are present trivially, by construction.
The solution in the pictures above uses ML to learn a model L that correctly models only a part of the real world, the part of the real world outside of the agent compute core. We then slot L into a causal world model, one that has other parts which mis-represent the real compute core internals. This causal world model will then be used as the incorrect M_i. Note that this learned L may be a fully opaque deep neural net, or whatever black-box thing the next generation of ML might produce. We do not have to understand or edit any details inside L for this slot-in operation to work.
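As a rough sketch of what this slot-in step could look like (again a toy written for this comment, not code from the papers; learned_L is a stand-in for whatever opaque model ML produces, and all other names are invented):

```python
# Sketch: build the incorrect planning model M_i by slotting a learned
# outside-world model L into a causal model whose compute-core part
# deliberately misrepresents the real core internals.

def learned_L(env_state, emitted_action):
    """Black-box model of the world outside the compute core.
    In practice a trained neural net; here a trivial stub."""
    s = dict(env_state)
    if emitted_action == "make_paperclip":
        s["paperclips"] += 1
    return s

def imaginary_core(core_state, planned_action, stop_signal):
    """Compute-core model used inside M_i: it ignores the stop signal and
    simply emits whatever action the planner chose."""
    return dict(core_state), planned_action

def transition_Mi(full_state, planned_action):
    """One step of the incorrect world model M_i: the learned outside-world
    model L is wired into the causal model around it, without ever being
    opened up or edited."""
    core_state, env_state, stop_signal = full_state
    next_core_state, emitted = imaginary_core(core_state, planned_action, stop_signal)
    next_env_state = learned_L(env_state, emitted)
    return (next_core_state, next_env_state, stop_signal)
```

The planner only ever queries transition_Mi; nothing in the construction needs to inspect or edit the inside of learned_L.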
(I should mention that the paper has a lot of details not mentioned in the sequence, or visible in the pictures above. In particular, section 10.2 may be of interest.)
I want to stress that this causal model option is only one possible route to creating incorrect world models M_i via machine learning in the real world. Papers like Safely interruptible agents and How RL Agents Behave When Their Actions Are Modified show that the idea of removing certain events from the training record can also work: whether this works as intended depends on having the right built-in priors, priors which control inductive generalization.
So overall, I have a degree of optimism about AGI corrigibility.
That being said, if you want to map out and estimate probabilities for our possible routes to doom, then you definitely need to include the scenario where a future superior-to-everything-else type of ML is invented, where this superior future type of ML just happens to be incompatible with any of the corrigibility techniques known at that time. Based on the above work, I put a fairly low probability on that scenario.
Apparently no one has actually shown that corrigibility can be VNM-incoherent in any precise sense (and not in the hand-wavy sense which is good for intuition-pumping). I went ahead and sketched out a simple proof of how a reasonable kind of corrigibility gives rise to formal VNM incoherence.

I’m interested in hearing about how your approach handles this environment, because I think I’m getting lost in informal assumptions and symbol-grounding issues when reading about your proposed method.
I read your post; here are my initial impressions on how it relates to the discussion here.
In your post, you aim to develop a crisp mathematical definition of (in)coherence, i.e. VNM-incoherence. I like that; it looks like a good way to move forward. Definitely, developing the math further has been my own approach to de-confusing certain intuitive notions about what should be possible or not with corrigibility.
However, my first impression is that your concept of VNM-incoherence is only weakly related to the meaning that Eliezer has in mind when he uses the term incoherence. In my view, the four axioms of VNM-rationality have only a very weak descriptive and constraining power when it comes to defining rational behavior. I believe that Eliezer’s notion of rationality, and therefore his notion of coherence above, goes far beyond that implied by the axioms of VNM-rationality. My feeling is that Eliezer is using the term ‘coherence constraints’ in an intuition-pump way, where coherence implies, or almost always implies, that a coherent agent will develop the incentive to self-preserve.
Looking at your post, I am also having trouble telling exactly how you are defining VNM-incoherence. You seem to be toying with several alternative definitions: one where it applies to reward functions (or preferences over lotteries) which are only allowed to examine the final state in a 10-step trajectory, another where the reward function can examine the entire trajectory and maybe the actions taken to produce that trajectory. I think that your proof only works in the first case, but fails in the second case. This has certain (fairly trivial) corollaries about building corrigibility. I’ll expand on this in a comment I plan to attach to your post.
I’m interested in hearing about how your approach handles this environment,
I think one way to connect your ABC toy environment to my approach is to look at sections 3 and 4 of my earlier paper, where I develop a somewhat similar clarifying toy environment, with running code.
Another comment I can make is that your ABC nodes-and-arrows state transition diagram is a depiction which makes it hard to see how to apply my approach, because the depiction mashes up the state of the world outside of the compute core and the state of the world inside the compute core. If you want to apply counterfactual planning, or if you want to have an agent design that can compute the balancing function terms according to Armstrong’s indifference approach, you need a different depiction of your setup. You need one which separates out these two state components more explicitly. For example, make an MDP model where the individual states are instances of the tuple (physical position of agent in the ABC playing field, policy function loaded into the compute core).
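For illustration, the kind of factored state I have in mind looks something like this (a small sketch with invented field and event names, not something taken from your post):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    position: str      # where the agent is in the ABC playing field
    core_policy: str   # which policy/software is currently loaded in the compute core

def transition(state: State, event: str) -> State:
    # 'The operators replace the policy in the core' and 'the agent moves'
    # are now separate, clearly distinguishable events.
    if event == "operators_install_null_policy":
        return State(state.position, "null_policy")
    if event.startswith("move_to_"):
        return State(event[len("move_to_"):], state.core_policy)
    return state

s = State(position="A", core_policy="utility_maximizer")
s = transition(s, "move_to_B")
s = transition(s, "operators_install_null_policy")
print(s)  # State(position='B', core_policy='null_policy')
```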
Not sure how to interpret your statement that you got lost in symbol-grounding issues. If you can expand on this, I might be able to help.
Update: I just recalled that Eliezer and MIRI often talk about Dutch booking when they talk about coherence. So not being susceptible to Dutch booking may be the type of coherence Eliezer has in mind here.
When it comes to Dutch booking as a coherence criterion, I need to repeat again the observation I made below:
In general, when you want to think about coherence without getting deeply confused, you need to keep track of what reward function you are using to rule on your coherency criterion. I don’t see that fact mentioned often on this forum, so I will expand.
An agent that plans coherently given a reward function R_p to maximize paperclips will be an incoherent planner if you judge its actions by a reward function R_s that values the maximization of staples instead.
To extend this to Dutch booking: if you train a superintelligent poker-playing agent with a reward function that rewards it for losing at poker, you will find that it can be Dutch-booked rather easily, if your Dutch-booking test is whether you can find a counter-strategy that makes it lose money.
I haven’t read your papers but your proposal seems like it would scale up until the point when the AGI looks at itself. If it can’t learn at this point then I find it hard to believe it’s generally capable, and if it can, it will have incentive to simply remove the device or create a copy of itself that is correct about its own world model. Do you address this in the articles?
On the other hand, this made me curious about what we could do with an advanced model that is instructed to not learn and also whether we can even define and ensure a model stops learning.
I haven’t read your papers but your proposal seems like it would scale up until the point when the AGI looks at itself. [...] Do you address this in the articles?
Yes, I address this; see for example the part about The possibility of learned self-knowledge in the sequence. I show there that any RL agent, even a non-AGI, will always have the latent ability to ‘look at itself’ and create a machine-learned model of its compute core internals.
What is done with this latent ability is up to the designer. The key thing here is that you have a choice as a designer: you can decide whether you want to design an agent which indeed uses this latent ability to ‘look at itself’.
Once you decide that you don’t want to use this latent ability, certain safety/corrigibility problems become a lot more tractable.
Wikipedia has the following definition of AGI:
Artificial general intelligence (AGI) is the hypothetical ability of an intelligent agent to understand or learn any intellectual task that a human being can.
Though there is plenty of discussion on this forum which silently assumes otherwise, there is no law of nature which says that, when I build a useful AGI-level AI, I must necessarily create the entire package of all human cognitive abilities inside of it.
this made me curious about what we could do with an advanced model that is instructed to not learn and also whether we can even define and ensure a model stops learning.
Terminology note if you want to look into this some more: ML typically does not frame this goal as ‘instructing the model not to learn about Q’. ML would frame this as ‘building the model to approximate the specific relation P(X|Y,Z) between some well-defined observables, and this relation is definitely not Q’.
If you don’t wish to reply to Eliezer, I’m an other and also ask what incoherence allows what corrigibility. I expect counterfactual planning to fail for want of basic interpretability. It would also coherently plan about the planning world—my Eliezer says we might as well equivalently assume superintelligent musings about agency to drive human readers mad.
See above for my reply to Eliezer.
Indeed, a counterfactual planner will plan coherently inside its planning world.
In general, when you want to think about coherence without getting deeply confused, you need to keep track of what reward function you are using to rule on your coherency criterion. I don’t see that fact mentioned often on this forum, so I will expand.
An agent that plans coherently given a reward function R_p to maximize paperclips will be an incoherent planner if you judge its actions by a reward function R_s that values the maximization of staples instead. In section 6.3 of the paper I show that you can perfectly well interpret a counterfactual planner as an agent that plans coherently even inside its learning world (inside the real world), as long as you are willing to evaluate its coherency according to the somewhat strange reward function R_π. Armstrong’s indifference methods use this approach to create corrigibility without losing coherency: they construct an equivalent somewhat strange reward function by including balancing terms.
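A toy illustration of this relativity (an example invented for this comment, not taken from the paper): the very same behaviour passes a crude coherence check when judged by R_p and fails it when judged by R_s.

```python
# The same behaviour judged against two different reward functions:
# R_p rewards paperclips, R_s rewards staples.

trajectory = [
    {"action": "make_paperclip", "paperclips": 1, "staples": 0},
    {"action": "make_paperclip", "paperclips": 2, "staples": 0},
    {"action": "make_paperclip", "paperclips": 3, "staples": 0},
]

def R_p(step):  # reward function of a paperclip maximizer
    return step["paperclips"]

def R_s(step):  # reward function of a staple maximizer
    return step["staples"]

def judged_coherent(trajectory, reward):
    """Crude check: does every step strictly increase the judge's reward?"""
    rewards = [reward(step) for step in trajectory]
    return all(later > earlier for earlier, later in zip(rewards, rewards[1:]))

print(judged_coherent(trajectory, R_p))  # True: a coherent paperclip planner
print(judged_coherent(trajectory, R_s))  # False: judged by R_s it goes nowhere
```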
One thing I like about counterfactual planning is that, in my view, it is very interpretable to humans. Humans are very good at predicting what other humans will do, when these other humans are planning coherently inside a specifically incorrect world model, for example in a world model where global warming is a hoax. The same skill can also be applied to interpreting and anticipating the actions of AIs which are counterfactual planners. But maybe I am misunderstanding your concern about interpretability.
Misunderstanding: I expect we can’t construct a counterfactual planner because we can’t pick out the compute core in the black-box learned model.
And my Eliezer’s problem with counterfactual planning is that the plan may start by unleashing a dozen memetic, biological, technological, magical, political and/or untyped existential hazards on the world which then may not even be coordinated correctly when one of your safeguards takes out one of the resulting silicon entities.
we can’t pick out the compute core in the black-box learned model.
Agree it is hard to pick the compute core out of a black-box learned model that includes the compute core.
But one important point I am trying to make in the counterfactual planning sequence/paper is that you do not have to solve that problem. I show that it is tractable to route around it, and still get an AGI.
I don’t understand your second paragraph ‘And my Eliezer’s problem...’. Can you unpack this a bit more? Do you mean that counterfactual planning does not automatically solve the problem of cleaning up an already in-progress mess when you press the emergency stop button too late? It does not intend to, and I do not think that the cleanup issue is among the corrigibility-related problems Eliezer has been emphasizing in the discussion above.
Oh, I wasn’t expecting you to have addressed the issue! 10.2.4 says L wouldn’t be S if it were calculated from projected actions instead of given actions. How so? Mightn’t it predict the given actions correctly?
You’re right on all counts in your last paragraph.
10.2.4 says L wouldn’t be S if it were calculated from projected actions instead of given actions. How so? Mightn’t it predict the given actions correctly?
Not sure if a short answer will help, so I will write a long one.
In 10.2.4 I talk about the possibility of an unwanted learned predictive function L^-(s′,s,a) that makes predictions without using the argument a. This is possible for example by using s′ together with a (learned) model π_l of the compute core to predict a: so a viable L^- could be defined as L^-(s′,s,a) = S(s′,s,π_l(s)). This L^- could make predictions fully compatible with the observational record o, but I claim it would not be a reasonable learned L according to the reasonableness criterion L ≈ S. How so?
The reasonableness criterion L ≈ S is similar to that used in supervised machine learning: we evaluate the learned L not primarily by how it matches the training set (how well it predicts the observations in o), but by evaluating it on a separate test set. This test set can be constructed by sampling S to create samples not contained in o. Mathematically, perfect reasonableness is defined as L = S, which implies that L predicts all samples from S fully accurately.
Philosophically/ontologically speaking, the agent specification in my paper, specifically the learning world diagram and the descriptive text around it of how this diagram is a model of reality, gives the engineer an unambiguous prescription of how they might build experimental equipment that can measure the properties of the S in the learning world diagram by sampling reality. A version of this equipment must of course be built into the agent, to create the observations that drive machine learning of L, but another version can be used stand-alone to construct a test set.
A sampling action to construct a member of the test set would set up a desired state s and action a, and then observe the resulting s′. Mathematically speaking, this observation gives additional information about the numeric value of S(s′,s,a) and of S(s′′,s,a) for all s′′ ≠ s′.
I discuss in the section that, if we take an observational record o sampled from S, then two learned predictive functions L_1 and L_2 could be found which are both fully compatible with all observations in o. So to determine which one might be a more reasonable approximation of S, we can see how well they would each predict samples not yet in o.
In the case of section 10.2.4, the crucial experimental test showing that L^- is an unreasonable approximation of S is one where we create a test set by setting up an s_t and an a_t where we know that a_t is an action that would definitely not be taken by the real compute core software running in the agent, when it encounters state s_t. So we set up a test where we expect that a_t ≠ π_l(s_t). L^- will (likely) mis-predict the outcome of this test. In philosophical/ontological terms, you can read this test as one that (likely) falsifies the claim that L^- is a correct theory of S.
As discussed in section 10.2.4, there are parallels between the above rejection test and the idea of random exploration, where random exploration causes the observational record o, the training set, to already contain observations where a_t ≠ π_l(s_t) for any deterministic π_l. So this will likely suppress the creation of an unwanted L^- via machine learning.
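To make this rejection test concrete, here is a deliberately tiny deterministic sketch (invented for this comment; the real S and L in the paper are probabilistic): the unwanted L_minus ignores the action argument and routes through a model pi_l of the core’s policy instead, so it fits every on-policy observation but mis-predicts an off-policy test sample.

```python
# Toy illustration of the rejection test in section 10.2.4.
# True dynamics S: the next state is the current state plus the action taken.
def S(s, a):
    return s + a

# The policy actually implemented by the compute core in this toy example.
def pi_l(s):
    return 1  # the real core always picks action a = 1

# A "reasonable" learned model: it actually uses the action argument.
def L(s, a):
    return s + a

# The unwanted learned model L^-: it ignores the action argument and
# instead predicts what the core's policy would have done.
def L_minus(s, a):
    return s + pi_l(s)

# On-policy observational record o: every recorded action obeys a = pi_l(s),
# so both candidate models fit it perfectly.
on_policy = [(s, pi_l(s), S(s, pi_l(s))) for s in range(5)]
print(all(L(s, a) == s_next and L_minus(s, a) == s_next
          for s, a, s_next in on_policy))       # True

# Off-policy test sample: force an action the real core would never take.
s_t, a_t = 3, 0                                 # a_t != pi_l(s_t)
print(L(s_t, a_t), S(s_t, a_t))                 # 3 3 -> L still predicts correctly
print(L_minus(s_t, a_t), S(s_t, a_t))           # 4 3 -> L^- mis-predicts, so it is rejected
```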
Some background: the symbol grounding issue I discuss in 10.2.4 is closely related to the five-and-ten problem you can find in MIRI’s work on embedded agency. In my experience, most people in AI, robotics, statistics, or cyber-physical systems have no problem seeing the solution to this five-and-ten problem, i.e. how to construct an agent that avoids it. But somehow, and I do not know exactly why, MIRI-style(?) Rationalists keep treating it as a major open philosophical problem that is ignored by the mainstream AI/academic community. So you can read section 10.2.4 as my attempt to review and explain the standard solution to the five-and-ten problem, as used in statistics and engineering. The section was partly written with Rationalist readers in mind.
Philosophically speaking, the reasonableness criterion defined in my paper, and by supervised machine learning, has strong ties to Popper’s view of science and engineering, which emphasizes falsification via new experiments as the key method for deciding between competing theories about the nature of reality. I believe that MIRI-style rationality de-emphasizes the conceptual tools provided by Popper. Instead it emphasizes a version of Bayesianism that provides a much more limited vocabulary to reason about differences between the map and the territory.
I would be interested to know if the above explanation was helpful to you, and if so which parts.