An Untrollable Mathematician Illustrated

abramdemski Mar 20, 2018, 12:00 AM

LW: 165 AF: 37

Logical Uncertainty, AI, Logic & Mathematics, Art, Machine Intelligence Research Institute (MIRI)

The following was a presentation I made for Sören Elverlin’s AI Safety Reading Group. I decided to draw everything by hand because PowerPoint is boring. Thanks to Ben Pace for formatting it for LW! See also the IAF post detailing the research on which this presentation is based.

habryka Nov 29, 2019, 8:53 PM
LW: 16 AF: 6

I think this post, together with Abram’s other post “Towards a new technical explanation”, actually convinced me that a Bayesian approach to epistemology can’t work in an embedded context, which was a really big shift for me.
TurnTrout Nov 22, 2019, 5:01 PM
LW: 14 AF: 5

Abram’s writing and illustrations often distill technical insights into accessible, fun adventures. I’ve come to appreciate the importance and value of this expository style more and more over the last year, and this post is what first put me on this track. While more rigorous communication certainly has its place, clearly communicating the key conceptual insights behind a piece of work makes those insights available to the entire community.

Jameson Quinn Jan 10, 2020, 11:23 PM
7 points

This is truly one of the best posts I’ve read. It guides the reader through a complex argument in a way that’s engaging and inspiring. Great job.

Qiaochu_Yuan Mar 18, 2018, 3:02 AM
30 points

This is great. I consistently keep wanting to read the title as “Uncontrollable Mathematician,” which I’m excited about as a band name.
- ryan_b Mar 19, 2018, 6:24 PM
  8 points
  Parent
  
  Perhaps if we specify a different rule that only solves the problem of updating on negative information, you could have an “Un-Con-Trollable Mathematician.”
nostalgebraist Nov 4, 2018, 6:06 PM
23 points

This prior isn’t trollable in the original sense, but it is trollable in a weaker sense that still strikes me as important. Since $μ$ must sum to 1, only finitely many sentences $S$ can have $μ (S) > ϵ$ for a given $ϵ > 0$ . So we can choose some finite set of “important sentences” and control their oscillations in a practical sense, but if there’s any $ϵ > 0$ such that we think oscillations across the range $(ϵ, 1 - ϵ)$ are a bad thing, all but finitely many sentences can exhibit this bad behavior.
It seems especially bad that we can only prevent “up-to- $ϵ$ trolling” for finite sets of sentences, since in PA (or whatever) there are plenty of countable sets of sentences that seem “essentially the same” (like the ones you get from an induction argument), and it feels very unnatural to choose finite subsets of these and distinguish them from the others, even (or especially?) if we pretend we have no prior knowledge beyond the axioms.
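
The counting step behind “only finitely many sentences” can be written out. This is a short worked bound, assuming only that $μ$ is a probability distribution over sentences; the set name $T_ϵ$ is introduced here purely for illustration:

$$
T_ϵ = \{ S : μ(S) > ϵ \}, \qquad 1 \;\ge\; \sum_{S \in T_ϵ} μ(S) \;\ge\; ϵ \, |T_ϵ| \quad\Longrightarrow\quad |T_ϵ| \le \frac{1}{ϵ}.
$$

So with $ϵ = 0.01$, for example, at most 100 sentences can have a guaranteed floor $μ(S)$ above $ϵ$, whichever $μ$ is chosen.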
Hazard Mar 16, 2018, 10:38 PM
21 points

This was incredibly enjoyable to read! I think you did a very good job of making it easy to read without dumbing it down. Though I’m not well versed in the core math of this post, I still feel like I managed to get some useful gist from it, and I also don’t feel like I’ve been tricked into thinking I understand more than I do.
Ben Pace Mar 16, 2018, 9:38 PM
12 points

(this is so awesome and it helps give me intuitions about Gödel’s theorem and how mathematics happens and stuff)
I didn’t parse the final sentence?
> Logical induction (which is untrollable but not exactly a Bayesian probability distribution) is still the gold standard for logical uncertainty, but perhaps the number of desirable properties we can get by specifying simple sampling processes.
It feels like it should say ‘but perhaps the number of desirable properties we can get by specifying simple sampling processes is X’ but is missing the final clause, or something.
Edit: This has been fixed now :-)
- abramdemski Mar 16, 2018, 9:42 PM
  6 points
  Parent
  
  Right, whoops.
  It should have said “… by specifying simple sampling processes will increase as we push further in the direction Sam has opened up.”
  - DanielFilan Mar 16, 2018, 10:44 PM
    6 points
    Parent
    
    Further bug: I can now see both the old final image and the new final image.
    - abramdemski Mar 16, 2018, 10:47 PM
      3 points
      Parent
      
      Wow, that’s weird, I **don’t** see both when I try to edit the draft. Only in the non-editing view.
      - habryka Mar 16, 2018, 11:19 PM
        6 points
        Parent
        
        Sorry for that, fixed it!
      - Ben Pace Mar 16, 2018, 11:13 PM
        3 points
        Parent
        
        Wow. Oli’s on it.
Ben Pace Mar 31, 2018, 11:59 PM
9 points

I curated this post because:
- The explanation itself was very clear—a serious effort had been made to explain this work and related ideas.
- In the course of explaining a single result, it helps give strong intuitions about a wide variety of related areas in math and logic, which are very important for alignment research.
- It was really fun to read; the drawings are very beautiful.
Biggest hesitations I had with curating:
- It wasn’t clear to me that the main argument the post makes regarding the untrollable mathematicians is itself a huge result in agent foundations research.
This wasn’t a big factor for me though, as just making transparent all of the mental moves in achieving this result helps the reader with seeing / learning the mental models used throughout this research area.
hwold Apr 13, 2018, 7:49 AM
7 points

I don’t understand where that ¹⁄₂ comes from. Unless I have made a gross mistake, P(A|A ⇒ B) < P(A) even if P(A&B) > P(A&not(B)). In your first example, if I swap P(A&B) and P(A&not(B)) so that P(A&B) = .5 and P(A&not(B)) = .3, then P(A|A ⇒ B) = .5/.7 ≈ 0.71 < 0.8 = P(A).
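
The arithmetic in this comment can be checked mechanically. The following is a minimal sketch, assuming an ordinary coherent joint distribution over A and B; the function name `condition_on_implication` is made up for this illustration and does not come from the post.

```python
from fractions import Fraction


def condition_on_implication(p_a_and_b, p_a_and_not_b):
    """Return (P(A), P(A | A => B)) for a coherent joint distribution.

    A => B is false only in the world where A holds and B fails, so
    P(A => B) = 1 - P(A & not B), and A & (A => B) is the same event as A & B.
    """
    p_a = p_a_and_b + p_a_and_not_b
    p_implication = 1 - p_a_and_not_b
    return p_a, p_a_and_b / p_implication


# Numbers from the comment: P(A & B) = 0.5, P(A & not B) = 0.3.
prior, posterior = condition_on_implication(Fraction(1, 2), Fraction(3, 10))
print(prior, posterior, float(posterior))
```

With these inputs the prior is 4/5 and the posterior is 5/7, so learning the implication lowers P(A) even though P(A&B) > P(A&not(B)).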
- Chris_Leong Dec 14, 2018, 10:50 PM
  5 points
  Parent
  
  This confused me as well. This being true ensures that the odds P(A):P(not A) at least halve at each step. But part of this comic seems to imply that being less than a half stops the trolling, when it should only stop the trolling from proceeding at such a fast-paced rate.
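
The role of the ¹⁄₂ threshold is easiest to see in odds form. This is a sketch of the standard Bayesian calculation, assuming a coherent $P$; it uses the facts that $A \wedge (A \to B)$ is equivalent to $A \wedge B$ and that $\neg A \wedge (A \to B)$ is equivalent to $\neg A$, so the common factor $P(A \to B)$ cancels:

$$
\frac{P(A \mid A \to B)}{P(\neg A \mid A \to B)} = \frac{P(A \wedge B)}{P(\neg A)} = P(B \mid A) \cdot \frac{P(A)}{P(\neg A)}.
$$

Under this reading, any $P(B \mid A) < 1$ lowers the odds in favor of $A$, and $P(B \mid A) < \frac{1}{2}$ guarantees that they are at least halved. With the numbers above, $P(B \mid A) = \frac{5}{8}$ takes the odds from $4 : 1$ to $5 : 2$, which is $P(A) = \frac{5}{7}$.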
rk Mar 29, 2018, 12:45 PM
7 points

I want to echo the other comments thanking you for making this lay-approachable and for the fun format!
I do find myself confused by some of the statements though. It may be that I have a root misunderstanding or that I am misreading some of the more quickly stated sentences.
For example, when you talk about the trees of truth & falsehood and the gap in the middle: am I right in thinking of these trees as provability and non-provability, rather than perhaps truth & falsehood?
Also, in the existence proof for Bs such that $P (B | A) > \frac{1}{2}$ and we can prove $A \to B$ , you say that if B is a logical truth, A → B must be provable, because anything implies a logical truth. It seems right to me that anything logically implies a logical truth. But surely we can’t prove all logical truths from anything—what if it’s a truth in the grey area such that it can’t be proved at all?
If someone can put me right, that would be great.
- Dacyn Apr 2, 2018, 3:18 PM
  6 points
  Parent
  
  Yes you are right that the first tree is provability, but I think the second tree is meant to be disprovability rather than non-provability. Similarly, when the OP later talks about “logical truths” and “logical falsehoods” it seems he really means “provable statements” and “disprovable statements”—this should resolve the issue in your last paragraph, since if B is provable then so is A->B.
  - rk Apr 2, 2018, 4:40 PM
    1 point
    Parent
    
    
    > disprovability rather than non-provability
    
    Yeah, you’re definitely right there. Oops, me.
    
    > Similarly, when the OP later talks about “logical truths” and “logical falsehoods” it seems he really means “provable statements” and “disprovable statements”—this should resolve the issue in your last paragraph, since if B is provable then so is A->B
    
    If that’s the case, then how does Gödel kick in? He then says that nothing can separate logical truth from logical falsehood. But if he means provability and disprovability, then trivially they can be separated.
    - Dacyn Apr 2, 2018, 9:23 PM
      4 points
      Parent
      
      Here “separation” would mean that there is an algorithm which inputs any statement and outputs either “yes” or “no”, such that the algorithm returns “yes” on all inputs that are provable statements and “no” on all inputs that are disprovable statements. But the algorithm also has to halt on all possible inputs, not just the ones that are provable or disprovable. Such a separation algorithm cannot exist (I am not sure if this follows from Gödel’s theorem or requires a separate diagonalization argument). This is the result needed in that step of the argument.
      - rk Apr 2, 2018, 9:26 PM
        1 point
        Parent
        
        Ah, so I was quite wrong when I said “trivially they can be separated”. Cos we only have semi-decision procedures for provability and disprovability!
        
        Thanks for helping me with this.
cousin_it Mar 19, 2018, 8:29 AM
7 points

Great explanation! I read your earlier post on IAFF where the whole thing was explained in one sentence, and it was quite clear, but seeing it in pictures is much more fun. Maybe this is also why the Sequences were fun to read—they explained simple ideas but in a very fancy cursive font :-)
Connor_Flexman Mar 21, 2018, 12:50 AM
6 points

I am confused as to how the propositional consistency and $observe []$ function work together to prevent the trolling in the final step. Suppose I do try to find pairs of sentences such that I can show $(A \Rightarrow B_{i})$ and also $\neg B_{i}$ to drive $A$ down. Does this fail because you are postulating non-adversarial sampling, as ESRogs mentions? Or is there some other reason why propositional consistency is important here?
- Diffractor Mar 22, 2018, 1:03 AM
  10 points
  Parent
  
  There’s a misconception here: it isn’t about finding sentences of the form $A \to B_{i}$ and $\neg B_{i}$, because doing that immediately disproves $A$. It’s actually about merely finding many instances of $A \to B_{i}$ where $P (B_{i} | A) < \frac{1}{2}$, and each one lowers the probability of $A$. This is kind of like how finding out about the Banach-Tarski paradox (something you assign low probability to) may lower your degree of belief in the axiom of choice.
  The particular thing that prevents trolling is that in this distribution, there’s a fixed probability of drawing $A$ on the next round no matter how many implications and $B$’s you’ve found so far. So the way it evades trolling is a bit cheaty, in a certain sense, because it believes that the sequence of truth or falsity of math sentences that it sees is drawn from a certain fixed distribution, and doesn’t do anything like believing that it’s more likely to see a certain class of sentences come up soon.
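
To see how repeated implications wear down a naive Bayesian, here is a small sketch. It assumes the troll can always find a fresh $B_i$ whose conditional probability given $A$, under the current beliefs, is at most ¹⁄₂; that assumption is exactly what the fixed floor $μ(A)$ rules out for the prior in the post. The function name `troll` and the chosen numbers are illustrative only.

```python
from fractions import Fraction


def troll(prior_a, conditionals):
    """Update P(A) on a sequence of proven implications A => B_i.

    `conditionals` holds P(B_i | A) as assessed just before each update.
    Each update multiplies the odds of A by that conditional probability.
    """
    odds = prior_a / (1 - prior_a)
    history = [prior_a]
    for p_b_given_a in conditionals:
        odds *= p_b_given_a
        history.append(odds / (1 + odds))
    return history


# Boundary case: every conditional is exactly 1/2, so the odds halve each time.
# Any value below 1/2 would drive P(A) down faster.
history = troll(Fraction(4, 5), [Fraction(1, 2)] * 5)
print([str(p) for p in history])
```

Starting from 4/5, five such updates take P(A) down to 1/9, and nothing in this naive update rule stops the sequence from continuing toward 0.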
  - Ben Pace Mar 22, 2018, 1:10 AM
    5 points
    Parent
    
    (I fixed your LaTeX. FYI whatever your comment looks like before you post, is what it will look like after. Use ctrl-4 or cmd-4 for LaTeX, depending on whether you’re using a PC or a Mac.)
ESRogs Mar 18, 2018, 9:36 PM
6 points

> Propositional consistency lets us express constraints between sentences (such as “$A$ and $B$ cannot both be true”) as sentences (such as “$\neg (A \wedge B)$”) in a way the prior understands and correctly enforces.
> Any branch contradicting an already-stated constraint is clipped off the tree of possible sequences of sentences.
> The probability of any sentence $S$ which is consistent with everything seen so far can’t go below $μ (S)$ or above $1 - μ (\neg S)$ , since $S$ or $\neg S$ can be drawn next. So, no trolling.

How do I know whether $S$ is consistent with everything seen so far? Doesn’t that presuppose logical omniscience?
Or does consistency here only mean that it doesn’t violate any explicitly stated constraints (such that I don’t have to know all the implications of all the sentences I’ve seen so far and whether they contradict $S$ )?
- Diffractor Mar 22, 2018, 12:52 AM
  15 points
  Parent
  
  There’s a difference between “consistency” (it is impossible to derive X and notX for any sentence X; testing this requires a halting oracle, because there are always more proof paths to check) and “propositional consistency”, which merely requires that there are no contradictions discoverable by boolean algebra only. So A∧B is propositionally inconsistent with notA, and propositionally consistent with A. If there’s some clever way to prove that B implies notA, it wouldn’t affect the propositional consistency of them at all. Propositional consistency of a set of sentences can be verified in exponential time.
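
The exponential-time check mentioned above can be sketched as a brute-force truth-table search. This is an illustration under stated assumptions, not the construction from the post: sentences are encoded as nested tuples (an encoding chosen here), and anything that is not a boolean combination, such as a quantified statement, would simply be treated as an opaque atom.

```python
from itertools import product

# Assumed encoding: ("atom", name), ("not", s), ("and", s, t),
# ("or", s, t), ("implies", s, t).


def atoms(s):
    """Collect the names of all atoms occurring in sentence s."""
    if s[0] == "atom":
        return {s[1]}
    return set().union(*(atoms(part) for part in s[1:]))


def holds(s, world):
    """Evaluate sentence s in a world mapping atom names to booleans."""
    op = s[0]
    if op == "atom":
        return world[s[1]]
    if op == "not":
        return not holds(s[1], world)
    if op == "and":
        return holds(s[1], world) and holds(s[2], world)
    if op == "or":
        return holds(s[1], world) or holds(s[2], world)
    if op == "implies":
        return (not holds(s[1], world)) or holds(s[2], world)
    raise ValueError(f"unknown connective: {op}")


def propositionally_consistent(sentences):
    """True if some truth assignment satisfies every sentence (2^n worlds)."""
    names = sorted(set().union(*(atoms(s) for s in sentences)))
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if all(holds(s, world) for s in sentences):
            return True
    return False


A, B = ("atom", "A"), ("atom", "B")
print(propositionally_consistent([("and", A, B), ("not", A)]))  # A∧B with notA
print(propositionally_consistent([("and", A, B), A]))           # A∧B with A
```

The first call reports an inconsistency and the second does not. The check never looks inside an atom, so a clever proof that B implies notA would not change either answer unless that implication were itself added to the list as a sentence.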
  - Chris_Leong Dec 14, 2018, 10:55 PM
    3 points
    Parent
    
    Since propositional consistency is weaker than consistency, our prior may assign some probability to cases that are contradictory. I guess that’s considered acceptable because the aim is to make the prior non-trollable, rather than good.
cata Mar 20, 2018, 9:14 PM
5 points

Thank you for making this! The format held my attention well, so I understood a lot whereas I might have zoned out and left if the same material had been presented more traditionally. I’m going to print it out and distribute it at work—people like zines there.
Dacyn Apr 2, 2018, 3:24 PM
4 points

Maybe I am not sure why this mathematician is considered to be untrollable? It seems the same or a similar algorithm could drive his probabilities up and down arbitrarily within the interval $[μ (S), 1 - μ (\neg S)]$ . If this is true, then his beliefs at any stage are essentially arbitrary with respect to this restriction. But isn’t that basically the same as saying that if the statement hasn’t been proven or disproven yet, then his beliefs don’t give any meaningful (non-trollable) further information as to whether the statement is true?
- Diffractor Apr 2, 2018, 5:46 PM
  3 points
  Parent
  
  The beliefs aren’t arbitrary, they’re still reasoning according to a probability distribution over propositionally consistent “worlds”. Furthermore, the beliefs converge to a single number in the limit of updating on theorems, even if the sentence of interest is unprovable. Consider some large but finite set S of sentences that haven’t been proved yet, such that the probability of sampling a sentence in that set before sampling the sentence of interest “x”, is very close to 1. Then pick a time N, that is large enough that by that time, all the logical relations between the sentences in S will have been found. Then, with probability very close to 1, either “x” or “notx” will be sampled without going outside of S.
  So, if there’s some cool new theorem that shows up relating “x” and some sentence outside of S, like “y->x”, well, you’re almost certain to hit either “x” or “notx” before hitting “y”, because “y” is outside S, so this hot new theorem won’t affect the probabilities by more than a negligible amount.
  Also, I figured out how to generalize the prior a bit to take into account arbitrary constraints other than propositional consistency, though there are still kinks to iron out in that one. Check this.
ESRogs Mar 18, 2018, 9:24 PM
4 points

> Suppose nature is showing you true sentences one at a time. Model them as drawn randomly from a fixed distribution $μ (S)$ , but enforcing propositional consistency.
Does this mean nature has to in fact be showing me sentences sampled from this fixed distribution, or am I just pretending that that’s what it’s doing when I update my prior?
Does this work when sentences are shown to me in an adversarial order?
- Diffractor Mar 22, 2018, 1:06 AM
  10 points
  Parent
  
  You’re pretending that it’s what nature is doing when you update your prior. It works when sentences are shown to you in an adversarial order, but there’s the weird aspect that this prior expects the sentences to go back to being drawn from some fixed distribution afterwards. It doesn’t do a thing where it goes “ah, I’m seeing a bunch of blue blocks selectively revealed, even though I think there’s a bunch of red blocks, the next block I’ll have revealed will probably be blue”. Instead, it just sticks with its prior on red and blue blocks.
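
The sampling model in this thread can be made concrete on a toy pool of sentences. This is a simplified sketch, not the construction from the post: it assumes a finite pool with hand-picked weights (`POOL`, `ATOMS` and `prob_accepted` are names invented here), draws sentences from $μ$, discards any draw that is propositionally inconsistent with those already accepted, and treats observed sentences as already accepted. Because discarded and repeated draws leave the state unchanged, the exact probability can be computed by renormalizing over the draws that would be accepted.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("A", "B")
WORLDS = [dict(zip(ATOMS, v)) for v in product([False, True], repeat=len(ATOMS))]

# Toy pool (an assumption of this sketch): name -> (mu weight, truth function).
POOL = {
    "A":      (Fraction(1, 4), lambda w: w["A"]),
    "not A":  (Fraction(1, 4), lambda w: not w["A"]),
    "B":      (Fraction(1, 8), lambda w: w["B"]),
    "not B":  (Fraction(1, 8), lambda w: not w["B"]),
    "A -> B": (Fraction(1, 4), lambda w: (not w["A"]) or w["B"]),
}


def consistent(names):
    """Propositional consistency: some world satisfies every named sentence."""
    return any(all(POOL[n][1](w) for n in names) for w in WORLDS)


def prob_accepted(target, accepted=frozenset()):
    """Exact probability that `target` is eventually accepted into the sequence."""
    if target in accepted:
        return Fraction(1)
    if not consistent(accepted | {target}):
        return Fraction(0)  # accepted sentences only accumulate, so never
    candidates = [n for n in POOL
                  if n not in accepted and consistent(accepted | {n})]
    total = sum(POOL[n][0] for n in candidates)
    return sum(POOL[n][0] / total * prob_accepted(target, accepted | {n})
               for n in candidates)


before = prob_accepted("A")
after = prob_accepted("A", frozenset({"A -> B"}))
print(before, after)
```

In this toy model, seeing “A -> B” lowers the probability of “A”, but both values stay between $μ(A) = \frac{1}{4}$ and $1 - μ(\neg A) = \frac{3}{4}$, because “A” and “not A” each remain available as the next accepted draw.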
Elo Mar 17, 2018, 12:05 AM
3 points

Do the pictures load for other people? Because they don’t load for me.
- Elo Mar 17, 2018, 12:06 AM
  4 points
  Parent
  
  Oh wait. Just that browser.
jimrandomh May 9, 2018, 7:33 PM
2 points

Many (though not all) of the images are broken links right now. Could we get them re-uploaded somewhere else?
- Ben Pace May 10, 2018, 1:02 PM
  2 points
  Parent
  
  I just tried to fix that, and also the spacing issues. Let me know if it’s still broken.
romeostevensit Apr 1, 2018, 5:17 PM
2 points

Has trolling people into providing untrollable models been reused? Seems worth trying.