This is Dr. Andrew Critch’s professional LessWrong account. Andrew is the CEO of Encultured AI, and works for ~1 day/week as a Research Scientist at the Center for Human-Compatible AI (CHAI) at UC Berkeley. He also spends around a ½ day per week volunteering for other projects like the Berkeley Existential Risk Initiative and the Survival and Flourishing Fund. Andrew earned his Ph.D. in mathematics at UC Berkeley studying applications of algebraic geometry to machine learning models. During that time, he cofounded the Center for Applied Rationality and SPARC. Dr. Critch has been offered university faculty and research positions in mathematics, mathematical biosciences, and philosophy, worked as an algorithmic stock trader at Jane Street Capital’s New York City office, and as a Research Fellow at the Machine Intelligence Research Institute. His current research interests include logical uncertainty, open source game theory, and mitigating race dynamics between companies and nations in AI development.
Andrew_Critch
I’ve now fleshed out the notation section to elaborate on this a bit. Is it better now?
In short, ⊢ is our symbol for talking about what PA can prove, and □ is shorthand for PA’s symbols for talking about what (a copy of) PA can prove.
“⊢ 1+1=2” means “Peano Arithmetic (PA) can prove that 1+1=2”. No parentheses are needed; the “⊢” applies to the whole line that follows it. Also, “⊢” does not stand for an expression in PA; it’s a symbol we use to talk about what PA can prove.
“□(1+1=2)” basically means the same thing. More precisely, it stands for a numerical expression within PA that can be translated as saying “⊢ 1+1=2”. This translation is possible because of something called a Gödel numbering, which allows PA to talk about a (numerically encoded version of) itself.
“□C” is short for “□(C)” in cases where “C” is just a single character of text.
“PA + X ⊢ Y” means “PA, along with X as an additional axiom/assumption, can prove Y”. In this post we don’t have any analogous notation for □.
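Putting those pieces side by side (a rough summary in standard provability-logic notation, with A, X, and Y standing for arbitrary sentences):

```latex
\begin{align*}
  &\vdash A  &&\text{``PA can prove $A$'' (a claim in our meta-language, not a formula of PA)}\\
  &\Box A    &&\text{a formula of PA which, via G\"odel numbering, says ``$\vdash A$''}\\
  &\mathrm{PA} + X \vdash Y  &&\text{``PA, with $X$ added as an axiom, can prove $Y$''}
\end{align*}
```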
Well, the deduction theorem is a fact about PA (and propositional logic), so it’s okay to use it as long as ⊢ means “PA can prove”.
But you’re right that it doesn’t mix seamlessly with the (outer) necessitation rule. Necessitation is a property of “⊢”, but not generally a property of “PA + X ⊢”. When PA can prove something, it can prove that it can prove it. By contrast, if PA+X can prove Y, that does mean that PA can prove that PA+X can prove Y (because PA alone can work through proofs in a Gödel encoding), but it doesn’t mean that PA+X can prove that PA can prove Y. This can be seen by example, with a suitable choice of X and Y.

As for the case where you want to refer to K or S5 instead of PA provability, those logics are still built on propositional logic, for which the deduction theorem does hold. So if you do the deduction only using propositional logic from theorems of the modal logic along with an additional assumption X, then the deduction theorem applies. In particular, inner necessitation and box distributivity are both theorems of the logic for every sentence you stick into them (rather than meta-theorems about the logic, which is what necessitation is). So the application of the deduction theorem here is still valid.
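To spell out which of these are rules versus theorem schemas (a rough summary; the first is a rule of inference about ⊢, while the latter two are theorems for every sentence A and B):

```latex
\begin{align*}
  &\text{necessitation (a rule about } \vdash\text{):} && \text{from } \vdash A \text{, conclude } \vdash \Box A\\
  &\text{inner necessitation (a theorem schema):} && \vdash \Box A \to \Box\Box A\\
  &\text{box distributivity (a theorem schema):} && \vdash \Box(A \to B) \to (\Box A \to \Box B)
\end{align*}
```

So from PA + X ⊢ Y, the deduction theorem gives ⊢ X → Y, necessitation then gives ⊢ □(X → Y), and distributivity gives ⊢ □X → □Y; but none of this gives PA + X ⊢ □Y.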
Still, the deduction theorem isn’t safe to just use willy nilly along with the (outer) necessitation rule, so I’ve just added a caveat about that:
Note that from PA + X ⊢ Y we cannot conclude PA + X ⊢ □Y, because □ still means “PA can prove”, and not “PA+X can prove”.
Thanks for calling this out.
Well, A → B is just short for ¬A ∨ B, i.e., “(not A) or B”. By contrast, PA + A ⊢ B means that there exists a sequence of (very mechanical) applications of modus ponens, starting from the axioms of Peano Arithmetic (PA) with A appended, ending in B. We tried hard to make the rules of ⊢ so that it would agree with → in a lot of cases (i.e., we tried to design ⊢ to make the deduction theorem true), but it took a lot of work in the design of Peano Arithmetic and can’t be taken for granted.
For instance, consider the statement □(1=2) → (1=2). If you believe Peano Arithmetic is consistent, then you believe that ¬□(1=2), and therefore you also believe that □(1=2) → (1=2). But PA cannot prove that statement (by Gödel’s Theorem, or Löb’s theorem with C = (1=2)), so we don’t have ⊢ □(1=2) → (1=2).
It’s true that the deduction theorem is not needed, as in the Wikipedia proof. I just like using the deduction theorem because I find it intuitive (assume X, prove Y, then drop the assumption and conclude X → Y) and it removes the need for lots of parentheses everywhere.
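For reference, the deduction theorem being leaned on here can be stated roughly as follows (for PA, with X a closed sentence):

```latex
\begin{align*}
  \mathrm{PA} + X \vdash Y
  \quad\Longleftrightarrow\quad
  \mathrm{PA} \vdash X \to Y
\end{align*}
```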
I’ll add a note about the meaning of the notation so folks don’t need to look it up, thanks for the feedback!
It did not get across! Interesting. Procedurally I still object to calling people’s arguments “crazy”, but selfishly I guess I’m glad they were not my arguments? At a meta level though I’m still concerned that LessWrong culture is too quick to write off views as “crazy”. Even the “coordination is delusional”-type views that Katja highlights in her post do not seem “crazy” to me, more like misguided or scarred or something, in a way that warrants a closer look but not being called “crazy”.
Oliver, see also this comment; I tried to @ you on it, but I don’t think LessWrong has that functionality?
Separately from my other reply explaining that you are not the source of what I’m complaining about here, I thought I’d add more color to explain why I think my assessment here is not “hyperbolic”. Specifically, regarding your claim that reducing AI x-risk through coordination is “not only fine to suggest, but completely uncontroversial accepted wisdom”, please see the OP. Perhaps you have not witnessed such conversations yourself, but I have been party to many of these:
Some people: AI might kill everyone. We should design a godlike super-AI of perfect goodness to prevent that.
Others: wow that sounds extremely ambitious
Some people: yeah but it’s very important and also we are extremely smart so idk it could work
[Work on it for a decade and a half]
Some people: ok that’s pretty hard, we give up
Others: oh huh shouldn’t we maybe try to stop the building of this dangerous AI?
Some people: hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren’t delusional
In other words, I’ve seen people in AI governance being called or treated as “delusional” by loads of people (1-2 dozen?) core to the LessWrong community (not you). I wouldn’t say by a majority, but by an influential minority to say the least, and by more people than would be fair to call “just institution X” for any X, or “just person Y and their friends” for any Y. The pattern is strong enough that for me, pointing to governance as an approach to existential safety on LessWrong indeed feels fraught, because influential people (online or offline) will respond to the idea as “delusional” as Katja puts it. Being called delusional is stressful, and hence “fraught”.
@Oliver, the same goes for your way of referring to sentences you disagree with as “crazy”, such as here.
Generally speaking, on the LessWrong blog itself I’ve observed too many instances of people using insults in response to dissenting views on the epistemic health of the LessWrong community, and receiving applause and karma for doing so, for me to think that there’s not a pattern or problem here.
That’s not to say I think LessWrong has this problem worse than other online communities (i.e., using insults or treating people as ‘crazy’ or ‘delusional’ for dissenting or questioning the status quo); only that I think it’s a problem worth addressing, and a problem I see strongly at play on the topic of coordination and governance.
Thanks, Oliver. The biggest update for me here — which made your entire comment worth reading, for me — was that you said this:
I also think it’s really not true that coordination has been “fraught to even suggest”.
I’m surprised that you think that, but have updated, taking your statement at face value, that you in fact do. By contrast, my experience around a bunch of common acquaintances of ours has been much the same as Katja’s, like this:
Some people: AI might kill everyone. We should design a godlike super-AI of perfect goodness to prevent that.
Others: wow that sounds extremely ambitious
Some people: yeah but it’s very important and also we are extremely smart so idk it could work
[Work on it for a decade and a half]
Some people: ok that’s pretty hard, we give up
Others: oh huh shouldn’t we maybe try to stop the building of this dangerous AI?
Some people: hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren’t delusional
In fact I think I may have even heard the word “delusional” specifically applied to people working on AI governance (though not by you) for thinking that coordination on AI regulation is possible / valuable / worth pursuing in service of existential safety.
As for the rest of your narrative of what’s been happening in the world, to me it seems like a random mix of statements that are clearly correct (e.g., trying to coordinate with people who don’t care about honesty or integrity will get you screwed) and other statements that seem, as you say,
pretty crazy to me,
and I agree that for the purpose of syncing world models,
I don’t [think] this comment thread is the right context.
Anyway, cheers for giving me some insight into your thinking here.
This makes sense to me if you feel my comment is meant as a description of you or people-like-you. It is not, and quite the opposite. As I see it, you are not a representative member of the LessWrong community, or at least, not a representative source of the problem I’m trying to point at. For one thing, you are willing to work for OpenAI, which many (dozens of) LessWrong-adjacent people I’ve personally met would consider a betrayal of allegiance to “the community”. Needless to say, the field of AI governance as it exists is not uncontroversially accepted by the people I am reacting to with the above complaint. In fact, I had you in mind as a person I wanted to defend by writing the complaint, because you’re willing to engage and work full-time (seemingly) in good faith with people who do not share many of the most centrally held views of “the community” in question, be it LessWrong, Effective Altruism, or the rationality community.
If it felt otherwise to you, I apologize.
Katja, many thanks for writing this, and Oliver, thanks for this comment pointing out that everyday people are in fact worried about AI x-risk. Since around 2017 when I left MIRI to rejoin academia, I have been trying continually to point out that everyday people are able to easily understand the case for AI x-risk, and that it’s incorrect to assume the existence of AI x-risk can only be understood by a very small and select group of people. My arguments have often been basically the same as yours here: in my case, informal conversations with Uber drivers, random academics, and people at random public social events. Plus, the argument is very simple: If things are smarter than us, they can outsmart us and cause us trouble. It’s always seemed strange to say there’s an “inferential gap” of substance here.
However, for some reason, the idea that people outside the LessWrong community might recognize the existence of AI x-risk — and therefore be worth coordinating with on the issue — has felt not only poorly received on LessWrong, but also fraught to even suggest. For instance, I tried to point it out in this previous post:
“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments
I wrote the following, targeting multiple LessWrong-adjacent groups in the EA/rationality communities who thought “pivotal acts” with AGI were the only sensible way to reduce AI x-risk:
In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.
That particular statement was very poorly received, with a 139-karma retort from John Wentworth arguing,
What exactly is the model by which some AI organization demonstrating AI capabilities will lead to world governments jointly preventing scary AI from being built, in a world which does not actually ban gain-of-function research?
I’m not sure what’s going on here, but it seems to me like the idea of coordinating with “outsiders”, or placing trust or hope in the judgement of “outsiders”, has been a bit taboo here, and that arguments that outsiders are dumb or wrong or can’t be trusted will reliably get a lot of cheering in the form of karma.
Thankfully, it now also seems to me that perhaps the core LessWrong team has started to think that ideas from outsiders matter more to the LessWrong community’s epistemics and/or ability to get things done than previously represented, such as by including material written outside LessWrong in the 2021 LessWrong review posted just a few weeks ago, for the first time.
I consider this a move in a positive direction, but I am wondering if I can draw the LessWrong team’s attention to a more serious trend here. @Oliver, @Raemon, @Ruby, @Ben Pace, and others engaged in curating and fostering intellectual progress on LessWrong:
Could it be that the LessWrong community, or the EA community, or the rationality community, has systematically discounted the opinions and/or agency of people outside that community, in a way that has led the community to plan more drastic actions in the world than would otherwise be reasonable if outsiders of that community could also be counted upon to take reasonable actions?
This is a leading question, and my gut and deliberate reasoning have both been screaming “yes” at me for about 5 or 6 years straight, but I am trying to get you guys to take a fresh look at this hypothesis now, in question-form. Thanks in any case for considering it.
I agree this is a big factor, and might be the main pathway through which people end up believing what they believe. If I had to guess, I’d guess you’re right.
E.g., if there’s evidence E in favor of H and evidence E’ against H, and the group is really into thinking about and talking about E as a topic, then the group will probably end up believing H too much.

I think it would be great if you or someone wrote a post about this (or whatever you meant by your comment) and pointed to some examples. I think the LessWrong community is somewhat plagued by attentional bias leading to collective epistemic blind spots. (Not necessarily more than other communities; just different blind spots.)
Ah, thanks for the correction! I’ve removed that statement about “integrity for consequentialists” now.
I’ve searched my memory for the past day or so, and I just wanted to confirm that the “ever” part of my previous message was not a hot take or exaggeration.
I’m not sure what to do about this. I am mulling.
This piece of news is the most depressing thing I’ve seen in AI since… I don’t know, ever? It’s not like the algorithms for doing this weren’t lying around already. The depressing thing for me is that it was promoted as something to be proud of, with no regard for the framing implication that cooperative discourse exists primarily in service of forming alliances to exterminate enemies.
Thanks for raising this! I assume you’re talking about this part?
They explore a pretty interesting set-up, but they don’t avoid the narrowly-self-referential sentence Ψ.
So, I don’t think their motivation was the same as mine. For me, the point of trying to use a quine is to try to get away from that sentence, to create a different perspective on the foundations for people that find that kind of sentence confusing, but who find self-referential documents less confusing. I added a section “Further meta-motivation (added Nov 26)” about this in the OP.
Noice :)
At this point I’m more interested in hashing out approaches that might actually conform to the motivation in the OP. Perhaps I’ll come back to this discussion with you after I’ve spent a lot more time in a mode of searching for a positive result that fits with my motivation here. Meanwhile, thanks for thinking this over for a bit.
True! “Hereby” covers a solid contingent of self-referential sentences. I wonder if there’s a “hereby” construction that would make the self-referential sentence Ψ (from the Wikipedia proof) more common-sense-meaningful to, say, lawyers.
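(For readers following along: as I recall the Wikipedia proof, Ψ is the diagonal-lemma fixed point that provably satisfies the following, where C is the target sentence; informally, Ψ says “if this very sentence is provable, then C”.)

```latex
\begin{align*}
  \vdash \Psi \leftrightarrow (\Box \Psi \to C)
\end{align*}
```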
this suggests that you’re going to be hard-pressed to do any self-reference without routing through the normal machinery of löb’s theorem, in the same way that it’s hard to do recursion in the lambda calculus without routing through the Y combinator
If by “the normal machinery”, you mean a clever application of the diagonal lemma, then I agree. But I think we can get away with not having the self-referential sentence, by using the same y-combinator-like diagonal-lemma machinery to make a proof that refers to itself (instead of a proof about sentences that refer to themselves) and checks its own validity. I think if I or someone else produces a valid proof like that, skeptics of its value (of which you might be one; I’m not sure) will look at it and say “That was harder and less efficient than the usual way of proving Löb using the self-referential sentence and no self-validation”. I predict I’ll agree with that, and still find the new proof to be of additional intellectual value, for the following reason:
Human documents tend to refer to themselves a lot, like bylaws.
Human sentences, on the other hand, rarely refer to themselves. (This sentence is an exception, but there aren’t a lot of naturally occurring examples.)
Therefore, a proof of Löb whose main use of self-reference involves the entire proof referring to itself, rather than a single sentence referring to itself, will be more intuitive to humans (such as lawyers) who are used to thinking about self-referential documents.
The skeptic response to that will be to say that those peoples’ intuitions are the wrong way to think about y-combinator manipulation, and to that I’ll be like “Maybe, but I’m not super convinced their perspective is wrong, and in any case I don’t mind meeting them where they’re at, using a proof that they find more intuitive.”.
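(For readers unfamiliar with the Y-combinator analogy, here is a minimal, illustrative Python sketch of the Z combinator, the strict-evaluation cousin of the Y combinator: it manufactures recursion without any function referring to itself by name, loosely the way the diagonal lemma manufactures self-reference without a sentence literally containing itself. The names here are just for illustration.)

```python
# Z combinator: builds a recursive function out of a non-recursive one,
# without the function ever referring to itself by name.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Example: factorial, written without naming itself anywhere in its body.
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))

print(fact(5))  # prints 120
```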
Summary: I’m pretty confident the proof will be valuable, even though I agree it will have to use much of the same machinery as the usual proof, plus some extra machinery for helping the proof to self-validate, as long as the proof doesn’t use sentences that are basically only about their own meaning (the way the sentence Ψ is basically only about its own relationship to the sentence C, which is weird).
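(And for the document-level kind of self-reference I have in mind, a classic Python quine is a decent toy model: the two lines below print exactly themselves, so the “document” refers to, and reproduces, the whole of itself rather than containing one sentence that is only about itself.)

```python
# The two lines below form a quine: running them prints exactly those two lines.
s = 's = %r\nprint(s %% s)'
print(s % s)
```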
Hat tip to Ben Pace for pointing out that invitations are often self-referential, such as when people say “You are hereby invited”, because “hereby” means “by this utterance”:
https://www.lesswrong.com/posts/rrpnEDpLPxsmmsLzs/open-technical-problem-a-quinean-proof-of-loeb-s-theorem-for?commentId=CFvfaWGzJjnMP8FCa
That comment was like 25% of my inspiration for this post :)