Andrew_Critch

Karma: 4,842

This is Dr. Andrew Critch’s professional LessWrong account. Andrew is the CEO of Encultured AI, and works for ~1 day/week as a Research Scientist at the Center for Human-Compatible AI (CHAI) at UC Berkeley. He also spends around half a day per week volunteering for other projects like the Berkeley Existential Risk initiative and the Survival and Flourishing Fund. Andrew earned his Ph.D. in mathematics at UC Berkeley studying applications of algebraic geometry to machine learning models. During that time, he cofounded the Center for Applied Rationality and SPARC. Dr. Critch has been offered university faculty and research positions in mathematics, mathematical biosciences, and philosophy, worked as an algorithmic stock trader at Jane Street Capital’s New York City office, and as a Research Fellow at the Machine Intelligence Research Institute. His current research interests include logical uncertainty, open source game theory, and mitigating race dynamics between companies and nations in AI development.

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

Andrew_Critch, May 24, 2023, 12:02 AM

268 points

39 comments, 8 min read

Job Opening: SWE to help build signature vetting system for AI-related petitions

Ethan Ashkie and Andrew_Critch

May 20, 2023, 7:02 PM

52 points

0 comments, 1 min read

GPT can write Quines now (GPT-4)

Andrew_Critch, Mar 14, 2023, 7:18 PM

112 points

30 comments, 1 min read

Andrew_Critch Mar 6, 2023, 7:55 AM
LW: 7 AF: 3
in reply to: Vladimir_Nesov’s comment on: Acausal normalcy
“That is, norms do seem feasible to figure out, but not the kind of thing that is relevant right now, unfortunately.”
From the OP:
“for most real-world-prevalent perspectives on AI alignment, safety, and existential safety, acausal considerations are not particularly dominant [...]. In particular, I do not think acausal normalcy provides a solution to existential safety, nor does it undermine the importance of existential safety in some surprising way.”
I.e., I agree.
“we are so unprepared that the existing primordial norms are unlikely to matter for the process of settling our realm into a new equilibrium.”
I also agree with that, as a statement about how we normal-everyday-humans seem quite likely to destroy ourselves with AI fairly soon. From the OP:
“I strongly suspect that acausal norms are not so compelling that AI technologies would automatically discover and obey them. So, if your aim in reading this post was to find a comprehensive solution to AI safety, I’m sorry to say I don’t think you will find it here.”

Andrew_Critch Mar 5, 2023, 10:37 PM
LW: 4 AF: 1
in reply to: Duncan Sabien (Deactivated)’s comment on: Acausal normalcy
For 18 examples, just think of 3 common everyday norms having to do with each of the 6 boundaries given as example images in the post :) (I.e., cell membranes, skin, fences, social group boundaries, internet firewalls, and national borders). Each norm has the property that, when you reflect on it, it’s easy to imagine a lot of other people also reflecting on the same norm, because of the salience of the non-subjectively-defined actual-boundary-thing that the norm is about. That creates more of a Schelling-nature for that norm, relative to other norms, as I’ve argued somewhat in my «Boundaries» sequence.
Spelling out such examples more carefully in terms of the recursion described in 1 and 2 just prior is something I’ve been planning for a future post, so I will take this comment as encouragement to write it!

Andrew_Critch Mar 5, 2023, 10:30 PM
LW: 16 AF: 6
in reply to: Wei Dai’s comment on: Acausal normalcy
To your first question, I’m not sure which particular “the reason” would be most helpful to convey. (For comparison: what is “the reason” that physically dispersed human societies have laws? Answer: there is a confluence of reasons.) However, I’ll try to point out some things that might be helpful to attend to.
First, committing to a policy that merges your utility function with someone else’s is quite a vulnerable maneuver, with a lot of boundary-setting aspects. For instance, will you merge utility functions multiplicatively (as in Nash bargaining), linearly (as in Harsanyi’s utility aggregation theorem), or some other way? Also, what if the entity you’re merging with has self-modified to become a “utility monster” (an entity with strongly exaggerated preferences) so as to exploit the merging procedure? Some kind of boundary-setting is needed to decide whether, how, and how much to merge, which is one of the reasons why I think boundary-handling is more fundamental than utility-handling.
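A toy illustration of the utility-monster worry, as a minimal Python sketch: the agents, policies, and numbers below are hypothetical, and utilities are assumed to be measured relative to a disagreement point of 0. It shows one well-known asymmetry between the two merging rules: when one agent multiplies its utilities by a large constant, the maximizer of the sum (Harsanyi-style linear aggregation) can flip to that agent's favorite option, while the maximizer of the product (Nash-style multiplicative aggregation) cannot move, because every product gets scaled by the same factor.

```python
from math import prod

# Hypothetical utilities for two agents over three joint policies,
# measured relative to an assumed disagreement point of 0.
utilities = {
    "alice": {"p1": 6.0, "p2": 3.0, "p3": 1.0},
    "bob": {"p1": 1.0, "p2": 3.0, "p3": 5.0},
}


def options(us):
    """List the candidate policies (assumes every agent scores the same ones)."""
    return list(next(iter(us.values())))


def linear_choice(us):
    """Harsanyi-style merge: pick the policy with the largest sum of utilities."""
    return max(options(us), key=lambda o: sum(u[o] for u in us.values()))


def nash_choice(us):
    """Nash-style merge: pick the policy with the largest product of utilities."""
    return max(options(us), key=lambda o: prod(u[o] for u in us.values()))


def exaggerate(us, agent, factor):
    """Copy `us`, with one agent's utilities multiplied by `factor`."""
    out = {name: dict(u) for name, u in us.items()}
    out[agent] = {o: factor * v for o, v in us[agent].items()}
    return out


print(linear_choice(utilities), nash_choice(utilities))  # p1 p2
monster = exaggerate(utilities, "bob", 10.0)
print(linear_choice(monster), nash_choice(monster))  # p3 p2
```

This only covers rescaling relative to a fixed disagreement point; it does not show that multiplicative merging resists every kind of exploitation.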

Relatedly, Scott Garrabrant has pointed out in his sequence on geometric rationality that linear aggregation is more like not-having-a-boundary, and multiplicative aggregation is more like having-a-boundary:
https://www.lesswrong.com/posts/rc5ZKGjXTHs7wPjop/geometric-exploration-arithmetic-exploitation#The_AM_GM_Boundary
I view this as further pointing away from “just aggregate utilities” and toward “one needs to think about boundaries when aggregating beings” (see Part 1 of my Boundaries sequence). In other words, one needs (or implicitly assumes) some kind of norm about how and when to manage boundaries between utility functions, even in abstract utility-function-merging operations where the boundary issues come down to where to draw parentheses between additive and multiplicative operations. Thus, boundary management is somewhat more fundamental than, or conceptually upstream of, any principle that might pick out a global utility function for the entirety of the “acausal society”.
(Even if there is a global utility function that turns out to be very simple to write down, the process of verifying its agreeability will involve checking a lot of boundary interactions. For instance, one must check that this hypothetical reigning global utility function is not dethroned by some union of civilizations who successfully merge in opposition to it, which is a question of boundary-handling.)
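To see how “where to draw parentheses” can matter, here is a small Python sketch with three hypothetical agents and made-up utilities over three options. Following the reading above (additive merging as no boundary, multiplicative merging as a boundary), it compares merging everyone linearly, merging everyone multiplicatively, and letting A and B merge linearly into a coalition that then merges multiplicatively with C. With these particular numbers, each grouping selects a different option.

```python
# Hypothetical utilities for three agents over three options (made-up numbers,
# all measured relative to an assumed disagreement point of 0).
A = {"o1": 8, "o2": 1, "o3": 4}
B = {"o1": 1, "o2": 6, "o3": 4}
C = {"o1": 1, "o2": 2, "o3": 1}
OPTIONS = ["o1", "o2", "o3"]


def best(score):
    """Return the option that maximizes the given merged-utility function."""
    return max(OPTIONS, key=score)


# No boundaries anywhere: everyone merges additively.
all_linear = best(lambda o: A[o] + B[o] + C[o])
# Boundaries everywhere: everyone merges multiplicatively.
all_multiplicative = best(lambda o: A[o] * B[o] * C[o])
# A and B merge without a boundary; that coalition keeps a boundary with C.
mixed = best(lambda o: (A[o] + B[o]) * C[o])

print(all_linear, all_multiplicative, mixed)  # o1 o3 o2
```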

Acausal normalcy

Andrew_Critch, Mar 3, 2023, 11:34 PM

195 points

36 comments, 8 min read, 1 review

Payor’s Lemma in Natural Language

Andrew_Critch, Mar 2, 2023, 12:22 PM

62 points

0 comments, 2 min read

Andrew_Critch Feb 15, 2023, 7:37 AM
LW: 4 AF: 2
in reply to: James Payor’s comment on: Modal Fixpoint Cooperation without Löb’s Theorem
This is cool and (fwiw, to other readers) correct. I must reflect on what it means for real-world cooperation… I especially like the A <-> []X → [][]X <-> []A trick.

Andrew_Critch Feb 15, 2023, 7:27 AM
LW: 4 AF: 2
in reply to: orthonormal’s comment on: Modal Fixpoint Cooperation without Löb’s Theorem
I’m working on it :) At this point what I think is true is the following:

If $\mathrm{ShortProof}(x \leftrightarrow \mathrm{LongProof}(\mathrm{ShortProof}(x) \to x))$, then $\mathrm{MediumProof}(x)$.

Apologies that I haven’t written out calculations very precisely yet, but since you asked, that’s roughly where I’m at :)

Andrew_Critch Feb 7, 2023, 6:24 AM
2 points
in reply to: tailcalled’s comment on: Modal Fixpoint Cooperation without Löb’s Theorem
Actually, the interpretation of $\Box_E$ as its own proof system only requires the other systems to be finite extensions of PA, but I should mention that requirement! Nonetheless, even if they’re not finite, everything still works, because $\Box_E$ still satisfies necessitation, distributivity, and existence of modal fixed points.

Thanks for bringing this up.
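For reference, here is a sketch of the standard short proof of Payor’s lemma (if $⊢ □(□x \to x) \to x$, then $⊢ x$), written for an arbitrary box operator. It uses only necessitation and distributivity plus propositional logic, which is why an operator like $\Box_E$ with those properties can stand in for PA’s own box; the step numbering is ours, not the post’s.

```latex
\begin{align*}
&1.\ \vdash x \to (\Box x \to x) && \text{tautology}\\
&2.\ \vdash \Box x \to \Box(\Box x \to x) && \text{from 1 by necessitation and distributivity}\\
&3.\ \vdash \Box x \to x && \text{from 2 and the hypothesis } \vdash \Box(\Box x \to x) \to x\\
&4.\ \vdash \Box(\Box x \to x) && \text{from 3 by necessitation}\\
&5.\ \vdash x && \text{from 4 and the hypothesis, by modus ponens}
\end{align*}
```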

Modal Fixpoint Cooperation without Löb’s Theorem

Andrew_Critch, Feb 5, 2023, 12:58 AM

134 points

34 comments, 3 min read, 1 review

Löbian emotional processing of emergent cooperation: an example

Andrew_Critch, Jan 17, 2023, 5:59 AM

23 points

0 comments, 8 min read

Andrew_Critch Jan 12, 2023, 4:45 PM
LW: 2 AF: 1
on: Löb’s Theorem for implicit reasoning in natural language: Löbian party invitations
Based on a potential misreading of this post, I added the following caveat today:

Important Caveat: Arguments in natural language are basically never “theorems”. The main reason is that human thinking isn’t perfectly rational in virtually any precisely defined sense, so sometimes the hypotheses of an argument can hold while its conclusion remains unconvincing. Thus, the Löbian argument pattern of this post does not constitute a “theorem” about real-world humans: even when the hypotheses of the argument hold, the argument will not always play out like clockwork in the minds of real people. Nonetheless, Löb’s-Theorem-like arguments can play out relatively simply in the English language, and this post shows what that would look like.

Andrew_Critch Jan 1, 2023, 8:19 PM
LW: 2 AF: 1
in reply to: Ustice’s comment on: Löb’s Theorem for implicit reasoning in natural language: Löbian party invitations
Thanks! Added a note to the OP explaining that hereby means “by this utterance”.

Andrew_Critch Jan 1, 2023, 6:01 PM
LW: 2 AF: 1
on: Löb’s Theorem for implicit reasoning in natural language: Löbian party invitations
Hat tip to Ben Pace for pointing out that invitations are often self-referential, such as when people say “You are hereby invited”, because “hereby” means “by this utterance”:
https://www.lesswrong.com/posts/rrpnEDpLPxsmmsLzs/open-technical-problem-a-quinean-proof-of-loeb-s-theorem-for?commentId=CFvfaWGzJjnMP8FCa

That comment was like 25% of my inspiration for this post :)

A Löbian argument pattern for implicit reasoning in natural language: Löbian party invitations

Andrew_Critch, Jan 1, 2023, 5:39 PM

23 points

8 comments, 7 min read

Andrew_Critch Dec 30, 2022, 8:15 AM
LW: 2 AF: 1
in reply to: quetzal_rainbow’s comment on: Löb’s Lemma: an easier approach to Löb’s Theorem
I’ve now fleshed out the notation section to elaborate on this a bit. Is it better now?
In short, $⊢$ is our symbol for talking about what PA can prove, and $□$ is shorthand for PA’s symbols for talking about what (a copy of) PA can prove.
- “$⊢$ 1+1=2” means “Peano Arithmetic (PA) can prove that 1+1=2”. No parentheses are needed; the “$⊢$” applies to the whole line that follows it. Also, $⊢$ does not stand for an expression in PA; it’s a symbol we use to talk about what PA can prove.
- “$□ (1+1=2)$” basically means the same thing. More precisely, it stands for a numerical expression within PA that can be translated as saying “$⊢$ 1+1=2”. This translation is possible because of something called a Gödel numbering, which allows PA to talk about (a numerically encoded version of) itself.
- “$□ X$” is short for “$□ (X)$” in cases where “$X$” is just a single character of text.
- “$X ⊢ Y$” means “PA, along with X as an additional axiom/assumption, can prove Y”. In this post we don’t have any analogous notation for $□$.

Andrew_Critch Dec 30, 2022, 7:35 AM
LW: 4 AF: 2
in reply to: tristanhaze’s comment on: Löb’s Lemma: an easier approach to Löb’s Theorem
Well, the deduction theorem is a fact about PA (and, propositional logic), so it’s okay to use as long as $⊢$ means “PA can prove”.

But you’re right that it doesn’t mix seamlessly with the (outer) necessitation rule. Necessitation is a property of “$⊢$”, but not generally a property of “$X ⊢$”. When PA can prove something, it can prove that it can prove it. By contrast, if PA+X can prove Y, that does mean that PA can prove that PA+X can prove Y (because PA alone can work through proofs in a Gödel encoding), but it doesn’t mean that PA+X can prove that PA can prove Y. This can be seen by example, by setting $X = Y = \neg □ (1 = 0)$.
As for the case where you want $⊢$ to refer to K or S5 instead of PA provability, those logics are still built on propositional logic, for which the deduction theorem does hold. So if you do the deduction only using propositional logic from theorems in $⊢$ along with an additional assumption X, then the deduction theorem applies. In particular, inner necessitation and box distributivity are both theorems of $⊢$ for every $A$ and $B$ you stick into them (rather than metatheorems about $⊢$, which is what necessitation is). So the application of the deduction theorem here is still valid.
Still, the deduction theorem isn’t safe to use willy-nilly along with the (outer) necessitation rule, so I’ve just added a caveat about that:
Note that from $X ⊢ A$ we cannot conclude $X ⊢ □ A$, because $□$ still means “PA can prove”, and not “PA+X can prove”.
Thanks for calling this out.
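The counterexample $X = Y = \neg □ (1 = 0)$ can be checked semantically with a minimal Python sketch. It assumes the Kripke semantics of the provability logic GL (finite, transitive, irreflexive frames) as a stand-in for “PA can prove”, and relies on Solovay’s completeness theorem for GL, rather than demonstrating it, for the link back to PA. In a two-world frame where w0 sees a dead-end world w1, X (and hence Y) is true at w0 but $□ Y$ is false there, so $X \to □ Y$ is not a GL theorem; correspondingly, PA+X does not prove $□ Y$. The tuple encoding of formulas and the function name `holds` are illustrative choices, not anything from the post.

```python
# Sketch assuming GL's Kripke semantics (finite, transitive, irreflexive frames).
# Formulas are nested tuples: ("bot",), ("not", f), ("imp", f, g), ("box", f).


def holds(formula, world, succ):
    """Return True if `formula` is true at `world` in the frame `succ`."""
    op = formula[0]
    if op == "bot":
        return False
    if op == "not":
        return not holds(formula[1], world, succ)
    if op == "imp":
        return (not holds(formula[1], world, succ)) or holds(formula[2], world, succ)
    if op == "box":
        # Box f is true at w iff f is true at every world that w can see.
        return all(holds(formula[1], v, succ) for v in succ[world])
    raise ValueError(f"unknown operator: {op}")


BOT = ("bot",)                  # stands for the sentence 1 = 0
CON = ("not", ("box", BOT))     # "not Box(1 = 0)", i.e. consistency of PA
X = Y = CON

# Two worlds: w0 sees w1, and w1 is a dead end (it sees nothing).
succ = {"w0": ["w1"], "w1": []}

print(holds(X, "w0", succ))                       # True
print(holds(("box", Y), "w0", succ))              # False
print(holds(("imp", X, ("box", Y)), "w0", succ))  # False
```

At the dead end w1, $□ (1 = 0)$ holds vacuously, which is what makes $□ Y$ fail at w0.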

Andrew_Critch Dec 30, 2022, 7:11 AM
LW: 3 AF: 1
in reply to: Gurkenglas’s comment on: Löb’s Lemma: an easier approach to Löb’s Theorem
Well, $A \to B$ is just short for $\neg A \lor B$, i.e., “(not A) or B”. By contrast, $A ⊢ B$ means that there exists a sequence of (very mechanical) applications of modus ponens, starting from the axioms of Peano Arithmetic (PA) with $A$ appended, ending in $B$. We tried hard to design the rules of $⊢$ so that it would agree with $\to$ in a lot of cases (i.e., we tried to design $⊢$ to make the deduction theorem true), but it took a lot of work in the design of Peano Arithmetic and can’t be taken for granted.
For instance, consider the statement $\neg □ (1 = 0)$. If you believe Peano Arithmetic is consistent, then you believe that $\neg □ (1 = 0)$, and therefore you also believe that $□ (1 = 0) \to (2 = 3)$. But PA cannot prove that $\neg □ (1 = 0)$ (by Gödel’s second incompleteness theorem, or Löb’s theorem with $p = (1 = 0)$), so we don’t have $⊢ □ (1 = 0) \to (2 = 3)$.
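The last step can be spelled out as a short derivation. Since PA does prove $\neg(2 = 3)$, having $⊢ □ (1 = 0) \to (2 = 3)$ would let PA prove $\neg □ (1 = 0)$ by contraposition, which Gödel’s second incompleteness theorem rules out, assuming PA is consistent:

```latex
\begin{align*}
&1.\ \vdash \Box(1=0) \to (2=3) && \text{supposed, for contradiction}\\
&2.\ \vdash \neg(2=3) && \text{PA refutes false numerical equations}\\
&3.\ \vdash \neg\Box(1=0) && \text{from 1 and 2 by contraposition}\\
&4.\ \bot && \text{3 contradicts G\"odel's second incompleteness theorem}
\end{align*}
```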