I’ve had a background assumption in my interpretation of and beliefs about reward functions for as long as I can remember (i.e. since first reading the Sequences), which I suddenly realized I don’t believe is written down anywhere. Over the last two years I’ve gained enough experience writing Coq to suggest a convenient way of framing it.
Computational vs axiomatic reward functions
Computational vs axiomatic in proof engineering
A proof engineer calls a proposition computational if its proof can be broken down into parts.
For example, a + (b + c) = (a + b) + c is computational because you can think of its proof as the application of the associativity lemma followed by the application of something called a “refl”, the fundamental way a proof of an equality terminates. Passing around the associativity lemma is in a sense passing around its proof, which, assuming a is inductive (take nat: zero and successor), is an application of nat’s induction principle, an unfolding of the recursive definition of +, and so on.
In other words, if my adversary asks “why is a + (b + c) = (a + b) + c?” I can show them; I only have to make sure they agree to the fundamental definitions of nat and + : nat -> nat -> nat, and the rest I can compel them to believe.
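As a minimal sketch of what this looks like in Coq (the lemma name is mine; the standard library has its own version), the whole proof is assembled from induction, unfolding of +, and refl, with no axioms anywhere:

```coq
(* Computational: given the standard definitions of nat and +, the proof
   is built from nat's induction principle, unfolding +, and refl. *)
Lemma my_add_assoc : forall a b c : nat, a + (b + c) = (a + b) + c.
Proof.
  intros a b c.
  induction a as [| a' IH].
  - reflexivity.            (* 0 + (b + c) and (0 + b) + c both compute to b + c *)
  - simpl. rewrite IH. reflexivity.
Qed.
```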
On the flip side, consider function extensionality, i.e. f = g <-> forall x, f x = g x (the substantive direction being right to left). This is not provable in Coq’s core theory: equality there is intensional, and a proof has no way to check f and g pointwise over an arbitrary (possibly infinite) domain (which is the same for f and g), to name but one obstacle. Because they can’t prove it, theories “admit function extensionality as an axiom” from time to time.
In other words, if I invoke function extensionality in a proof, and my adversary has agreed to the basic type and function definitions, they remain entitled to reject my proof, because if they ask why I believe function extensionality the best I can do is say “because I declared it on line 7”.
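A sketch of what “declaring it on line 7” amounts to in Coq (the lemma below is my own illustration; the standard library ships a version of the axiom in Coq.Logic.FunctionalExtensionality):

```coq
Require Import Arith.

(* Declared, not proved: everything downstream now rests on my say-so. *)
Axiom funext : forall (A B : Type) (f g : A -> B),
  (forall x, f x = g x) -> f = g.

(* These two functions agree on every input, yet the equality below is
   not provable by computation alone; it needs the axiom. *)
Lemma plus_zero_ext : (fun n => n + 0) = (fun n => n).
Proof.
  apply funext. intro n. simpl. apply Nat.add_0_r.
Qed.
```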
We do not call reasoning involving axioms computational. Instead, the discourse has in a sense been poisoned by the axiom; its value as verification has weakened. (Intuitively, I can declare anything I like on line 7; the risk of proving something that is actually false increases a great deal with each axiom I declare.)
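Coq even tracks the poison explicitly: Print Assumptions lists every axiom a result transitively depends on. Run against the two sketches above (my hypothetical my_add_assoc and plus_zero_ext), it reports roughly:

```coq
Print Assumptions my_add_assoc.
(* Closed under the global context *)

Print Assumptions plus_zero_ext.
(* Axioms:
   funext : forall (A B : Type) (f g : A -> B),
            (forall x, f x = g x) -> f = g *)
```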
Apocryphally, a lecturer recalled a meeting, perhaps of the univalent foundations group at IAS, when homotopy type theory (HoTT) was brand new. (HoTT is based on something called univalence, which is about reasoning on type equalities in arbitrary “universes” (“kinds”, for the Haskell programmer).) In HoTT 1.0, univalence relied on an axiom (done carefully, of course, to minimize the damage of the poison), and Per Martin-Löf is said to have remarked “it’s not really type theory if there’s an axiom”. HoTT 2.0, called cubical type theory, repairs this, which is why cubical type theory is sometimes called computational type theory.
AIXI-like and AIXI-unlike AGIs
If the space of AGIs can be carved into AIXI-like and AIXI-unlike with respect to goals, then AIXI-like architectures are clearly the ones whose goals are imposed on them axiomatically by the programmer. The complement, of course, is where the reward function is computational: decomposable.
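To put the distinction in the same Coq vocabulary as above (everything here is hypothetical illustration, not a formalization of AIXI or any concrete architecture): an axiomatically imposed reward is a Parameter the agent can apply but never unfold, while a computational reward is a Definition assembled from named parts.

```coq
(* Hypothetical illustration only; none of these names come from an
   actual AIXI or NARS formalization. *)
Parameter Observation : Type.

(* AIXI-like: the reward is imposed axiomatically. The agent can use it,
   but "why this reward?" bottoms out at "because it was declared". *)
Parameter axiomatic_reward : Observation -> nat.

(* AIXI-unlike: the reward is a definition that decomposes into parts,
   each of which can be inspected and reasoned about. *)
Parameter food_signal : Observation -> nat.
Parameter novelty_signal : Observation -> nat.
Definition computational_reward (o : Observation) : nat :=
  food_signal o + 2 * novelty_signal o.
```

Note that food_signal and novelty_signal are still Parameters, i.e. atoms: the decomposition has to bottom out somewhere, which is exactly the point made about NARS below.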
See the NARS literature of Wang et al. for something at least adjacent to AIXI-unlike: reasoning about NARS emphasizes that reward functions can be computational to an extent, but eventually “bottom out” at atoms. Still, NARS goals are computational to a far greater degree than those of AIXI-likes.
Conjecture: humans are AIXI-unlike AGIs
This should be trivial: humans can decompose their reward functions in ways richer than “because god said so”.
Relation to mutability???
If the space of AGIs can be carved into AIXI-like and human-like with respect to goals, does the computationality question help me reason about modifying my own reward function? Intuitively, AIXI’s axiomatic goal corresponds to immutability. However, I don’t think AIXI-unlikes get self-modification for free in return. More work is needed.
Position of this post in my overall reasoning
In general, my basic understanding that the AGI space can be divided into what I’ve called AIXI-like and AIXI-unlike with respect to how reward functions are reasoned about, and that computationality (anaxiomaticity vs axiomaticity?) is the crucial axis along which to view this, is deeply embedded in my assumptions. Maybe writing it down will make it easier to eventually change my mind: I’m uncertain just how conventional my belief/understanding is here.
I should be more careful not to imply that I think we have solid specimens of computational reward functions; rather, I think they are a theoretically important region of the space of possible minds, and might factor into idealizations of agency.