General alignment properties
AIXI and the genome are both ways of specifying intelligent agents.
Give AIXI a utility function (perhaps over observation histories), and hook it up to an environment, and this pins down a policy.[1]
Situate the genome in the embryo within our reality, and this eventually grows into a human being with a policy of their own.
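To make the AIXI half of this comparison concrete, here is a minimal Python sketch of how a utility function plus an environment model pins down a policy. It replaces Hutter's Solomonoff mixture over all computable environments with a two-element hypothesis class and brute-force expectimax over short action sequences; all names and numbers below are illustrative, not Hutter's formalism.

```python
from itertools import product

# Toy stand-in for AIXI: a finite hypothesis class instead of a Solomonoff
# mixture, and brute-force expectimax over short action sequences.
HORIZON = 2
ACTIONS = [0, 1]

# Each "hypothesis" deterministically maps an action sequence to an
# observation history, with a prior weight standing in for 2^-K(env).
hypotheses = [
    (0.6, lambda acts: tuple(acts)),                 # env echoes actions
    (0.4, lambda acts: tuple(1 - a for a in acts)),  # env inverts actions
]

def utility(obs_history):
    """Utility over observation histories: reward each observed 1."""
    return sum(obs_history)

def expected_utility(acts):
    return sum(w * utility(env(acts)) for w, env in hypotheses)

# The induced "policy": pick the action sequence with the highest expected
# utility (ties broken arbitrarily here, e.g. by enumeration order).
best_plan = max(product(ACTIONS, repeat=HORIZON), key=expected_utility)
print(best_plan, expected_utility(best_plan))  # (1, 1) 1.2
```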
These agents have different “values”, in whatever sense we care to consider. However, these two agent-specification procedures also have very different general alignment properties.
General alignment properties are not about what a particular agent cares about (e.g. the AI “values” chairs). I call an alignment property “general” if the property would be interesting to a range of real-world agents trying to solve AI alignment. Here are some examples.
Terminally valuing latent objects in reality.
AIXI only “terminally values” its observations and doesn’t terminally value latent objects in reality, while humans generally care about e.g. dogs (which are latent objects in reality).
Navigating ontological shifts.
Consider latent-diamond-AIXI (LDAIXI), an AIXI variant. LDAIXI's utility function scans its top 50 hypotheses (represented as Turing machines), checks each work tape for atomic representations of diamonds, and then computes the utility to be the amount of atomic diamond in the world.
If LDAIXI updates sufficiently hard towards non-atomic physical theories, then it can no longer find any utility in its top 50 hypotheses. All policies might now have equal expected utility (zero), and LDAIXI would not continue maximizing the expected diamond content of the future. From our viewpoint, LDAIXI has failed to rebind its “goals” to its new conceptions of reality. (From LDAIXI’s “viewpoint”, it has Bayes-updated on its observations and continues to select optimal actions.)
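Here is a toy sketch of that failure mode, with the hypothesis “work tapes” stood in by strings and a made-up token (`C_atom_lattice`) standing in for “an atomic representation of a diamond”; the detection predicate is purely illustrative.

```python
# Toy illustration of LDAIXI's ontology problem. Hypotheses are fake
# "work tapes" (strings); the utility function only knows how to count
# atomic diamond tokens, so it returns zero on non-atomic physics.

def diamond_utility(top_hypotheses):
    """Sum of atomic-diamond counts across the top hypotheses."""
    return sum(tape.count("C_atom_lattice") for tape in top_hypotheses)

# Early on, the best hypotheses model the world atomically.
atomic_hypotheses = [
    "... C_atom_lattice ... C_atom_lattice ...",
    "... C_atom_lattice ...",
]
print(diamond_utility(atomic_hypotheses))  # 3: diamonds found, maximization proceeds

# After updating hard towards (say) quantum field theory, the top
# hypotheses no longer contain atomic representations at all.
field_theory_hypotheses = [
    "... excitations of the carbon field ...",
    "... amplitude over configurations ...",
]
print(diamond_utility(field_theory_hypotheses))  # 0: every policy looks equally worthless
```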
On the other hand, physicists do not stop caring about their friends when they learn quantum mechanics. Children do not stop caring about animals when they learn that animals are made out of cells. People seem to navigate ontological shifts pretty well.
Reflective reasoning / embeddedness.
AIXI can’t think straight about how it is embedded in the world. However, people quickly learn heuristics like “If I get angry, I’ll be more likely to be mean to people around me”, or “If I take cocaine now, I’ll be even more likely to take cocaine in the future.”
Fragility of outcome value to initial conditions / Pairwise misalignment severity.
This general alignment property seems important to me, and I’ll write a post on it. In short: How pairwise-unaligned are two agents produced with slightly different initial hyperparameters/architectural choices (e.g. reward function / utility function / inductive biases)?
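One toy way to start operationalizing this question (the regret metric and the random-utility setup below are my own illustrative assumptions, not a settled definition):

```python
import numpy as np

# Toy operationalization: two agents whose utility functions over a fixed
# set of outcomes differ by a small perturbation. "Misalignment severity"
# is measured as the utility agent A loses when agent B optimizes instead.

rng = np.random.default_rng(0)
n_outcomes = 100

u_a = rng.normal(size=n_outcomes)               # agent A's utility over outcomes
u_b = u_a + 0.1 * rng.normal(size=n_outcomes)   # agent B: slightly perturbed copy

best_for_a = np.argmax(u_a)
best_for_b = np.argmax(u_b)

# How much does A lose, relative to its own optimum, if B gets to choose?
regret_for_a = u_a[best_for_a] - u_a[best_for_b]
print(regret_for_a)
```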
I’m excited about people thinking more about general alignment properties and about what generates those properties.
[1] Supposing e.g. uniformly random tie-breaking among actions with equal expected utility.
The main difference between LDAIXI and a human in terms of ontology seems to be that the things the human values are ultimately grounded in senses and a reward tied to them. For example, we value sweet things because we have a detector for sweetness and a reward tied to that detector. When our understanding of what sugar is changes, the detector doesn't, and thus the ontology change works out fine. But I don't see a reason you couldn't set up LDAIXI the same way: just specify the reward in terms of a diamond detector, or multiple ones. In the end, there are already detectors that AIXI uses; how else would it get input?
Because LDAIXI doesn't, e.g., have the credit assignment mechanism which propagates reward into learned values. Hutter just called it "reward." But that "reward function" is really just a utility function over observation histories, or over the work tapes of the hypotheses, or whatever. It's not the same as the mechanisms within people which give them good general alignment properties.
(See also: the detached lever fallacy)