Hm, inclusive genetic fitness is a very non-local criterion, at least as it's often assumed on LW. Many of the standard alignment failures people point to, like birth control and non-reproductive sex, took roughly 10,000-300,000 years to appear, and in general, with the exception of bacteria or cases of extreme selection pressure, timescales of thousands of years are the norm for mammals and other animals to accumulate enough selection pressure to develop noticeable traits. Compared to an AI training run, then, evolution's feedback loop is 4-5+ OOMs slower in calendar time.
IGF is often assumed to mean the inclusive genetic fitness of all genes for all time; otherwise, the problems that are usually trotted out become far weaker evidence that alignment problems will arise when we try to align AIs to human values.
But there's a second problem that exists independently of the first: the differences between how we can control AIs and how evolution "controlled" humans:
https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky#sYA9PLztwiTWY939B

The important parts are these:

> You can say that evolution had an “intent” behind the hardcoded circuitry, and humans in the current environment don’t fulfill this intent. But I don’t think evolution’s “intent” matters here. We’re not evolution. We can actually choose an AI’s training data, and we can directly choose what rewards to associate with each of the AI’s actions on that data. Evolution cannot do either of those things.

> Evolution does this very weird and limited “bi-level” optimization process, where it searches over simple data labeling functions (your hardcoded reward circuitry), then runs humans as an online RL process on whatever data they encounter in their lifetimes, with no further intervention from evolution whatsoever (no supervision, re-labeling of misallocated rewards, gathering more or different training data to address observed issues in the human’s behavior, etc.). Evolution then marginally updates the data labeling functions for the next generation. It is a fundamentally different type of thing than an individual deep learning training run.
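The "bi-level" structure described in the quote could be sketched as a toy hill-climbing simulation (everything here is an illustrative assumption, not anyone's actual model): evolution's outer loop can only mutate the parameter of a hardcoded reward function and judge it by the behavior it eventually produces, while a training run rewards the behavior directly.

```python
import random

def inner_lifetime(reward_fn, steps=100):
    """One organism's lifetime: online RL against a fixed, hardcoded
    reward function, with no supervision, relabeling, or data curation."""
    policy = 0.0
    for _ in range(steps):
        candidate = policy + random.gauss(0, 0.5)
        if reward_fn(candidate) > reward_fn(policy):
            policy = candidate  # crude hill-climbing on the raw reward signal
    return policy

def evolution(target=3.0, generations=200):
    """Outer loop: evolution only gets to nudge the reward circuitry's
    parameter between generations, judged by realized 'fitness'."""
    theta = 0.0  # parameter of the hardcoded reward circuitry
    def fitness(t):
        # Fitness of the *behavior produced* by reward parameter t.
        behavior = inner_lifetime(lambda a: -(a - t) ** 2)
        return -(behavior - target) ** 2
    for _ in range(generations):
        candidate = theta + random.gauss(0, 0.2)
        if fitness(candidate) > fitness(theta):
            theta = candidate  # marginal update to the labeling function
    return theta

def direct_training(target=3.0, steps=200):
    """By contrast, a training run scores the policy's behavior directly."""
    policy = 0.0
    for _ in range(steps):
        candidate = policy + random.gauss(0, 0.2)
        if -(candidate - target) ** 2 > -(policy - target) ** 2:
            policy = candidate
    return policy
```

The point of the contrast is structural, not quantitative: `evolution` never touches the policy, only the reward function that some later inner loop will learn against, whereas `direct_training` optimizes the behavior itself.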