one reason we might have to think that the AI would be careful about this is that it knows it has a utility function to maximize but doesn’t yet know which one; it can, however, make informed guesses about it. “i don’t know what my human user is gonna pick as my utility function, but whatever it is, it probly strongly dislikes me causing damage, so i should probly avoid that”.
it’s not careful because we have the alignment tech to give it the characteristic of carefulness; it’s hopefully careful because it’s ultimately aligned, and its best guess as to what it’s aligned to entails not destroying everything that matters.
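a minimal sketch of that argument, with made-up names and numbers (nothing here is from the original): the agent averages each action’s utility over its credence in a few candidate utility functions, and if nearly all candidates heavily penalize causing damage, the damaging action loses in expectation.

```python
# Hypothetical candidate utility functions with the agent's credence in each.
# Tuples are (probability, utility of "careful" action, utility of "damaging" action).
# All values are illustrative, not derived from anything real.
candidate_utilities = [
    (0.5, 1.0, -100.0),   # user turns out to care about art: damage is catastrophic
    (0.4, 0.8, -100.0),   # user turns out to care about science: damage is catastrophic
    (0.1, 0.5,    2.0),   # rare candidate utility that actually rewards the damage
]

def expected_utility(action_index: int) -> float:
    """Average an action's utility over the credence distribution."""
    return sum(p * utilities[action_index]
               for p, *utilities in candidate_utilities)

careful = expected_utility(0)    # 0.5*1.0 + 0.4*0.8 + 0.1*0.5  =  0.87
damaging = expected_utility(1)   # 0.5*-100 + 0.4*-100 + 0.1*2  = -89.8
print(f"careful: {careful:.2f}, damaging: {damaging:.2f}")
# The damaging action is dominated in expectation, so an agent that is genuinely
# uncertain about its objective stays careful under this toy model.
```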
This doesn’t make me any less suspicious. Humans have a utility function of “make more humans”, but we still invented nuclear weapons and came within a hair’s breadth of destroying the entire planet.
hello, I’m the “point out evolution alignment or not” brain shard. humans do not have a utility function of “make more humans”; they have a utility function more like “preserve your genetic-epigenetic-memetic self-actualization trajectory”, or said less obtusely, “make your family survive indefinitely”. that does not mean “make your family as big as possible”. even if you need to make your family big to make it survive indefinitely, maximizing family size is a strategy chosen by almost no organisms or microbes. first-order optimization is not how anything works, except sometimes locally. second-order or above always ends up happening, because high-quality optimization tries to hit targets (a second-order approximation of a policy update); it doesn’t try to go in directions. a toy numeric version of that distinction follows below.
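a minimal sketch of the “directions vs targets” point, on a toy 1-D quadratic (the function and step size are made up for illustration): a first-order step only picks a direction and moves a bit along it, while a second-order (Newton) step solves the local quadratic model and jumps to a specific target point.

```python
def f(x: float) -> float:
    """Toy objective with its minimum at x = 3."""
    return (x - 3.0) ** 2

def grad(x: float) -> float:
    """First derivative of f."""
    return 2.0 * (x - 3.0)

def hess(x: float) -> float:
    """Second derivative of f (constant, since f is quadratic)."""
    return 2.0

x = 10.0
lr = 0.1

# First-order: move in the downhill direction, scaled by a step size.
first_order_step = x - lr * grad(x)          # lands at 8.6, still far from 3

# Second-order: minimize the local quadratic model, i.e. aim at a target.
second_order_step = x - grad(x) / hess(x)    # lands exactly at 3.0

print(first_order_step, second_order_step)
```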