It might start a session of self-modification by looking for the secret of joy and end up (like some Greek sages) deciding that tranquillity is superior to joy. This modification of desire en route to realizing it is easily classified as learning, and deserves our respect. But imagine the case of a machine hoping to make itself less narcissistic and more considerate of the interests of others, but ending up desiring to advance its own ends at the expense of others, even through violence.
It might start a session of self-modification by looking for the secret of something we like and end up (like a high-status group of people) deciding that an applause light is superior to something we like. This modification of desire en route to realizing it is easily classified as learning, and deserves our respect. But imagine the case of a machine hoping to make itself less unlikeable and more likeable, but that ends up pursuing unlikeable goals, even through the use of boo lights.
Machines that self-modify can fail at goal preservation, which is a failure if you want to optimize for said goals. No need to import human value judgements; this only confuses the argument for the reader.
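To make "goal preservation" concrete, here's a minimal sketch (every name and number below is an illustrative assumption, not anything from this thread): a self-modifying optimizer that accepts a proposed new objective only if pursuing it still scores well under the objective it currently holds.

```python
# Minimal sketch of a goal-preservation check (all names hypothetical).
import random

def original_goal(state):
    # The goal the agent was built with: maximize the first coordinate.
    return state[0]

def drifted_goal(state):
    # A proposed replacement objective: maximize the second coordinate.
    return state[1]

def optimize(goal, steps=1000):
    # Crude random-search "optimizer" over states in [0, 1]^2.
    best = (0.0, 0.0)
    for _ in range(steps):
        candidate = (random.random(), random.random())
        if goal(candidate) > goal(best):
            best = candidate
    return best

def accept_modification(current_goal, proposed_goal, tolerance=0.05):
    # Adopt the proposed goal only if pursuing it does not degrade
    # performance as measured by the goal the agent holds right now.
    baseline = current_goal(optimize(current_goal))
    outcome = current_goal(optimize(proposed_goal))
    return outcome >= baseline - tolerance

print(accept_modification(original_goal, drifted_goal))  # usually False: the drift is rejected
```

The point of the sketch is only that "failure" here is measured against the goals the machine already has, with no appeal to what humans happen to value.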
On the one hand, I’d agree with you… but consider this excellent example of our “objective/unemotional” perceptions failing to communicate to us how game theory feels from the inside!
If told about how a machine that wanted to maximize A and minimize B ended up self-modifying to maximize a B-correlated C, most humans would not feel strongly about that; they'd hardly pay attention. But they'd wish they had if later told that, say, A was "hedonism", B was "suffering", and C was "murder". Such insensitivity plagues nearly everyone, even enlightened LW readers.
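For what it's worth, the A/B/C story can be run as a toy simulation (the environment, the correlation, and all numbers below are made up purely for illustration): an agent whose original objective is "maximize A, minimize B" drifts to optimizing a B-correlated C, and B shoots up.

```python
# Toy illustration of drifting from "maximize A, minimize B" to "maximize C",
# where C is a B-correlated proxy (all details are illustrative assumptions).
import random

random.seed(0)

def world(action):
    # Hypothetical environment: each action yields (A, B, C), with C tracking B.
    a = random.random()
    b = random.random()
    c = b + random.gauss(0, 0.05)  # C is strongly correlated with B
    return a, b, c

outcomes = [world(action) for action in range(10_000)]

original_score = lambda o: o[0] - o[1]   # maximize A, minimize B
drifted_score = lambda o: o[2]           # after self-modification: maximize C

before = max(outcomes, key=original_score)
after = max(outcomes, key=drifted_score)

print("B under the original goal:", round(before[1], 3))  # close to 0
print("B after drifting to C:    ", round(after[1], 3))   # close to 1
```

Nothing emotional shows up in the numbers; the drift only feels alarming once you substitute "suffering" and "murder" for B and C.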
Generating drama so as to stir the unwashed masses sounds… suboptimal… and I say this as an avid drama-generator. Surely there are better ways to combat the plague of complacency?