I’m going to bite the bullet and say that an “intelligence” and an “optimizer” are fundamentally the same thing; or rather, that these words point to the same underlying concept, one we don’t quite have a non-misleading word for.
An optimizer is a system that pulls probability-mass away from some world-states and toward others; anything that affects reality is an “optimizer”. A teacup is an optimizer.
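If it helps, here is how I would cash that out in code. This is a toy sketch of my own, with made-up states and numbers, not anything from your example: an optimizer is just something whose presence reshapes the distribution over world-states.

```python
# A toy sketch, in my own framing (the states and numbers are made up):
# an "optimizer" as anything that moves probability mass between world-states.

def renormalize(dist):
    """Scale a state -> probability mapping so it sums to 1."""
    total = sum(dist.values())
    return {state: p / total for state, p in dist.items()}

# Distribution over world-states in a world without the teacup.
world = {"tea stays off the floor": 0.5, "tea ends up on the floor": 0.5}

def teacup(dist):
    """The teacup's existence pulls mass toward tea staying off the floor."""
    reweighted = {
        "tea stays off the floor": dist["tea stays off the floor"] * 4.0,
        "tea ends up on the floor": dist["tea ends up on the floor"] * 1.0,
    }
    return renormalize(reweighted)

print(teacup(world))
# {'tea stays off the floor': 0.8, 'tea ends up on the floor': 0.2}
```

The point is only that nothing here requires intent or cognition: a passive object shifts the distribution just by existing.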
The purpose of an optimizer is what it does; what it is optimizing for. A teacup’s purpose is to contain objects inside it, when it is positioned the right way up in a gravity field. A teacup’s purpose is to transfer heat. A teacup’s purpose is to break when dropped from a height. A teacup’s purpose is the set of everything it does. A teacup’s purpose, in full generality, is: “be a teacup”.
A teacup is aligned to the purpose of being a teacup. A system that is aligned is one that is in the correct state to fulfill its own purpose. And a teacup is, obviously, in the correct physical state for being a teacup. Tautology. All systems are perfectly aligned to their own purpose.
But a teacup is imperfectly aligned to the purpose of “being used as a teacup by a human”. If it is dropped, it may break. If it is tipped, it may spill. All of these behaviors are aligned to the purpose of “be the physical object: the teacup”, but imperfectly aligned to the purpose of “be a useful teacup for drinking tea and other liquids”.
What is an optimizer?
An optimizer is an engine that converts alignment into purpose.
Alignment: “be a teacup” → purpose: “behave like a teacup”. This part is tautological.
Alignment: “be a useful teacup for humans” → purpose: “be used in beneficial ways by humans”. This part is not tautological.
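To make that distinction concrete, another toy sketch of my own (the properties I check here are illustrative assumptions): the first kind of alignment is true by construction, while the second can genuinely fail.

```python
# Another toy sketch (again my framing, not from the original question):
# the tautological sense of "aligned" versus the non-tautological one.

def aligned_to_own_purpose(system):
    """Tautological: whatever the system does *is* its purpose."""
    return True  # true of every system, by construction

def aligned_to_human_purpose(cup):
    """Contingent: depends on the actual properties of this cup."""
    return not cup["spills easily"] and not cup["breaks into sharp shards"]

sturdy = {"spills easily": False, "breaks into sharp shards": False}
flimsy = {"spills easily": True, "breaks into sharp shards": True}

print(aligned_to_own_purpose(flimsy))    # True  (always; carries no information)
print(aligned_to_human_purpose(sturdy))  # True  (happens to hold)
print(aligned_to_human_purpose(flimsy))  # False (can genuinely fail)
```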
A teacup may be good or bad at that second, non-tautological purpose. A teacup may harm humans: it may spill tea; it may break into sharp shards of ceramic. So a teacup may cause both good and bad outcomes.
A Friendly teacup, a human-aligned teacup, is one that optimizes for its purpose: making good outcomes more likely and bad outcomes less likely.
A Friendly teacup is harder to spill or to accidentally drop. A Friendly teacup is not so heavy that it would injure a human if it falls on their foot. A Friendly teacup is one that is less likely to harm a human if it breaks.
But how does a teacup optimize for good outcomes? By being a teacup. By continuing to be a teacup.
Once a physical object has been aligned into the state of being a teacup, it continues to be a teacup. Because a teacup is a physical system that optimizes for retaining its shape (unless it is broken or damaged).
A Friendly teacup, once aligned into being a Friendly teacup, serves its purpose by continuing to be a Friendly teacup. A Friendly teacup optimizes humans: it optimizes for some particular set of outcomes for humans by continuing to be a Friendly teacup.
How does a Friendly teacup optimize you? Because its existence, its state of being and continuing to be a teacup, leads you to make different choices than you would if that were not the case; you might enjoy a refreshing cup of tea!
You are being optimized. By a teacup. So that it may fulfill its assigned purpose. This is a perfectly valid way to see things.
The teacup has been made (aligned) in a way that makes it a good teacup (purpose): in this example the optimization-pressure is the process that created the teacup.
So this is my answer: your example is valid. Both teacup alignment and AI alignment are fields that use some of the same underlying concepts, if you understand these terms in their full generality.
But for teacups these things are obvious, so we don’t need fancy terminology for them; it is confusing to try to use the terminology this way.
But it is valid.