a rock wants to stay a rock, and how badly it wants it is defined by the equations governing its chemical bonds (it's electromagnetic bonding doing the work at that scale, not the nuclear strong force). a teacup wants to stay a teacup, same deal as the rock. a yogi appears to want to stay a yogi, though I'm less confident that a yogi wants that quite as badly or consistently as a rock wants to stay a rock; thankfully, most yogis communicate a great deal before going silent, and there are incremental tests you can run which will reveal that interfering with their actions gets met with attempts to keep on being a yogi.
the teacup test is reasonable, but a good ai safety solution would be teacup-invariant.
and anyway, I have an intuition that damaging anything is necessarily going to be against the values of some agent, so even if nobody seems to care, it's worth interrogating a map of other agents' preferences about stuff. maybe they aren't allowed to override yours, depending on the preference permissions, but it still seems like, if you want to smash your cup and it's the most beautiful cup that ever existed, maybe I want to offer to trade you something less special for your beautiful cup, which many agents would seek to intervene to protect. perhaps, for example, a virtual simulation of how a cup behaves when breaking, if knowledge of the process is what you're really after; maybe you'd even enjoy simulations of much more complicated stuff breaking? or perhaps you might like a different cup. after all, those atoms want to stay together right now using their local agency of being physical chemicals, and my complicated agency-throwing machine called a brain detects there may be many agency-throwers in the world who want to protect the teacup.
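purely as a toy sketch of what "interrogating a map of other agents' preferences" could look like mechanically, here's some python; every name in it (the Preference and PreferenceMap classes, the overridable flag standing in for "preference permissions", the trade offer) is invented for illustration, not any real system or agreed-on formalism:

```python
from dataclasses import dataclass, field

# toy illustration only: Preference, PreferenceMap, and decide() are invented
# names for this sketch, not any real library or formal proposal.

@dataclass
class Preference:
    agent: str          # which agent holds the preference
    strength: float     # how much they care, 0..1
    overridable: bool   # the "preference permissions": can the owner override it?

@dataclass
class PreferenceMap:
    # object name -> preferences other agents hold about keeping it intact
    prefs: dict = field(default_factory=dict)

    def query(self, obj: str) -> list:
        return self.prefs.get(obj, [])

def decide(owner_wants_to_smash: bool, obj: str, world: PreferenceMap) -> str:
    """what to do about a destructive action, given other agents' preferences."""
    if not owner_wants_to_smash:
        return "leave it alone"
    others = world.query(obj)
    if any(not p.overridable for p in others):
        # someone's preference can't just be overridden: try a substitute instead
        return f"offer a trade: a simulation of the {obj} breaking, or a less special {obj}"
    if others:
        # nobody gets a veto, but somebody cares, so surface that before acting
        return "proceed, but note the objection"
    return "proceed"

# usage: one agent cares a lot about the teacup and their preference isn't overridable
world = PreferenceMap({"teacup": [Preference("aesthete", 0.9, overridable=False)]})
print(decide(True, "teacup", world))
# -> offer a trade: a simulation of the teacup breaking, or a less special teacup
```

obviously a real version would need actual strengths, trade valuation, and some way of building the map in the first place; the sketch is only meant to show that "check the map before you swing the hammer" is a small, concrete step.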