There seem to be several orders of magnitude of difference between the two solutions for coloring a ball. You should have better predictions than that for what it can do. Obviously you shouldn't run anything remotely capable of engineering bacteria without a much better theory of what it will do.
I suspect "avoiding changing the world" actually has some human values baked into it.
This seems to be trying to box an AI with its own goal system, which I think puts it in the tricky-wish category.
See my reply to Vladimir Nesov.
Do you count CEV to be in the same category?
With two differences: CEV tries to correct any mistakes in the initial formulation of the wish (aiming for an attractor), and it doesn't force the designers to specify details like whether making bacteria is OK or not.
It's the difference between painting a picture of a specific scene and making an auto-focus camera.
I do currently think it is possible to create a powerful cross-domain optimizer that is not a person and will not create persons, unbox itself, look at our universe, tile the universe with anything, or build AI that doesn't comply with these constraints. But I approach this line of thought with extreme caution, and really only as a way to accelerate whatever it takes to get to CEV, because an AI can't safely make changes to the real world without some knowledge of human volition, even if it wants to.
What if I missed something that's on the scale of the nonperson predicate? My AI works and creatively paints the apple, but somehow its solution is morally awful. Even staying within pure math could be bad for unforeseen reasons.