So how do you feel about the proposal I made in my latest post, to evaluate the new situation in light of the old values? (Might want to continue this thread in the comments of that post.)
My (low-confidence) intuition is that while it’s certainly easy to screw up the implementation, if the system is engineered correctly, then the process by which the AI applies the old values to the new situation/new concept space should be essentially the same as the one by which humans would do it. Of course, in practice “the system being engineered correctly” might require e.g. a very human-like design, including a humanoid body etc., in order for the initial concept space to become sufficiently similar to the human one, so that’s a problem.
I think I’m also somewhat more optimistic about the range of solutions that might qualify as “good”, because a large part of human values seems to be determined by reinforcement learning. (Compare Hanson on plasticity.) I suspect that if e.g. nanotech and memehacking became available, then the “best” way of dealing with them would be underdetermined by our current values, and just because an AI would extrapolate our current values differently than humans would doesn’t necessarily mean that its extrapolation would be any worse. If the best extrapolation is genuinely underdetermined by our current values, then a wide range of possibilities is equally good pretty much by definition.