If evolution is to humans as humans are to UFAI, I suppose UFAI corresponds to too little compute allocated to understanding our goal specification, and too much compute allocated to maximizing it. That suggests the solution is relatively simple.
(sorry for commenting on such an old post)
It seems like the problem from evolution’s perspective isn’t that we don’t understand our goal specification but that our goals are different from evolution’s goals. It seems fairly tautological that putting more compute towards maximizing a goal specification than towards making sure that specification is what we want is likely to lead to UFAI; I don’t see how that implies a “relatively simple” solution?
And the “relatively simple” solution is to do the reverse: put more compute towards making sure the goal specification is what we want than towards maximizing it.
(It’s possible this point isn’t very related to what Wei Dai said.)
Isn’t this just saying it would be nice if we collectively put more resources towards alignment research relative to capabilities research? I still feel like I’m missing something :/
We may be able to offload some of this work to the system itself, e.g. by having it search for a diverse range of models of the user’s intent, instead of making it use a single hardcoded goal specification.
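To make the “diverse range of models of the user’s intent” idea a bit more concrete, here is a minimal toy sketch in Python. It is purely my own illustration rather than anything from the linked comment: the system keeps several candidate intent models, prefers actions whose worst-case score across those models is high, and declines to act (deferring back to the user) when the models disagree too much. All names, features, and thresholds below are made up.

```python
# Toy sketch: many candidate models of the user's intent instead of one
# hardcoded goal specification. Act only where the models broadly agree.
import random

def make_candidate_intent_models(n):
    """Stand-in for searching for diverse intent models: here, just n
    random linear scorers over 3 action features."""
    return [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]

def score(model, action_features):
    """Predicted desirability of an action under one intent model."""
    return sum(w * f for w, f in zip(model, action_features))

def choose_action(candidate_actions, intent_models, disagreement_threshold=0.5):
    """Pick the action whose worst-case score across intent models is highest;
    return None (meaning: ask the user) if every action is too contested."""
    best_action, best_worst = None, float("-inf")
    for features in candidate_actions:
        scores = [score(m, features) for m in intent_models]
        if max(scores) - min(scores) > disagreement_threshold:
            continue  # the intent models disagree about this action; skip it
        if min(scores) > best_worst:
            best_action, best_worst = features, min(scores)
    return best_action

models = make_candidate_intent_models(10)
actions = [[1.0, 0.0, 0.2], [0.3, 0.9, -0.1], [0.0, 0.1, 1.0]]
print(choose_action(actions, models))  # a feature vector, or None -> defer to the user
```

The point is only the shape of the design: compute goes into generating and comparing models of what the user wants, and maximization is gated on their agreement, rather than all compute going into optimizing one fixed specification.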
This comment of mine is a bit related if you want more elaboration:
https://www.lesswrong.com/posts/NtX7LKhCXMW2vjWx6/thoughts-on-reward-engineering#jJ7nng3AGmtAWfxsy
If you have thoughts on it, probably best to reply there—we are already necroposting, so let’s keep the discussion organized.