That all depends on the approach… if you have some big human-inspired but more brainy neural network that learns to be a person, it may well just do the right thing by itself, and the risks are in any case broadly comparable to those of having a human do it.
If you are instead thinking of a “neat AI” with utility functions over world models and such, parts of that AI can maximize abstract metrics over mathematical models (including for self-improvement) without involving any “generally intelligent” process that could eat you. So you would want to use those parts to build models of human meaning and intent.
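To make that distinction concrete, here is a minimal, purely illustrative sketch (the model, the metric, and the search routine are all invented for this comment): an optimizer of this kind maximizes a metric defined over a formal object held in memory, and nothing about it constitutes a general agent acting on the world.

```python
# Hypothetical sketch of an "optimizer confined to a mathematical model".
# Everything here (model_score, hill_climb, the toy quadratic) is made up
# for illustration; the point is that the component only reads and writes
# numbers describing a formal model, with no channel to the outside world.

import random

def model_score(params):
    """Abstract metric over a purely mathematical model.

    Here it is a toy quadratic with its optimum at (3, -1); in the
    scenario above it might instead be the predictive accuracy of a
    world model or of a model of human intent.
    """
    x, y = params
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)

def hill_climb(score, start, steps=10_000, step_size=0.05):
    """Maximize `score` by local random search over the model's parameters."""
    best = list(start)
    best_score = score(best)
    for _ in range(steps):
        candidate = [p + random.uniform(-step_size, step_size) for p in best]
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

if __name__ == "__main__":
    params, value = hill_climb(model_score, start=[0.0, 0.0])
    print(f"optimized params ~ {params}, metric = {value:.4f}")
```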
Furthermore, with regard to an AI following some goal, it seems to me that the goal specification would have to be intelligently processed in the first place before it could be applied to the real world at all; we can’t even define “paperclips” otherwise.