In some sense, the Agent Foundations program at MIRI sees the problem as: human values are currently an informal object. We can only get meaningful guarantees for formal systems. So, we need to work on formalizing concepts like human values. Only then will we be able to get formal safety guarantees.
Unless I’m misunderstanding you or MIRI, that’s not their primary concern at all:
Another way of putting this view is that nearly all of the effort should be going into solving the technical problem, “How would you get an AI system to do some very modest concrete action requiring extremely high levels of intelligence, such as building two strawberries that are completely identical at the cellular level, without causing anything weird or disruptive to happen?”
Where obviously it’s important that the system not do anything severely unethical in the process of building its strawberries; but if your strawberry-building system requires its developers to have a full understanding of meta-ethics or value aggregation in order to be safe and effective, then you’ve made some kind of catastrophic design mistake and should start over with a different approach.