In contrast, in a slow takeoff world, many aspects of the AI alignment problem will already have shown up as alignment problems in non-AGI, non-x-risk-causing systems; in that world, there will be lots of industrial work on various aspects of the alignment problem, and so EAs now should think of themselves as trying to look ahead, figure out which margins of the alignment problem aren't going to be taken care of by default, and work out how to help out there.
Let's consider the opposite. Imagine you are programming a self-driving car in a simulated environment. You notice it Goodharting your metrics, so you tweak them and try again. You build up a list of 1001 ad hoc patches that make your self-driving car behave reasonably most of the time.
The object-level patches only really apply to self-driving cars; they include things like a small intrinsic preference for looking at street signs. The meta-level strategy of patching the system until it works isn't very relevant to aligning an AGI either.
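To make the thought experiment concrete, here is a minimal sketch of what such a kludged-together reward function might look like. Every name, weight, and patch number here is a hypothetical illustration, not taken from any real system; the point is only the shape of the thing, a pile of hand-tuned terms each added after the simulated car was caught Goodharting the previous version.

```python
def patched_reward(state):
    """Hypothetical reward function built up from ad hoc patches."""
    reward = 0.0

    # Base objective: make forward progress along the planned route.
    reward += 1.0 * state["distance_along_route"]

    # Patch #1: the car learned to speed, so penalize exceeding the limit.
    reward -= 5.0 * max(0.0, state["speed"] - state["speed_limit"])

    # Patch #2: it then cut corners by hugging the lane edge; penalize that.
    reward -= 2.0 * state["distance_from_lane_center"]

    # Patch #17: a small intrinsic preference for looking at street signs,
    # added because it otherwise ignored them entirely.
    reward += 0.1 * state["time_gazing_at_signs"]

    # ... patches #3 through #1001 omitted ...

    return reward
```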
Imagine a world with many AIs like this, all with ad hoc kludges of hard-coded utility functions. AI is becoming increasingly economically important and getting close to AGI. This is a slow takeoff, yet all the industrial work is useless.