We don’t know how to distinguish systems with long-term goals from systems with short-term goals. Even in principle, we don’t know how to tell whether an AIXI-like program running on a hypercomputer would optimize for a long-term or a short-term goal. So to your proposition “if we build an AI with a short-term goal, we are safe,” the correct response is “What exactly do you mean by a short-term goal?”
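To make the “even in principle” point concrete, here is a sketch of the standard AIXI action-selection rule (roughly following Hutter’s formulation; the horizon $m$, universal machine $U$, and program length $\ell(q)$ are standard notation, not anything from the comment above):

$$a_k \;:=\; \arg\max_{a_k}\sum_{o_k r_k}\;\cdots\;\max_{a_m}\sum_{o_m r_m}\;\bigl(r_k+\cdots+r_m\bigr)\sum_{q\,:\,U(q,\,a_1\ldots a_m)\,=\,o_1 r_1\ldots o_m r_m} 2^{-\ell(q)}$$

Even with this fully formal object in hand, the horizon $m$ only bounds the planning window; whether the rewards it chases amount to a “short-term goal” in the intuitive sense is a separate question.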
Even what we intuitively understand to be a “short-term goal” can be pretty scary. If something can bootstrap nanotech in a week, a planning horizon of 10 days doesn’t save us.
As to the definition of a short-term goal: any goal that can be achieved fully (i.e., without an “and keep it that way” clause) in a finite, short time (for instance, a few seconds), with the resources the system already has at hand. Equivalently, I think: any goal that doesn’t push toward instrumental power-seeking.
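A minimal way to formalize that contrast (my notation, not the commenter’s): the short-term objective is reward over a small window $h$, while adding the “and keep it that way” clause pushes it toward the unbounded form,

$$J_{\text{short}}(\pi)\;=\;\mathbb{E}_{\pi}\!\left[\sum_{k=t}^{t+h} r_k\right]\ \text{with } h \text{ small},
\qquad
J_{\text{long}}(\pi)\;=\;\mathbb{E}_{\pi}\!\left[\sum_{k=t}^{\infty}\gamma^{\,k-t} r_k\right],$$

and it is the second, open-ended form that tends to make acquiring resources and protecting the goal instrumentally useful.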
As to how we know a system has a short-term goal: if we could argue that systems prefer short-term goals by default, we still wouldn’t know the goals of any particular system, but we could hazard a guess that its goals are short-term. Perhaps we could expect short-term goals by default if they were, for instance, easier to specify, and thus easier to end up with. As others have pointed out, if we try to give systems long-term goals on purpose, they will probably end up with long-term goals.