This framing, where a goal concept is prominent, is not obviously superior to other designs that don't pursue goals and instead focus on pointing at the appropriate influences from the world. For example, a system may seek to make reliable uploads, or figure out which decisions of uploads are errors, or organize uploads to make sense of situations outside normal human environments, or be corrigible in a secure way, so as to follow the directions of a sane external operator and not those of an attacker.
This makes me think I probably misunderstood what you meant earlier by “agents that are not primarily goal-directed”. Do you have a reference that you can point me to that describes what you have in mind in more detail?