My off-the-cuff best guesses at answering these questions:
1. Current-day large language models do have “goals”. They are just very alien, simple-ish goals that are hard to conceptualize. GPT-3 can be thought of as having a “goal” that is hard to express in human terms, but which drives it to predict the next word in a sentence (see the first sketch after this list). Its neural pathways “fire” according to some form of logic that leads it to “try” to do certain things; that is a goal. As systems become more general, they will continue to have goals. Their terminal goals can remain as abstract and incomprehensible as whatever GPT-3's goal could be said to be, but they will be more capable of devising instrumental goals that are comprehensible in human terms.
2. Yes. Anything that intelligently performs tasks can be thought of as having goals; that is just part of why a given input x produces output y and not z. The term “goal” is just a way of abstracting the behavior of complex, intelligent systems in order to say something about which inputs correspond to which outputs. As such, it is not coherent to speak of an intelligent system that does not have “goals” (in the broad sense of the word). If you were to build a circuit board that just executes the function x = 3y, that circuit board could be said to have “goals” if you chose to consider it intelligent and to describe it with the kind of language we usually reserve for people. These might not be goals that are familiar or easily expressible in human terms, but they are still goals in a relevant sense. If we strip the word “goal” down to mean little more than “the thing a system inherently tends towards doing”, then any system that does things can necessarily be said to have goals.
3. “Tool” versus “agent” is not a meaningful distinction past a certain point. A tool with any level of “intelligence” that carries out tasks would necessarily be an agent in a certain sense. Even a thermostat can correctly be thought of as an agent that optimizes for a goal (see the second sketch below). While some hypothetical systems might be very blatant about their own preferences and others might behave more like the tools we are used to, both can be said to have “goals” they are acting on. It is harder to conceptualize the vague inner goals of systems that seem more like tools, and easier to imagine the explicit goals of a system that behaves more like a strategic actor, but this distinction is only superficial. In fact, the “deeper”/more terminal goals of the strategic-actor system would be incomprehensible and alien in much the same way as the tool system's. Human minds can be said to optimize for goals that are, in themselves, not similar to humans' explicit terminal goals/values. Tool-like AI is just agentic AI that either cannot, or (as in the case of deception) currently chooses not to, pursue its goals in a way that is obviously agentic by human standards.
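To make point 1 a bit more concrete, here is a minimal, purely illustrative sketch of the next-token-prediction objective a model like GPT-3 is trained against. The toy vocabulary, probabilities, and function names are invented for the example; the point is only that the model's “goal”, insofar as it has one, is whatever behavior this loss rewards, not anything stated in human terms.

```python
import numpy as np

# Toy vocabulary for illustration only (not from any real model).
vocab = ["the", "cat", "sat", "on", "mat"]

def next_token_loss(predicted_probs, target_index):
    # Cross-entropy: the loss is low when the model assigns high probability
    # to the token that actually came next. Training pushes the model toward
    # whatever internal machinery minimizes this number.
    return -np.log(predicted_probs[target_index])

# Hypothetical model output: a distribution over the next token given some context.
predicted = np.array([0.05, 0.05, 0.05, 0.05, 0.80])
target = vocab.index("mat")

print(next_token_loss(predicted, target))  # small loss: the model "wanted" this token
```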
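And for point 3, a similarly hedged sketch of the thermostat-as-agent framing: a bang-bang controller whose “goal” (keep the temperature near a setpoint) is nothing more than a compact description of its input-output behavior. The setpoint, deadband, and action names are made up for illustration.

```python
def thermostat_step(current_temp: float, setpoint: float = 20.0,
                    deadband: float = 0.5) -> str:
    """Return the action a simple bang-bang thermostat takes at this timestep."""
    if current_temp < setpoint - deadband:
        return "heat_on"    # too cold: act to push the world toward the setpoint
    if current_temp > setpoint + deadband:
        return "heat_off"   # too warm: stop heating
    return "no_change"      # within the deadband: the "goal" is currently satisfied

for temp in (17.0, 19.8, 22.5):
    print(temp, "->", thermostat_step(temp))
```

Calling the setpoint a “goal” adds no new physics to the device; it is just the most compact way to summarize which inputs map to which outputs, which is the sense in which tool-like systems are already agents.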