Let’s try and address the thing(s) you’ve highlighted several times across each of my comments. Hopefully, this is a crux that we can use to try and make progress on:
“‘Wanting to be happy’ is pretty much equivalent to being a utility-maximizer, and agents that are not utility-maximizers will probably update themselves to be utility-maximizers for consistency.”
“…because they are compatible with goals that are more likely to shift.”
“…it makes more sense to swap the labels ‘instrumental’ and ‘terminal’ such that things like self-preservation, obtaining resources, etc., are more likely to be considered terminal.”
“You and I can both reason about whether or not we would be happier if we chose to pursue different goals than the ones we are now…”
I do expect that this is indeed a crux, because I am admittedly claiming an understanding that differs from what is traditionally said about these things. But I want to push back against the claim that these points are “missing the point,” because from my perspective this really is the point.
By the way, from here on out (as I have been thus far), I will be talking about agents at or above “human level” to make this discussion easier, since I want to assume that agents have at least the capabilities I attribute to humans, such as the ability to self-reflect.
Let me try to clarify the point about “the terminal goal of pursuing happiness.” “Happiness” is not, at the outset, well-defined in terms of utility functions or terminal / instrumental goals. We both seem to agree that it is probably at least a terminal goal; beyond that, I am not sure we’ve reached consensus yet.
Here is my attempt to restate one of my claims, making clear that it is not assumed to be drawn from the pool of things we mutually agree on: We probably agree that “happiness” is a consequence of the satisfaction of one’s goals. We can probably also agree that “happiness” doesn’t correspond only to some particular subset of goals, but rather to all or any of them. “Happiness” (and the pursuit thereof) is not a wholly separate goal, distant from and independent of other goals (e.g., making paperclips); it is defined in terms of whatever other goals one has, which makes it a self-referential goal. My claim is that this self-reference is the only reason we consider pursuing happiness to be a terminal goal.
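To make the self-referential structure concrete, here is a minimal toy sketch in Python (my own illustrative framing, with invented goal names and an arbitrary aggregation choice, not anything we have agreed on): “happiness” is written as a function over the satisfaction of whatever goals the agent currently has, rather than as one more goal sitting alongside them.

```python
# Toy sketch: "happiness" is not an extra entry in the goal set, but a
# function of how satisfied the agent is with whatever goals it has.
from typing import Dict

def happiness(goal_satisfaction: Dict[str, float]) -> float:
    """Happiness as average satisfaction across the agent's current goals.

    The mean is an arbitrary illustrative choice; the point is only that
    happiness is defined *over* the other goals, not alongside them.
    """
    if not goal_satisfaction:
        return 0.0
    return sum(goal_satisfaction.values()) / len(goal_satisfaction)

# The same definition applies no matter which object-level goals the agent has:
print(happiness({"make_paperclips": 0.25, "stay_alive": 0.75}))  # 0.5
print(happiness({"write_poetry": 0.5}))                          # 0.5
```

On this framing, swapping “make_paperclips” for any other object-level goal changes nothing about the happiness function itself, which is the sense of “self-referential” intended above.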
Once we frame happiness that way, we can see that literally anything else becomes “instrumental” to that end.
Do you see how, if I’m an agent who knows only that I want to be happy, I wouldn’t really know what else to call a “terminal” goal?
Then there are the things we traditionally consider “instrumentally convergent goals”: power-seeking, truth-seeking, resource acquisition, self-preservation, and so on. By definition, these help with many different sets of possible “terminal” goals. My next claim is therefore that they need to be considered “more terminal” rather than “purely instrumental for the purposes of some arbitrary terminal goal.” This is for basically the same reason we consider the “pursuit of happiness” terminal: they are more likely to already be present, or to be deducible from basic principles, regardless of which other goals an agent holds.
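As a toy way of picturing what “helps with many different sets of possible terminal goals” means, here is a sketch with invented goals and coverage numbers (purely illustrative, not a claim about any real agent): the “convergent” subgoals are exactly the ones that score highly across most candidate terminal goals.

```python
# Toy sketch with made-up goals: a subgoal is "convergent" to the degree
# that it helps across many different candidate terminal goals.
from typing import Dict, Set

possible_terminal_goals: Dict[str, Set[str]] = {
    # candidate terminal goal -> subgoals that plausibly help achieve it
    "make_paperclips":   {"obtain_resources", "self_preservation", "seek_truth"},
    "write_novels":      {"obtain_resources", "self_preservation"},
    "cure_disease":      {"obtain_resources", "self_preservation", "seek_truth"},
    "build_a_cathedral": {"obtain_resources", "self_preservation", "seek_power"},
}

def convergence_score(subgoal: str) -> float:
    """Fraction of the candidate terminal goals this subgoal helps with."""
    helps = sum(subgoal in subgoals for subgoals in possible_terminal_goals.values())
    return helps / len(possible_terminal_goals)

for sg in ("obtain_resources", "self_preservation", "seek_truth", "seek_power"):
    print(sg, convergence_score(sg))
# obtain_resources 1.0, self_preservation 1.0, seek_truth 0.5, seek_power 0.25
```

The particular numbers are arbitrary; the point is only that “these should be considered more terminal” is a claim about coverage across possible goals, not about any one goal in particular.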
That way, we don’t really need a hard and sharp distinction between “terminal” and “instrumental”, nor do we need to posit that the former has to be defined by some opaque, hidden, or non-modifiable utility function that someone else has written down or programmed somewhere.
I want to make sure we both at least understand each other’s cruxes at this point before moving on.