I’d expect a “narrow AI” that’s capable enough to destroy humanity to be versed in enough domains to qualify as goal-directed. (Here “having a goal” refers to a tendency to act consequentialistically in a wide variety of domains. That seems to be essentially the same thing as “being competent”: spelling out “acting consequentialistically” requires a notion of “competence”, and notions of “competence” seem to refer to successful goal-achievement given some goals.)
Just being versed in nanotech could be enough. Or exotic physics. Or any number of other narrow domains.
Could be, but then it’s not particularly plausible that it would still naturally qualify as an “AI-caused catastrophe”, rather than primarily as nanotech/a physics experiment/tools going wrong, with a bit of AI facilitating the catastrophe.
(I’m interested in what you think about the AGI competence=goals thesis. To me this seems to dissolve the question and I’m curious if I’m missing the point.)
That doesn’t sound right. What if I save people on Mondays and kill people on Tuesdays, being very competent at both? You could probably stretch the definition of “goal” to explain such behavior, but it seems easier to say that competence is just competence.
“Characterize”, not “explain”. This defines (idealized) goals given behavior; it doesn’t explain the behavior. The (detailed) behavior (together with the goals) is perhaps explained by evolution or by the designer’s intent (or error), but how the evolution (or design) happened is a question distinct from what the agent’s own goal is.
Saying that something is goal-directed seems to name an ordinary fuzzy category, like “heavy things”. Associated with it are “quantitative” ideas of a particular goal and of the optimality of its achievement (just as “heavy” is associated with a particular weight).
This could be a goal: maximization of (people Monday-saved + people Tuesday-killed). If resting and preparing the previous day helps, you might opt to specialize in Tuesday-killing, but still Monday-save someone if that happens to be convenient, and so on…
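For concreteness, one way to write such a time-dependent goal as a utility function (a minimal sketch; the symbols $U$, $S_t$, $K_t$ and the indicator $\mathbf{1}[\cdot]$ are my own notation, not anything from the discussion):

$$U \;=\; \sum_{t} \Big( \mathbf{1}[\mathrm{day}(t)=\text{Monday}]\, S_t \;+\; \mathbf{1}[\mathrm{day}(t)=\text{Tuesday}]\, K_t \Big)$$

where $S_t$ is the number of people saved and $K_t$ the number killed on day $t$. An agent competently maximizing $U$ only looks inconsistent if one insists that goals must be invariant in time.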
I think this only sounds strange because humans don’t have any temporal terminal values, and so there is an implicit moral axiom of invariance in time. It’s plausible we could’ve evolved something associated with time of day, for example. (It’s possible we actually do have time-dependent values associated with temporal discounting.)
I don’t believe this is the case. I need to use temporal terminal values to model the preferences that I seem to have.
If you are not talking about temporal discounting (which I mentioned), then as your comment stands I can only see that there is disagreement, but I don’t understand why. (Values I can think of whose expression is plausibly time-dependent seem to be better explained in terms of context.)
Yes, this is the most obvious one. I’m not sure if there are others. I would not have mentioned this if I had noticed your caveat.
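As a concrete form of the temporal discounting referred to above (a sketch under the usual exponential-discounting assumption; $\gamma$ and $u$ are assumed notation, not taken from the discussion):

$$U(t_0) \;=\; \sum_{t \ge t_0} \gamma^{\,t - t_0}\, u(x_t), \qquad 0 < \gamma < 1,$$

so the weight given to the same outcome $x_t$ depends on the time $t_0$ of evaluation, which is the sense in which discounting can be read as a time-dependent value.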