I’m confused by what you’re saying.
The argument for the fragility of value never relied on AI being unable to understand human values. Are you claiming it does?
If not, what are you claiming?
From Superintelligence:
...But just how could we get some value into an artificial agent, so as to make it pursue that value as its final goal? While the agent is unintelligent, it might lack the capability to understand or even represent any humanly meaningful value. Yet if we delay the procedure until the agent is superintelligent, it may be able to resist our attempt to meddle with its motivation system—and, as we showed in Chapter 7, it would have convergent instrumental reasons to do so. This value-loading problem is tough, but must be confronted.
(emphasis mine)
My claim is that we are likely to see a future GPT-N system which (a) possesses the capability to understand / represent humanly meaningful value and (b) does not “resist attempts to meddle with its motivational system”. The issue is that the word “intelligence” brings in anthropomorphic connotations of having a goal etc., but AI programmers have lower-level building blocks at their disposal than the large anthropomorphic Duplos humans sometimes imagine.
My claim is that we are likely to see a future GPT-N system which [...] does not “resist attempts to meddle with its motivational system”.
Well, yes. This is primarily because GPT-like systems don’t have a “motivational system” with which to meddle. This is not a new argument by any means: the concept of AI systems that aren’t architecturally goal-oriented by default is known as “Tool AI”, and there’s plenty of pre-existing discussion on this topic. I’m not sure what you think GPT-3 adds to the discussion that hasn’t already been mentioned?
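To make the architectural point concrete, here is a toy sketch of the entire inference-time loop of a GPT-like system (my own illustration; the bigram table and function names are invented, and a real model would replace the lookup with a large neural network, but the shape of the loop is the same). Nothing in the loop tracks a goal, a reward, or a world state; it just repeatedly samples a likely next token:

```python
import random

# Toy stand-in for a learned next-token distribution.
BIGRAM = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def sample_next(token):
    """Sample a next token from the model's conditional distribution."""
    dist = BIGRAM.get(token, {"<eos>": 1.0})
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights)[0]

def generate(prompt, max_tokens=8):
    """The whole 'runtime': predict, append, repeat. No objective is consulted."""
    out = list(prompt)
    for _ in range(max_tokens):
        token = sample_next(out[-1])
        if token == "<eos>":
            break
        out.append(token)
    return " ".join(out)

print(generate(["the"]))
```

Any goal-directedness has to be supplied from outside this loop, for instance by whoever writes the prompt or wires the output into a larger system, which is where the Tool AI vs. Agent AI discussion picks up.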
Sorry, just to make sure I’m not wasting my time here (feeling grumpy)… You said earlier that “The argument for the fragility of value never relied on AI being unable to understand human values.” I gave you a quote from Superintelligence which talked about AI being unable to understand human values. Are you gonna, like, concede the point or something? Because if you’re just throwing out arguments for the AI doom bottom line without worrying too much about whether they’re correct, I’d rather you throw them at someone else!
Anyway, I read Gwern’s article a while ago and I thought it was pretty bad. If I recall correctly, Gwern confuses several different notions; for example, he seemed to think that if you replace enough bits of handcrafted software with bits trained using machine learning, an agent will spontaneously emerge. The steelman seems to be something like “there will be competitive pressures to misuse a tool in agentlike ways”. I agree this is a risk, and I hope OpenAI keeps future versions of GPT to themselves.
I’m not sure what you think GPT-3 adds to the discussion that hasn’t already been mentioned?
It’s looking more plausible that very capable Tool AIs:
1. Are possible
2. Are easier to build than Agent AIs
3. Will be able to solve the value-loading problem
(IIRC, Gwern’s article doesn’t address any of these 3 points?)
On “conceding the point”:
You said earlier that “The argument for the fragility of value never relied on AI being unable to understand human values.” I gave you a quote from Superintelligence which talked about AI being unable to understand human values. Are you gonna, like, concede the point or something?
The thesis that values are fragile doesn’t have anything to do with how easy it is to create a system that models them implicitly, but with how easy it is to get an arbitrarily intelligent agent to behave in a way that preserves those values. The difference between those two things is analogous to the difference between a prediction task and a reinforcement learning task, and your argument (as far as I can tell) addresses the former, not the latter. Insofar as my reading of your argument is correct, there is no point to concede.
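To spell out the analogy: a prediction task asks you to fit a fixed data distribution, roughly $\min_\theta \, \mathbb{E}_{(x,y)\sim D}\big[-\log p_\theta(y \mid x)\big]$, where $D$ does not change no matter what the model outputs. A reinforcement learning task asks you to choose behavior that scores well under some reward, roughly $\max_\pi \, \mathbb{E}_{\tau\sim\pi}\big[\sum_t R(s_t, a_t)\big]$, where the distribution over trajectories $\tau$ is induced by the policy itself and everything hinges on $R$ actually capturing what we care about. GPT-3 is evidence about the first kind of problem; the fragility-of-value worry lives in the second.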
On gwern’s article:
Anyway, I read Gwern’s article a while ago and I thought it was pretty bad. If I recall correctly, Gwern confuses several different notions; for example, he seemed to think that if you replace enough bits of handcrafted software with bits trained using machine learning, an agent will spontaneously emerge.
I’m not sure how to respond to this, except to state that neither this specific claim nor anything particularly close to it appears in the article I linked.
On Tool AI:
Are possible
As far as I’m aware, this point has never been the subject of much dispute.
Are easier to build than Agent AIs
This is still arguable; I have my doubts, but in a “big picture” sense this is largely irrelevant to the greater point, which is:
Will be able to solve the value-loading problem
This is (and remains) the crux. I still don’t see how GPT-3 supports this claim! Just as a check that we’re on the same page: when you say “value-loading problem”, are you referring to something more specific than the general issue of getting an AI to learn and behave according to our values?
***
META: I can understand that you’re frustrated about this topic, especially if it seems to you that the “MIRI-sphere” (as you called it in a different comment) is persistently refusing to acknowledge something that appears obvious to you.
Obviously, I don’t agree with that characterization, but in general I don’t want to engage in a discussion that one side is finding increasingly unpleasant, especially since that often causes the discussion to rapidly deteriorate in quality after a few replies.
As such, I want to explicitly and openly relieve you of any social obligation you may have felt to reply to this comment. If you feel that your time would be better spent elsewhere, please do!
The thesis that values are fragile doesn’t have anything to do with how easy it is to create a system that models them implicitly, but with how easy it is to get an arbitrarily intelligent agent to behave in a way that preserves those values. The difference between those two things is analogous to the difference between a prediction task and a reinforcement learning task, and your argument (as far as I can tell) addresses the former, not the latter. Insofar as my reading of your argument is correct, there is no point to concede.
If you can solve the prediction task, you can probably use the solution to create a reward function for your reinforcement learner.
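To make that concrete, here is a minimal, hypothetical sketch (the names `predict_human_rating`, `run_episode`, and `make_policy`, the toy environment, and the crude policy search are all invented for illustration; nothing here describes an existing system): a learned predictor of human approval is wrapped as the reward signal that a policy is then optimized against.

```python
def predict_human_rating(state):
    """Stand-in for a solved prediction task: 'how would a human rate this state?'
    Imagine a GPT-N-style model here; this toy version just prefers states near 10."""
    return -abs(state - 10)

def run_episode(policy, steps=20):
    """Trivial environment: the state moves by the chosen action each step,
    and the *predicted* human rating is used as the per-step reward."""
    state, total_reward = 0, 0.0
    for _ in range(steps):
        state += policy(state)                  # action in {-1, 0, +1}
        total_reward += predict_human_rating(state)
    return total_reward

def make_policy(threshold):
    """A tiny policy family: move right until `threshold` is reached."""
    return lambda state: 1 if state < threshold else 0

# Crude stand-in for policy optimization: pick the policy in the family that
# scores best under the predicted-human-rating reward.
best_threshold = max(range(-5, 25), key=lambda t: run_episode(make_policy(t)))
print(best_threshold, run_episode(make_policy(best_threshold)))
```

The sketch only shows the wiring; whether an optimizer pointed at that proxy reward actually behaves in a way that preserves the values the predictor was modeling is exactly the crux discussed above.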