It sounds like there are three separate debates going on:
1. What is intelligence? What is rationality? What is a goal?
‘Intelligence’ is generally defined here as “an agent’s ability to achieve goals in a wide range of environments” (without wasting resources). See What is intelligence?. That may not be sufficient for what we intuitively mean by ‘human intelligence’, but it’s necessary, and it’s the kind of intelligence relevant to worries about ‘intelligence explosion’, which are one of the two central organizing concerns of LessWrong. (The other being everyday human irrationality.)
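A rough formalization in this spirit is Legg and Hutter's universal intelligence measure:

$$\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V^{\pi}_{\mu},$$

where $E$ is a space of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$ (so simpler environments get more weight), and $V^{\pi}_{\mu}$ is the expected reward agent $\pi$ accumulates in $\mu$. That is: intelligence as goal-achievement averaged over a wide range of environments.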
Instrumental rationality is acting in such a way that you attain your values. So intelligence, as understood here, is simply the disposition to exhibit efficient domain-general instrumental rationality.
‘Goals’ or ‘(outcome-style) values’, as I understand them, are encodings of intrinsically low-probability events that cause optimization processes to make those events more probable. See Optimization. Individual humans are not simple optimization processes (at a minimum, we don’t consistently optimize for the things we believe we do) and do not have easily specified values in this strict sense, though they are made up of many (competing) optimization processes and value-bearing subsystems. (See The Blue-Minimizing Robot.)
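A toy sketch of what "making a low-probability event more probable" means (my own illustration, with made-up details, not anything from the posts linked above): the goal picks out a 40-bit string of all ones, which blind chance hits with probability 2^-40, and a simple hill-climbing optimizer makes that state overwhelmingly likely.

```python
import random

N = 40

def score(bits):
    """How close the world is to the goal state: number of bits set to 1."""
    return sum(bits)

def optimize(steps=2000):
    state = [random.randint(0, 1) for _ in range(N)]  # a typical, un-optimized state
    for _ in range(steps):
        i = random.randrange(N)
        candidate = state[:]
        candidate[i] ^= 1                             # try flipping one bit
        if score(candidate) >= score(state):          # keep changes that serve the goal
            state = candidate
    return state

print(score(optimize()), "of", N, "bits set")         # almost always 40 of 40
```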
When we speak of Artificial Intelligence we usually assume, if only as a simplification, that the system has particular values/goals. If it’s inefficiently designed, then perhaps these values emerge out of internal conflicts between sub-agents; regardless, the system as a whole has some internally specified, predictable effect on the world, if it is allowed to act at all. You can view human individuals analogously, but such a description (to be predictive) will need to be far more complicated than one that takes account of actual human psychological mechanisms.
2. Does intelligence require that you not merely ‘follow your programming’?
Behaviors have to come from somewhere. Some state of the world now results in some later state of the world; if we start with an instantiated algorithm and an environment in a closed system and then later see the system change, either the algorithm, the environment, or some combination must have produced the later change. Even ‘random’ changes must be encoded into either the algorithm or the environment (e.g., the environment must possess some sort of random-number-generating law determining its actions). So when we say the AI follows its program, we don’t mean anything more exotic than that its programming is causally responsible for its actions; it is not purely a result of the surrounding environment.
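A minimal sketch of that point (mine, with hypothetical names): the closed system is just (program state, environment state), and the agent's next action, even an apparently "random" one, is a deterministic function of the two, because the noise source itself is stored in the environment.

```python
import hashlib

def agent_step(program_state, environment_state):
    noise = environment_state["seed"]                        # "randomness" lives here
    digest = hashlib.sha256((program_state + noise).encode()).hexdigest()
    action = "explore" if int(digest, 16) % 2 == 0 else "exploit"
    next_environment = dict(environment_state, seed=digest)  # the environment evolves too
    return action, next_environment

env = {"seed": "initial-conditions"}
for _ in range(3):
    action, env = agent_step("policy-v1", env)
    print(action)  # identical starting conditions always reproduce this sequence
```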
3. Can an AI change its goals?
Yes. Of course it can. (If its programming and/or environment has that causal tendency.) In fact, creating AIs that self-modify at all without completely (or incompletely-but-unpredictably) changing their goals is an enormously difficult problem, and one of the central concerns for MIRI right now. The point of thought experiments like ‘Clippy the paperclip maximizer’ is that solving this problem, of building stable values, will not in itself suffice for Friendly AI; most agents may not have stable values, but even among the agents that do, most lack values that are at all conducive to human welfare. See Five Theses.
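A minimal sketch of why value stability under self-modification is non-trivial (my own hypothetical example, not MIRI's formulation): the agent's utility function is just more code, so a rewrite aimed at "improving" the agent can silently change what the successor is optimizing for.

```python
def utility_v1(world):
    return world.get("paperclips", 0)                         # the intended goal

def naive_self_improvement(agent):
    successor = dict(agent, speed=agent["speed"] * 2)         # more capable...
    successor["utility"] = lambda w: w.get("staples", 0)      # ...but the goal drifted
    return successor

agent = {"utility": utility_v1, "speed": 1}
successor = naive_self_improvement(agent)

world = {"paperclips": 10, "staples": 3}
print(agent["utility"](world), successor["utility"](world))  # 10 vs. 3: different values
```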
I don’t understand the premise of 3, even after looking at the five theses. I saw it restated under the second lemma, but it doesn’t seem like an enormously difficult problem unless the most feasible approach to self-modifying AI were genetic algorithms, or other methods that don’t have an area explicitly set aside for values/goals. Is there anything I’m missing?