Clearly, there are some internal values that an AI would need to be able to modify, or else it couldn’t learn. But I think there is good reason to disallow an AI from modifying its own rules for reward, at least to start out. An analogy in humans is that we can do some amazingly wonderful things, but some people go awry when they begin abusing drugs, thereby modifying their own reward circuitry. Severe addicts find they can’t manage a productive life, instead turning to crime to get just enough cash to feed their habits. I’d say there is inherent danger for human intelligences in short-circuiting or otherwise directly (i.e., chemically) modifying our reward pathways, and so there would likely be similar danger in allowing an AI to directly modify its own reward pathways.
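To make the distinction concrete, here is a minimal sketch (in Python, with hypothetical names like `ToyAgent` and `reward_fn` that are mine, not from any existing system) of the separation I have in mind: the agent’s learnable weights are freely updated by experience, while the rule that computes reward is fixed at construction and exposes no way to reassign it. This is only an illustration of the design principle, not a real safety mechanism.

```python
# Illustrative sketch: learnable internal values are mutable,
# but the reward rule is fixed at construction with no setter.
# All names here are hypothetical, for illustration only.

from typing import Callable, List
import random


class ToyAgent:
    """Toy agent that may update its policy weights but not its reward rule."""

    def __init__(self, reward_fn: Callable[[float], float], n_weights: int = 3):
        # The reward rule is stored privately; no method reassigns it.
        # (Name mangling is a structural convention here, not a hard guarantee.)
        self.__reward_fn = reward_fn
        self.weights: List[float] = [random.uniform(-1, 1) for _ in range(n_weights)]

    def reward(self, outcome: float) -> float:
        # The agent can read reward signals but cannot rewrite how they are computed.
        return self.__reward_fn(outcome)

    def act(self, observation: List[float]) -> float:
        # Simple linear policy: weighted sum of the observation.
        return sum(w * x for w, x in zip(self.weights, observation))

    def learn(self, observation: List[float], outcome: float, lr: float = 0.01) -> None:
        # Learning modifies internal values (the weights), steered by the fixed reward rule.
        r = self.reward(outcome)
        self.weights = [w + lr * r * x for w, x in zip(self.weights, observation)]


if __name__ == "__main__":
    agent = ToyAgent(reward_fn=lambda outcome: 1.0 if outcome > 0 else -1.0)
    agent.learn(observation=[0.5, -0.2, 0.1], outcome=0.7)
    print(agent.weights)  # The weights change with experience; the reward rule does not.
```

The point of the sketch is simply the asymmetry: everything the agent needs for learning lives in `weights` and can change, while the code path that defines what counts as reward is handed in once and then only ever read.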