Thank you for this overview. A couple of thoughts:
There is a recent and interesting result by Miller et al. (2015, MIT) supporting the hypothesis that the cortex doesn’t process tasks in highly specialized modules, which is perhaps some evidence for a ULM in the human brain.
The importance of redundancy in biological systems might be another piece of evidence for ULMs.
You write that “Infant emotions appear to simplify down to a single axis of happy/sad”, which I think is not true. Surprise, fear, and embarrassment, for example, are also very early emotions (I can’t find a citation for this, sorry).
Minor nitpick: I think it is clumsy to say “a ULM is more powerful than a TM because a ULM can automatically program itself”, since a TM can likely emulate a ULM; the TM might just be a bad model for it (bad in the sense of representational efficiency).
Designing a body for a superintelligence will possibly still be a difficult task. What makes humans friendly is, I think, largely a result of (1) a dependency on others, due to the need to maintain a body and to interact with other people (conversation, physical contact), and (2) empathy. That is, being emotionally disconnected from other humans is one way to turn against them. If these emotional responses are not deeply built into a body for a ULM, the AI will probably turn out to be indifferent towards humans, leading to a large set of other problems.
You write that Yudkowsky’s box problem is a strawman and a distraction. How do you arrive at this conclusion exactly?
There is a recent and interesting result by Miller et al. (2015, MIT) supporting the hypothesis that the cortex doesn’t process tasks in highly specialized modules
The actual paper is “Cortical information flow during flexible sensorimotor decisions”; it can be found here. I don’t believe the reporter’s summary is very accurate. They traced the flow of information in a moving dot task in a couple dozen cortical regions. It’s interesting, but I don’t think it especially differentiates the ULH.
3. Good point. I’ll need to correct that. I’m skeptical of embarrassment, but surprise and fear certainly.
4. Yes, that’s correct. It would perhaps be more accurate to say more useful or more valuable. I meant more powerful in a general political/economic utility sense.
5. I agree that the human brain, in particular the reward system, has dependencies on the body that are probably complex. However, reverse engineering empathy probably does not require exactly copying biological mechanisms.
6. I should probably just cut that sentence, because it is a distraction itself.
But for context on the previous old boxing discussions, see this post in particular. There Yudkowsky presents a virtual sandbox in the context of a sci-fi story. To break out, the AI has to be given essentially infinite computation, and even then the humans also have to be incredibly dumb: they intentionally send an easter egg message, they apparently aren’t even monitoring their creation, and so on. It’s a strawman. Later it is used as evidence to suggest that EY has somehow proven that computer security sandboxes can’t work.
However, reverse engineering empathy probably does not require exactly copying biological mechanisms.
This is just a pet theory (and being new to cognitive science this might well be wrong): physical pain is some sort of hardwired thought disturbance, and the brain appears to have some sort of clarity attractor (which also explains intrinsic motivation and the reward we receive from Eureka moments and fun, cf. Schmidhuber). The brain appears to borrow the mechanism of physical pain for action selection on a high level if something severely limits anticipated prospects (that’s why rejection, getting something wrong, and losing something hurt). Empathy is the ability to have pain triggered by mirror neurons, i.e., an activation pattern generated in an auto-associative NN due to the overlap between the activation patterns of firsthand and non-firsthand experiences. That means the body of an AI needs to be sufficiently similar to a human body for this auto-association to work. One way to achieve that would perhaps be to actually replace the brain of a deceased volunteer with an artificial one. The fact that we have empathy for animals might be a hint that it doesn’t need to be that similar, but on the other hand we are much more comfortable with killing a bug than with killing a mammal.
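To make the pattern-overlap idea a bit more concrete, here is a toy sketch, not a claim about real neural circuitry: a Hopfield-style auto-associative memory stores a firsthand experience in which sensory features are coupled with “pain” units, and a non-firsthand cue that shares only the sensory features then completes to a pattern in which the pain units are active too. The 12-unit layout, the +1/−1 coding, and the update rule are all illustrative assumptions.

```python
import numpy as np

# Toy Hopfield-style auto-associative memory illustrating the pattern-overlap
# idea above. Everything here (12 units, +1/-1 coding, Hebbian outer-product
# storage) is an illustrative assumption, not a model of real neural circuitry.

rng = np.random.default_rng(0)
n_sensory, n_pain = 8, 4

# Firsthand experience: sensory features coupled with active "pain" units.
sensory = rng.choice([-1.0, 1.0], size=n_sensory)
firsthand = np.concatenate([sensory, np.ones(n_pain)])

# Hebbian storage (outer-product rule, self-connections removed).
W = np.outer(firsthand, firsthand)
np.fill_diagonal(W, 0)

# Non-firsthand cue: the same sensory features are observed, but the pain
# units start out inactive (we see the event, we don't feel it directly).
cue = np.concatenate([sensory, -np.ones(n_pain)])

# A few synchronous recall steps: the cue is completed to the stored pattern.
state = cue.copy()
for _ in range(5):
    state = np.sign(W @ state)

print("pain units after recall:", state[n_sensory:])  # -> all +1: "mirrored" pain
```

Of course, whether anything like this captures empathy is exactly the open question; the sketch only shows that pattern completion over shared features is enough to reactivate the “pain” part of a stored experience.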
You write that Yudkowsky’s box problem is a strawman and a distraction. How do you arrive at this conclusion exactly?
Since I don’t think we can make a very realistic sandbox (at least not in the near future), perhaps the idea is to have an AI design that is known to work similarly with and without interaction with the world (i.e., learning from training data sampled from an environment versus from the environment itself). Then, putatively, we could test the AI in the non-interactive case before getting anywhere near an AI-box scenario.
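A minimal sketch of what “works similarly with and without interaction” would have to rule out, under toy assumptions (a made-up scalar environment and a placeholder threshold policy): even if the agent is literally the same function in both regimes, the distribution of observations it encounters can drift once its actions feed back into the world, and that drift is what a non-interactive test would need to bound.

```python
import numpy as np

# Toy illustration of "training data sampled from an environment" versus
# "the environment itself". The scalar random-walk world and the threshold
# policy are made up for this sketch; the point is only that the observation
# distribution can shift once the agent's actions feed back into the state.

rng = np.random.default_rng(1)

def policy(obs):
    # Placeholder agent: a fixed, deterministic function of the observation.
    return 1 if obs > 0 else 0

# (1) Non-interactive case: observations are logged samples from the
# environment as it behaves without the agent in the loop.
logged_obs = rng.normal(size=2000)

# (2) Interactive case: the same policy now drives the state it observes.
state, online_obs = 0.0, []
for _ in range(2000):
    obs = state + rng.normal()
    online_obs.append(obs)
    state += 0.1 if policy(obs) == 1 else -0.1  # feedback loop

print("mean logged observation:      %+.2f" % logged_obs.mean())
print("mean observation in the loop: %+.2f" % np.mean(online_obs))
# The second mean drifts away from zero: the agent shapes its own input
# distribution, so behaving well on the logs need not transfer.
```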