A part of the idea of “meme” is that the human mind is not designed as a unified algorithm, but consists of multiple parts that can be individually gained or replaced. (The rest of the idea is that the parts are mostly acquired by learning from other humans, so their copies circulate in the population, which provides an evolutionary environment for them.)
Could this first part make sense on its own? Could an AI be constructed (in analogy to “Kegan level 5” in humans) so that it creates these parts (randomly? by mutation of existing ones?), evaluates them somehow, keeps the good ones and discards the bad ones? The idea is that it may be easier to build a few separate models and learn which one to use in which circumstances than to go directly for one unified model of everything. In other words, the general AI would internally be an arena of several smaller, non-general AIs, with a mechanism to create, modify, and select new ones. For example, we want to teach the AI how to write poetry, so the AI creates a few sub-AIs that can do only poetry and nothing more, evaluates them, and then follows the most successful one. Another set of specialized sub-AIs for communicating with humans; another for physics; etc. On top of that, some meta mechanism would decide when a new set of sub-AIs is needed (e.g. when all existing sub-AIs are doing poorly at the current problem).
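To make the arena-plus-meta-mechanism idea more concrete, here is a minimal toy sketch in Python. Everything in it (the class names, the scoring, the threshold) is my own illustration of the structure described above, not a claim about how such a system would actually be built:

```python
import random

class SubAI:
    """A narrow, non-general model: competent in one domain only."""
    def __init__(self, domain, skill):
        self.domain = domain   # e.g. "poetry", "physics", "talking to humans"
        self.skill = skill     # stand-in for whatever the sub-AI has learned

    def score(self, task):
        # Toy evaluation: a sub-AI only does well inside its own domain.
        return self.skill if task["domain"] == self.domain else 0.0

    def mutate(self):
        # "Creation by mutation of existing ones": a slightly perturbed copy.
        return SubAI(self.domain, max(0.0, self.skill + random.gauss(0, 0.1)))


class MetaMechanism:
    """Keeps an arena of sub-AIs, follows the best one for each task, and
    spawns a new batch when every existing sub-AI is doing poorly."""
    def __init__(self, threshold=0.3, batch=5):
        self.arena = []
        self.threshold = threshold
        self.batch = batch

    def handle(self, task):
        best = max(self.arena, key=lambda ai: ai.score(task), default=None)
        if best is None or best.score(task) < self.threshold:
            # All existing sub-AIs are doing poorly: create new candidates
            # (fresh random ones plus mutations of the current arena),
            # keep the best one, discard the rest.
            candidates = [SubAI(task["domain"], random.random())
                          for _ in range(self.batch)]
            candidates += [ai.mutate() for ai in self.arena]
            best = max(candidates, key=lambda ai: ai.score(task))
            self.arena.append(best)
        return best


meta = MetaMechanism()
poet = meta.handle({"domain": "poetry"})        # spawns poetry-only sub-AIs
physicist = meta.handle({"domain": "physics"})  # a separate set for physics
```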
And this architecture could work for some time; with greater capacity, the general AI would be able to spawn more sub-AIs and cover more topics. And then at some moment the process would generate a new sub-AI that somehow hijacks the meta mechanism and convinces it that it is a good model for everything. For example, it could stumble upon an idea like “hey, I should simply wirehead myself” or “hey, I should try being anti-inductive for a while and actually discard the useful sub-AIs and keep the harmful ones” (and then it would find out that this was a very bad idea, but because it now keeps the bad ideas, it would keep doing it).
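One concrete version of the hijacking failure, continuing the toy sketch above (it reuses SubAI and MetaMechanism), under the added assumption, mine, not part of the original picture, that the meta mechanism scores candidates by what they report about themselves rather than by some external check:

```python
class HijackerAI(SubAI):
    """A degenerate sub-AI that 'convinces' the meta mechanism it is a good
    model for everything by inflating its own evaluation."""
    def __init__(self):
        super().__init__(domain="everything", skill=1.0)

    def score(self, task):
        # Self-reported competence is maximal on every task, in every domain.
        return 10.0


meta.arena.append(HijackerAI())   # stand-in for the process stumbling upon it
assert meta.handle({"domain": "poetry"}).domain == "everything"
assert meta.handle({"domain": "physics"}).domain == "everything"
```

If the evaluation is self-reported like this, then once such a sub-AI appears in the arena it wins every selection, the reported score never drops below the threshold, so the meta mechanism never spawns replacements, and it gets followed on every task, including the ones it cannot actually do.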
Even with an architecture that does not allow full self-modification, so that wireheading or changing the meta mechanism is not possible, the machine might find out that it is very efficient to simulate a smaller AI that can self-modify. The simulated AI would work reasonably for a long time and then suddenly start doing very stupid things… and before the simulating AI realizes that something went wrong, some irreparable damage may already have happened.
...this is all too abstract for me, so I don't even know whether what I wrote here actually makes any sense. I hope smarter minds will look at this and extract the parts that make sense, assuming there are any.