The meme-theoretic view of humans says: Memes are to humans as sailors are to ships in the age of sail.
If you want to predict where a ship will go, ask: Is it currently crewed by the French or the English? Is it crewed by merchants, pirates, or soldiers? These are the most important questions.
You can also ask e.g. “Does it have a large cargo hold? Is it swift? Does it have many cannon-ports?” But these questions are less predictive of where it will go next. They are useful for explaining how it got the crew it has, but only to a point—while it’s true that a ship built with a large cargo hold is more likely to be a merchant for more of its life, it’s quite common to encounter a ship with a large cargo hold that is crewed by soldiers, or for a ship built in France to be sailed by the English, etc. The main determinants of how a ship got the crew it currently has are its previous interactions with other crews, e.g. the fights it had, the money that changed hands when it was in port, etc.
The meme-theoretic view says: Similarly, the best way to explain human behavior is by reference to the memes in their head, and the best way to explain how those memes got there is to talk about the history of how those memes evolved inside the head in response to other memes they encountered outside the head. Non-memetic properties of the human (their genes, their nutrition, their age, etc.) matter, but not as much, just like how the internal layout of a ship, its size, its age, etc. matter too, but not as much as the sailors inside it.
Anyhow, the meme-theoretic view is an interesting contrast to the highly-capable-agent view. If we apply the meme-theoretic view to AI, we get the following vague implications:
--Mesa-alignment problems are severe. The paper already talks about how there are different ways a system could be pseudo-aligned, e.g. it could have a stable objective that is a proxy of the real objective, or it could have a completely different objective but be instrumentally motivated to pretend, or it could have a completely different objective but have some irrational tic or false belief that makes it behave the way we want for now. Well, on a meme-theoretic view these sorts of issues are the default; they are the most important things for us to be thinking about.
--There may be no stable objective/goal at all in the system. It may have an objective/goal now, but if the objective is a function of the memes it currently has and the memes can change in hard-to-predict ways based on which other memes it encounters... (a toy sketch of this dynamic appears after this list)
--Training/evolving an AI to behave a certain way will be very different at each stage of smartness. When it is too dumb to host anything worthy of the name meme, it’ll be one thing. When it is smart enough to host simple memes, it’ll be another thing. When it is smart enough to host complex memes, it’ll be another thing entirely. Progress and success made at one level might not carry over to higher levels.
--There is a massive training vs. deployment problem. The memes our AI encounters in deployment will probably be massively different from those in training, so how do we ensure that it reacts to them appropriately? We have no idea what memes it will encounter when deployed, because we want it to go out into the world and do all sorts of learning and doing on our behalf.
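To make the "no stable objective" worry concrete, here is a toy sketch (entirely my own construction; the meme names, goal weights, and adoption rule are illustrative assumptions, not anything from the paper): an agent whose objective is recomputed from its current meme set, so the goal it pursues can flip as it adopts new memes.

```python
import random

random.seed(0)  # make the illustrative run reproducible

# Each meme contributes some weight toward each of two candidate goals.
# (Hypothetical memes and weights, chosen purely for illustration.)
MEME_POOL = [
    ("maximize-profit", {"profit": 1.0, "safety": 0.0}),
    ("be-cautious",     {"profit": 0.0, "safety": 1.0}),
    ("move-fast",       {"profit": 0.5, "safety": -0.5}),
    ("protect-users",   {"profit": -0.2, "safety": 0.8}),
]

def current_objective(memes):
    """The agent's objective is a function of its current meme set."""
    totals = {"profit": 0.0, "safety": 0.0}
    for _, weights in memes:
        for goal, w in weights.items():
            totals[goal] += w
    return max(totals, key=totals.get)  # pursue whichever goal dominates

memes = [MEME_POOL[0]]  # the agent starts out hosting a single meme
for step in range(5):
    # Each step the agent encounters a random meme and sometimes adopts it.
    if random.random() < 0.7:
        memes.append(random.choice(MEME_POOL))
    print(f"step {step}: objective = {current_objective(memes)}")

# The printed objective can flip between 'profit' and 'safety' as memes
# accumulate: there is no fixed goal to verify alignment against.
```

The point is not the specific dynamics but the type signature: if the objective is downstream of a changing meme population, any alignment guarantee stated in terms of "the system's objective" is a guarantee about a moving target.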
Thanks to Abram Demski for reading a draft and providing some better terminology.
If there are multiple AIs exchanging memes with each other and with humans, there will likely be AI-AI stag hunts and AI-human stag hunts emerging in largely unpredictable ways, due to the rapid pace of memetic evolution.
Benefits: less chance of a rogue paperclip maximizer forming
Drawbacks: greater chance of humans falling for AI-generated memes
Also, the rate of memetic evolution may be even faster for AIs than for humans, due to their differing architectures.
I don’t understand; can you elaborate / unpack that?
A stag hunt (https://www.lesswrong.com/tag/stag-hunt) is a game-theory term for a pattern of coordination that commonly emerges in multi-party interactions.
AIs have coordination problems with other AIs and with humans; AGIs exponentially more so, as is well discussed on LW.
In attempting to compete and to solve such coordination problems, memes will almost certainly be used, in both AI-AI and AI-human interaction. These dynamics will induce memetic evolution.
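For concreteness, here is a minimal sketch of the standard stag-hunt payoff structure, checked by brute force for pure-strategy Nash equilibria. The payoff numbers are illustrative assumptions of mine, not anything from the linked tag:

```python
# Toy stag-hunt game (payoff numbers are illustrative, not canonical).
# Actions: 0 = hunt stag, 1 = hunt hare. Each entry maps a joint action
# to (row player's payoff, column player's payoff).
payoffs = {
    (0, 0): (4, 4),  # both hunt stag: best joint outcome, needs coordination
    (0, 1): (0, 3),  # row hunts stag alone and gets nothing
    (1, 0): (3, 0),  # column hunts stag alone and gets nothing
    (1, 1): (3, 3),  # both hunt hare: safe, but worse than coordinating
}

ACTIONS = ("stag", "hare")

# Brute-force check: a joint action is a pure-strategy Nash equilibrium
# if neither player can do better by unilaterally switching actions.
for (a, b), (pa, pb) in payoffs.items():
    row_best = all(pa >= payoffs[(a2, b)][0] for a2 in (0, 1))
    col_best = all(pb >= payoffs[(a, b2)][1] for b2 in (0, 1))
    if row_best and col_best:
        print(f"equilibrium: {ACTIONS[a]}/{ACTIONS[b]} with payoffs {(pa, pb)}")
```

Both stag/stag and hare/hare come out as equilibria: the coordination problem is that the safe equilibrium is worse for everyone than the cooperative one, and which equilibrium a population of meme-exchanging agents settles on can shift unpredictably.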
[In my opinion]
Memes are self-replicating concepts (given enough humans to spread them). Highly capable minds are different: they contain predictive models of the world, of themselves, and of others, which lets them manipulate both objects in the world and other people to fulfill their needs. Since memes lack these capacities, they should not be counted as the cause of human behavior, even though they are related to it. Even if the best way to explain human behavior is through memes, memes don’t necessarily account for most of the decision-making process.
[/In my opinion]