I think (and I hope) that something like “maximize positive experiences of sentient entities” could actually be a convergent goal of any AI that is capable of reflecting on these questions. I don’t think that humans gravitate towards this kind of utility maximization just because they evolved some degree of pro-sociality. Instead, something like this seems to be the only thing inherently worth striving for, in the absence of any other set of values or goals.
The grabby-aliens-type scenario in the first parable seems like the biggest threat to the idea that an AI might discover on its own that it wants to maximize fun. Or at least such an AI might decide that it shouldn’t maximize fun while ignoring self-defense.
Regarding the second parable, I don’t find it any more plausible that an AI would want to maximize complex puppet shows in a failed attempt to recreate human-like values than that it would want to maximize paperclips. Both scenarios assume that the AI is ultimately not able or willing to reflect on its own goals and to change them to something that’s more meaningful in some absolute sense.
I think there is at least a good chance that an AI will be able to self-reflect and to think about the same questions that we are pondering here, and then make good decisions based on that, even if it doesn’t particularly care about ice cream, or even about humans.
Ocracoke
Utopia for artificial minds
It’s not clear to me that it’s necessarily possible to get to a point where a model can achieve rapid self-improvement without expensive training or experimentation. Evolution hasn’t figured out a way to substantially reduce the time and resources required for any one human’s cognitive development.
I agree that even in the current paradigm there are many paths towards sudden capability gains, like the suboptimal infrastructure scenario you pointed to. I just don’t know if I would consider that FOOM, which in my understanding implies rapid recursive self-improvement.
Maybe this is just a technicality. I expect things to advance pretty rapidly from now on with no end in sight. But before we had these huge models, FOOM with very fast recursive self-improvement seemed almost inevitable to me. Now I think that it’s possible that model size and training compute put at least some cap on the rate of self-improvement (maybe weeks instead of minutes).
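To make the “weeks instead of minutes” intuition concrete, here is a rough back-of-envelope sketch using the common ~6·N·D FLOPs rule of thumb for training a dense model with N parameters on D tokens. Every specific number in it (model size, token count, cluster throughput, utilization) is a hypothetical assumption for illustration, not a real measurement.

```python
# Rough sketch of why training compute might cap the pace of recursive
# self-improvement. Uses the standard ~6 * N * D FLOPs estimate for training
# a dense model with N parameters on D tokens. All numbers are hypothetical.

params = 4e11                 # assumed model size: 400B parameters
tokens = 8e12                 # assumed training data: 8T tokens (~20 tokens/param)
train_flops = 6 * params * tokens          # ~1.9e25 FLOPs for one training run

cluster_peak_flops = 1e19     # assumed cluster throughput: 10 exaFLOP/s peak
utilization = 0.4             # assumed sustained hardware utilization

seconds = train_flops / (cluster_peak_flops * utilization)
print(f"one full retraining cycle: ~{seconds / 86400:.0f} days")
# -> roughly eight weeks under these assumptions, i.e. any self-improvement
#    step that requires retraining from scratch is measured in weeks, not minutes.
```

Better algorithms or partial fine-tuning could shrink this, of course, but as long as each iteration needs something like a full training run, the cycle time stays far from “minutes”.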
Foom seems unlikely in the current LLM training paradigm
We might be able to use BCIs to enhance our intelligence, but it’s not entirely clear to me how that would work. What parts of the brain would they connect to?
What’s easier for me to imagine is how BCIs would allow an AGI to take control of human bodies (and bodies of other animals). Robotics isn’t nearly as close to outperforming human bodies as AI is to outperforming human minds, so controlling human bodies by replacing the brain with a BCI that connects to all the incoming and outgoing nerves might be a great way for an AGI to navigate the physical world.
I think it’s not clear at all that the average animal in the wild has a life of net negative utility, nor do I think it’s clear that the average present-day human has a life of net positive utility.
If you compare the two, wild animals probably die more gruesome deaths and starve more often, but most of the time they might be happier than the average human, since they live in the environment they evolved to live in.
especially for the vast majority of animals who give birth to thousands of young of which on average only 2 will ever reach adulthood
Most animals to which this applies probably don’t have the cognitive capacity to be upset by this. It just means that in those species, the vast majority of lives are short and end with being eaten by some other animal. From a human perspective this sounds terrible, but I don’t think it’s at all obvious that the net utility of these lives is negative (and I just mean the first-person experience, not ecosystem effects or anything like that).
Preparing for the apocalypse might help prevent it
I recently articulated similar ideas about motherly love. I don’t think it’s an example of evolution successfully aligning the mother’s goals with its own. In the example you give, where a child loses their gonads at age 2, continuing to devote resources to the child would be an alignment failure from evolution’s perspective, and yet that is what the mother would do, because with motherly love, evolution created an imperfect intermediate goal that is generally, but not always, the same as the goal of spreading your genes.
I totally agree that motherly love is not a triumph of evolution aligning humans with its goals. But I think it’s a good example of robust alignment between the mother’s actions and the child’s interests that generalizes well to OOD environments.