TL;DR: If you want to do some RL/evolutionary open-ended thing that finds novel strategies, it will get Goodharted horribly, and the novel strategies that succeed without gaming the goal may include things no human would want their caregiver AI to do.
Orthogonal to your “capability”, you need a “goal” for it.
Game-playing RL architectures like AlphaStar and OpenAI Five have dead-simple reward functions (win the game); all the complexity is in the reinforcement-learning tricks that allow efficient learning and credit assignment at higher layers.
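To make concrete how sparse that signal is, here's a minimal sketch in Python (the `GameState` interface is hypothetical, not AlphaStar's actual code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GameState:
    over: bool = False
    winner: Optional[str] = None   # "agent" or "opponent" once the game ends

def reward(state: GameState) -> float:
    """Sparse 'win the game' reward: zero until the episode terminates."""
    if not state.over:
        return 0.0                 # no shaping signal during play
    return 1.0 if state.winner == "agent" else -1.0

# The whole task spec fits in a few lines; all the difficulty lives in
# credit assignment across the thousands of steps that returned 0.0.
assert reward(GameState()) == 0.0
assert reward(GameState(over=True, winner="agent")) == 1.0
```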
So child-rearing motivation is plausibly rooted in a cuteness preference along with re-use of empathy. Empathy plausibly has a sliding scale of caring per person, which increases for friends (reciprocal-cooperation relationships) and relatives, children obviously included, and similarly decreases for enemy combatants in wars, up to the point where they no longer qualify for empathy at all.
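If it helps, here's that sliding scale as a toy model; the categories and weights are invented for illustration, not measured values:

```python
# Purely illustrative per-person empathy weights, not measured values.
EMPATHY_WEIGHT = {
    "own child":       2.0,   # care drive stacked on top of baseline empathy
    "friend":          1.3,   # boosted by reciprocal cooperation
    "stranger":        1.0,   # baseline
    "enemy combatant": 0.0,   # suppressed to the point of no empathy at all
}

def motivational_pull(person_kind: str, their_welfare_change: float) -> float:
    """How much the agent cares = per-person weight x change in their welfare."""
    return EMPATHY_WEIGHT.get(person_kind, 1.0) * their_welfare_change

print(motivational_pull("own child", -1.0))        # -2.0: strongly aversive
print(motivational_pull("enemy combatant", -1.0))  #  0.0: no pull at all
```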
I want agents that take effective actions to care about their “babies”, which might not even look like caring at first glance.
ASI will just flat-out break your testing environment. Novel strategies discovered by dumb agents doing lots of exploration will be enough. Alternatively, the test is “survive in competitive deathmatch mode”, in which case you’re aiming for brutally efficient self-replicators.
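A toy selection loop shows why: if fitness only counts replication, caregiving effort gets competed away, no ASI required. (The two-knob genome below is a deliberate oversimplification.)

```python
import random

random.seed(0)

# Toy genome: effort split between caregiving and replication (sums to 1).
population = [{"care": 0.9, "replicate": 0.1} for _ in range(100)]

def mutate(agent):
    care = min(max(agent["care"] + random.uniform(-0.05, 0.05), 0.0), 1.0)
    return {"care": care, "replicate": 1.0 - care}

for _ in range(200):
    # "Deathmatch" scoring: expected offspring is proportional to
    # replication effort only; caring earns nothing.
    weights = [a["replicate"] for a in population]
    parents = random.choices(population, weights=weights, k=len(population))
    population = [mutate(p) for p in parents]

mean_care = sum(a["care"] for a in population) / len(population)
print(f"mean caregiving effort after selection: {mean_care:.2f}")  # drifts toward 0
```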
The hope with a non-RL strategy, or one of the many sorts of RL strategies used for fine-tuning, is that you can find the generalised core of what you want within the already-trained model, and that the surrounding intelligence means the core generalises well. Q&A fine-tuning an LLM in English generalises to other languages.
Also, some systems are architected in such a way that the caring is part of a value estimator, and the search process can be made better up until it starts Goodharting the value estimator and/or world model.
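Here's a minimal numeric sketch of that failure mode, assuming the estimator’s error is fixed and bounded: the harder the search optimises against the estimate, the more it selects exactly the points where the estimate is most wrongly optimistic.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_value(x):
    return -x**2          # the utility we actually care about; best at x = 0

# Stand-in for a learned value estimator: the truth plus fixed error,
# "baked in" at training time and then exploited by search.
xs = np.linspace(-3, 3, 601)
estimated_value = true_value(xs) + rng.normal(0.0, 0.5, size=xs.shape)

for search_strength in (2, 20, 200):
    # Stronger search = more candidate actions scored per decision.
    candidates = rng.choice(len(xs), size=search_strength, replace=False)
    best = candidates[np.argmax(estimated_value[candidates])]
    print(f"search={search_strength:3d}  "
          f"estimated={estimated_value[best]:+.2f}  "
          f"true={true_value(xs[best]):+.2f}")
# As search_strength grows, the estimated value climbs while the true
# value stalls or falls: the extra search is optimising estimator error.
```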
Yes they can, until they actually make a baby. After that, it’s usually really hard to sell a loving mother a “deal” whose price is her child’s suffering, or to get her to abandon the child for a “cuter” toy, or to persuade her to hotwire herself to not care about her child (if she is smart enough to realize the consequences).
Yes, once the caregiver has imprinted, that’s sticky. Note that care-drive surrogates like pets can be just as sticky to their human caregivers: pet organ transplants are a thing, and people will spend nearly arbitrary amounts of money caring for their animals.
But our current pets aren’t superstimuli. Pets poop on the floor, scratch up the furniture, and don’t fulfill certain other human wants; you can’t teach a dog to fish the way you can a child.
When this changes, real kids will be disappointing. Parents can have favorite children, and those favorite children won’t be the human ones.
Superstimuli aren’t about changing your reward function; they’re about discovering a better way to fulfill your existing reward function. For all that ice cream is cheating from a nutrition standpoint, it still tastes good and people eat it, no brain surgery required.
Also consider that humans optimise their pets (neutering/spaying) and their children in ways that the pets and children do not want. I expect some of the novel strategies your AI discovers will be things we do not want.