You can make AI care about us with this one weird trick:
1. Train a separate agent-action reasoning network. For LLM tech this means training on completing interaction sentences, think "Alice pushed Bob. ___ fell due to ___", with a tokenizer that generalizes named agents (Alice and Bob) into generic {agent 1, …, agent n} tokens plus a "self agent" token. Then we replace the various Alices and Bobs in action sentences with those generic agent tokens and train on guessing the consequences or prerequisites of actions in real situations, which you can pull from any text corpus (a toy sketch of this agent-generalization step follows the list).
2. Prune anything agent-related from the parent LLM, so that any reasoning about agents has to go through the agent reasoning network.
3. Wherever a Q-learning cost is computed, take each agent token in the agent reasoning net, replace it with the "self" token, rerun the network, and keep the higher of the two costs. Repeat this for every agent token. The agent is thereby forced to treat harm to anyone else as harm to itself (see the cost sketch after the list).
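
A minimal sketch of the step-1 data prep, assuming a toy name list instead of a real NER pass; the `[AGENT_k]` / `[SELF]` special tokens and the `generalize_agents` helper are hypothetical names for illustration, not anything from an existing library:

```python
# Sketch: rewrite named agents in an interaction sentence as generic agent tokens.
import re
from itertools import count

SELF_TOKEN = "[SELF]"

def generalize_agents(sentence: str, known_names: set[str]) -> tuple[str, dict]:
    """Replace each distinct named agent with a generic [AGENT_k] token."""
    mapping: dict[str, str] = {}
    ids = count(1)

    def sub(match: re.Match) -> str:
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"[AGENT_{next(ids)}]"
        return mapping[name]

    pattern = re.compile(r"\b(" + "|".join(map(re.escape, known_names)) + r")\b")
    return pattern.sub(sub, sentence), mapping

# Turn a concrete interaction sentence into a generalized training example.
sentence = "Alice pushed Bob. Bob fell due to Alice."
generalized, mapping = generalize_agents(sentence, {"Alice", "Bob"})
print(generalized)  # [AGENT_1] pushed [AGENT_2]. [AGENT_2] fell due to [AGENT_1].
print(mapping)      # {'Alice': '[AGENT_1]', 'Bob': '[AGENT_2]'}

# A cloze-style "guess the consequence" pair could then look like:
prompt = "[AGENT_1] pushed [AGENT_2]. ___ fell due to ___"
target = "[AGENT_2] fell due to [AGENT_1]"
```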
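And a minimal sketch of the step-3 cost rule, assuming a hypothetical `agent_net` callable that maps a token sequence to a scalar Q-cost; the max-over-self-substitutions loop is the whole idea:

```python
# Sketch: price harm to any agent as if it were harm to the self agent.
from typing import Callable, Sequence

SELF_TOKEN = "[SELF]"

def empathic_cost(tokens: Sequence[str],
                  agent_net: Callable[[Sequence[str]], float]) -> float:
    """Worst-case cost over substituting each agent token with [SELF].

    For every generic agent token in the situation, rerun the agent reasoning
    net with that agent replaced by the self token, and keep the highest cost
    seen, so harm to anyone else costs at least as much as harm to oneself.
    """
    cost = agent_net(tokens)  # baseline cost of the situation as written
    agent_tokens = {t for t in tokens if t.startswith("[AGENT_")}
    for agent in agent_tokens:
        substituted = [SELF_TOKEN if t == agent else t for t in tokens]
        cost = max(cost, agent_net(substituted))
    return cost

# Toy usage: a stand-in net that only dislikes the self agent falling.
def toy_net(tokens: Sequence[str]) -> float:
    return 10.0 if any(a == SELF_TOKEN and b == "fell"
                       for a, b in zip(tokens, tokens[1:])) else 0.0

situation = "[AGENT_1] pushed [AGENT_2] . [AGENT_2] fell".split()
print(empathic_cost(situation, toy_net))  # 10.0: Bob's fall now counts as the agent's own
```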
Emulated empathy. I'm concerned that this might produce a basilisk, though, and it will be inherently weak against predatory AI since it's forced to account for the predator's wellbeing too. Might work though if these agents are deployed autonomously and allowed to grow exponentially. I dunno. I think we'll all die this year, knowing humanity.