yeahhhhhh, missing TAP-type reasoning is a really critical failure here. I think a lot of the important stuff happens around signaling whether you'll be an agent that's level-1 valuable to be around, and I've thought before about how keeping your hidden TAP depth short, in ways that are recognizable to others, makes you more comfortable to be around because you're more predictable. or something
this would have to take the form of something like: first, make the agent a slightly-stateful pattern-response bot, maybe with a global "emotion" state that selects which pattern-response network to use. then have it try to predict the world in parts, unsupervised. then add preferences, which can be about other agents' inferred mental states. then pull those preferences back through time via reinforcement learning. then add the retribution and deservingness stuff on top. power would be inferred from representations of other agents, something like trying to predict the other agents' unobserved attributes.
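here's a minimal Python sketch of that layering, just to make the ordering concrete. every class, method, and constant in it is invented for illustration (PatternResponseBot, WorldModel, the grudge/power dicts, etc.), and the mechanisms are toy stand-ins, not proposals for how the real versions would work:

```python
# Toy sketch of the layered build-up described above. All names are
# invented for illustration; each layer is a stub, not a real mechanism.
import random


class PatternResponseBot:
    """Layer 1: slightly-stateful pattern-response, gated by a global emotion."""

    def __init__(self):
        self.emotion = "neutral"  # global state selecting which network is active
        self.networks = {
            "neutral": {"greeting": "greet_back", "threat": "freeze"},
            "afraid": {"greeting": "freeze", "threat": "flee"},
        }

    def act(self, stimulus):
        # look up the response in whichever network the current emotion selects
        return self.networks[self.emotion].get(stimulus, "do_nothing")


class WorldModel:
    """Layer 2: predict the world in parts, unsupervised (here: 'same as last time')."""

    def __init__(self):
        self.history = {}  # part of the world -> list of observations

    def update(self, part, observation):
        self.history.setdefault(part, []).append(observation)

    def predict(self, part):
        seen = self.history.get(part, [])
        return seen[-1] if seen else None


class PreferenceLayer:
    """Layer 3: preferences, here over another agent's inferred mental state."""

    def score(self, predicted_state):
        return 1.0 if predicted_state == "friendly" else 0.0


class Agent:
    """Layers stacked: a toy value update stands in for RL pulling preferences
    back through time; retribution/deservingness and power inference sit on top
    as extra terms keyed on representations of other agents."""

    def __init__(self):
        self.bot = PatternResponseBot()
        self.world = WorldModel()
        self.prefs = PreferenceLayer()
        self.value = {}      # stimulus -> learned value (toy stand-in for RL)
        self.grudges = {}    # other agent -> deservingness adjustment
        self.power_est = {}  # other agent -> inferred unobserved capability

    def step(self, stimulus, other_agent, observed_state):
        self.world.update(other_agent, observed_state)
        predicted = self.world.predict(other_agent)
        reward = self.prefs.score(predicted) + self.grudges.get(other_agent, 0.0)
        # nudge the value of this stimulus toward the preference-derived reward
        old = self.value.get(stimulus, 0.0)
        self.value[stimulus] = old + 0.1 * (reward - old)
        # crude power inference: unexplained behavior -> assume hidden capability
        if observed_state not in ("friendly", "hostile"):
            self.power_est[other_agent] = self.power_est.get(other_agent, 0.0) + 1.0
        return self.bot.act(stimulus)


if __name__ == "__main__":
    agent = Agent()
    for t in range(3):
        obs = random.choice(["friendly", "hostile", "opaque"])
        print(t, agent.step("greeting", "agent_b", obs), agent.value)
```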
also, this doesn't treat level 4 as some super-high-level thing; it just falls out naturally from running the world prediction for a while.
the better version of this model probably takes the form of a list of the most important built-in input-action mappings.
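something like this, purely as a shape for the data. the specific triggers and actions listed are placeholders I made up, not a claim about which mappings actually matter or how many there are:

```python
# Hypothetical list of built-in input-action mappings (TAP-style pairs).
# Every entry is a placeholder chosen for illustration only.
BUILT_IN_TAPS = [
    ("sudden_loud_noise", "startle"),
    ("face_detected", "orient_and_track"),
    ("other_agent_smiles", "smile_back"),
    ("resource_taken_by_other", "protest"),
    ("other_agent_in_distress", "approach"),
]


def respond(trigger, taps=BUILT_IN_TAPS):
    """Return the first built-in action matching the trigger, else None."""
    for t, action in taps:
        if t == trigger:
            return action
    return None


print(respond("face_detected"))  # orient_and_track
```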