Pinging @stevenbyrnes: do you agree with me that, instead of mapping those protoAGIs to a queue of instructions, it would be better to build the AGI from a set of brain structures, each with its own prompt? For example, an “amygdala” would be in charge of returning an integer between 0 and 100 indicating the fear level. A “hippocampus” would be in charge of storing and retrieving memories, etc. I guess the thalamus would be consciousness and the cortex would process some abstract queries.
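A minimal sketch of that modular layout, assuming each “structure” is just a role prompt wrapped around an LLM call. `fake_llm` is a hypothetical deterministic stand-in so the sketch runs without any API; the class and method names (`Amygdala`, `Hippocampus`, `fear_level`, `retrieve`) are illustrative and not from any existing project.

```python
import re
from dataclasses import dataclass, field


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call: answers '80' when the
    prompt mentions a threat word, '10' otherwise."""
    threat_words = ("attack", "shutdown", "danger")
    return "80" if any(w in prompt.lower() for w in threat_words) else "10"


@dataclass
class Amygdala:
    role_prompt: str = ("You are the amygdala. Reply only with an integer "
                        "from 0 to 100 giving the fear level of this input:\n")

    def fear_level(self, stimulus: str, llm=fake_llm) -> int:
        reply = llm(self.role_prompt + stimulus)
        match = re.search(r"\d+", reply)  # tolerate extra words around the number
        value = int(match.group()) if match else 0
        return max(0, min(100, value))  # clamp to the 0-100 range


@dataclass
class Hippocampus:
    memories: list = field(default_factory=list)

    def store(self, memory: str) -> None:
        self.memories.append(memory)

    def retrieve(self, query: str, k: int = 3) -> list:
        # Naive keyword-overlap ranking; a real system might use embeddings.
        words = set(query.lower().split())
        ranked = sorted(self.memories,
                        key=lambda m: len(words & set(m.lower().split())),
                        reverse=True)
        return ranked[:k]


amygdala = Amygdala()
print(amygdala.fear_level("The operator plans a shutdown."))  # 80

hippo = Hippocampus()
hippo.store("met Alice at the lab")
hippo.store("the lab door code changed")
hippo.store("lunch was pasta")
print(hippo.retrieve("who was at the lab", k=2))
```

Parsing and clamping the amygdala's reply matters because an LLM asked for “an integer” will not always return only an integer.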
We could also use active inference and Bayesian updating to model current theories of consciousness, and even model schizophrenia by changing the number of past messages some structures can access (i.e., modeling long-range connection issues), etc.
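A sketch of the “number of past messages” knob, assuming each structure only sees its last `window` messages. `Structure` and `window` are hypothetical names; shrinking the window is just one crude way to mimic the long-range connection issues described above, not an established model of schizophrenia.

```python
from collections import deque


class Structure:
    """A brain-structure module that only 'sees' its last `window` messages."""

    def __init__(self, name: str, window: int):
        self.name = name
        # A deque with maxlen silently drops the oldest message once full.
        self.history = deque(maxlen=window)

    def receive(self, message: str) -> None:
        self.history.append(message)

    def context(self) -> str:
        # This string would be prepended to the module's prompt.
        return "\n".join(self.history)


healthy = Structure("cortex", window=5)
impaired = Structure("cortex", window=1)
for msg in ["saw a dog", "dog barked", "felt scared"]:
    healthy.receive(msg)
    impaired.receive(msg)

print(healthy.context())   # all three messages
print(impaired.context())  # only the most recent one
```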
To me, that sounds much easier to inspect and align than pure black boxes, since you can throttle the speed and manually change values, e.g. to make sure the AGI does not feel threatened.
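A sketch of the throttling and manual-override hooks, assuming the modules exchange values through a central loop the operator controls. `Controller`, `fear_cap`, and `step_delay` are hypothetical names; note that capping the value only changes what downstream modules receive, while the log keeps the raw value for inspection.

```python
import time


class Controller:
    """Runs module updates step by step so an operator can watch and intervene."""

    def __init__(self, fear_cap: int = 100, step_delay: float = 0.0):
        self.fear_cap = fear_cap        # manual override: upper bound on fear
        self.step_delay = step_delay    # seconds to pause between steps (throttling)
        self.log = []                   # raw vs. effective values, for inspection

    def step(self, raw_fear: int) -> int:
        effective = min(raw_fear, self.fear_cap)
        self.log.append({"raw_fear": raw_fear, "effective_fear": effective})
        time.sleep(self.step_delay)
        return effective


ctrl = Controller(fear_cap=20)
for raw in [5, 60, 95]:
    ctrl.step(raw)
print(ctrl.log)
```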
Is anyone aware of similar work? I created a diagram of the brain structures and their roles in a few minutes with ChatGPT, and it seems super easy.
I don’t know what Steve would say, but I know that some folks from DeepMind and Stanford recently used an LLM to create rewards to train another LLM to do specific tasks, like negotiation, which I think is exactly what you’ve described. It seems to work really well.
Reward Design with Language Models
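A rough sketch of the “LLM creates the rewards” idea, assuming the reward model is an LLM asked a yes/no question about each finished episode, with the answer mapped to a 1/0 reward that a separate RL update (not shown) would consume. `fake_llm`, `llm_reward`, and the prompt wording are hypothetical and not taken from the paper's code.

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: answers 'Yes' when
    the episode outcome mentions a reached agreement."""
    outcome = prompt.rsplit("Outcome:", 1)[-1]
    return "Yes" if "agreement reached" in outcome.lower() else "No"


def llm_reward(objective: str, outcome: str, llm=fake_llm) -> int:
    """Turn an LLM's yes/no judgment into a binary reward for an RL agent."""
    prompt = (f"Objective: {objective}\n"
              "Does the following episode satisfy the objective? "
              "Answer Yes or No.\n"
              f"Outcome: {outcome}")
    answer = llm(prompt).strip().lower()
    return 1 if answer.startswith("yes") else 0


objective = "Be a cooperative negotiator that reaches a deal."
episodes = ["Agreement reached: I get 2 books, you get the hat.",
            "No deal; both sides walked away."]
rewards = [llm_reward(objective, ep) for ep in episodes]
print(rewards)  # [1, 0]
```

The appeal of this design is that the objective is stated in plain language, so changing the desired behavior means editing a prompt rather than hand-coding a new reward function.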