Esben Kran comments on [Intro to brain-like-AGI safety] 12. Two paths forward: “Controlled AGI” and “Social-instinct AGI”

Esben Kran 22 Apr 2022 22:26 UTC
5 points
[...]Still, both the reward function and the RL algorithm are inputs into the adult’s jealousy-related behavior[...]
I probably just don’t know enough about jealousy networks to comment here but I’d be curious to see the research here (maybe even in an earlier post).
Does anyone believe in “the strict formulation”?
Hopefully not, but as I mentioned, it's often a too-strict formulation, imho.
[...]first AGI can hang out with younger AGIs[...]
More the reverse. Again, this probably takes the idea further than I would, but it would be pre-AGI training in an environment with symbolic "aligned" models: the model learns the ropes from them, is then used as the "aligned" model for the next generation, and so on. Think IDA (iterated distillation and amplification) with a heavy RL twist, plus scalable human oversight in the sense that humans would monitor rewards and environment states instead of providing feedback on every single action. Very flawed, but possible.
RE: RE:
- Yeah, this is a lot of what the above proposal was also about.
[...] and the toddler gets negative reward for inhibiting the NPCs from accomplishing their goals, and positive reward for helping the NPCs accomplish their goals [...]
As far as I understand from the post, the reward comes only from understanding the reward function before the interaction, not after, and that is the controlling factor for obstructionist behaviour.
- Agreed, and again more as an ingredient in the solution than an end in itself. BNN OOD management is quite interesting, so I'm looking forward to that post!
- Steven Byrnes 26 Apr 2022 14:25 UTC
  3 points
  I probably just don’t know enough about jealousy networks to comment here but I’d be curious to see the research here (maybe even in an earlier post).
  I don’t think “the research here” exists. I’ll speculate a bit in the next post.
  Does anyone believe in “the strict formulation”?
  Hopefully not, but as I mentioned, it's often a too-strict formulation, imho.
  Can you point to any particular person who believes in “a too-strict formulation” of cortical uniformity? Famous or not. What did they say? Just curious.
  (Or maybe you’re talking about me?)
  symbolic “aligned” models
  Any thoughts on how to make those?
  - Not Relevant 26 Apr 2022 15:43 UTC
    1 point
    I think he's thinking of, like, NPCs made by behavior-cloning co-op MMO players or something. It won't teach all of human values, but plausibly it would teach "the golden rule" and other positive-sum things.
    
    (I don’t think that literal strategy works, but behavior-cloning elementary school team sports might get at a surprising fraction of “normal child cooperative behaviors”?)