Yikes, I’m not even comfortable maximizing my own CEV.
What do you think of this post by Tammy?
Where is the longer version of this? I do want to read it. :)
Well, perhaps I should write it :)
Specifically, what is it about the human ancestral environment that made us irrational, and why wouldn’t RL environments for AI cause the same or perhaps a different set of irrationalities?
Mostly that thing where we had a lying vs. lie-detecting arms race, and the liars mostly won by believing their own lies; that's how we got overconfidence bias, self-serving bias, and a whole bunch of other biases. I think Yudkowsky and/or Hanson have written about this.
Unless we do a very stupid thing like reading the AI’s thoughts and RL-punishing wrongthink, this seems very unlikely to happen.
If we give the AI no reason to self-deceive, the natural instrumentally convergent incentive is not to self-deceive, so it won’t.
Again, though, I’m not super confident in this. Deep deception or similar could really screw us over.
Also, how does RL fit into QACI? Can you point me to where this is discussed?
I have no idea how Tammy plans to “train” the inner-aligned singleton on which QACI is implemented, but I think it will be closer to RL than to SL in the ways that matter here.
Is this a massive exfohazard? Should this have been published?