This argument about whether human-level alignment is sufficient is at least a decade old. I suspect one issue is that human-to-human alignment is high variance. The phrase “human-level alignment” could conjure up anything from Gandhi to Hitler, from Bob Ross to Jeffrey Dahmer. If you model that as an adversarial draw, it’s pretty bad. As a random draw, it may be better than the default unaligned outcome, but still high risk. I tend to view it as an optimistic draw, based on reverse engineering human altruism to control/amplify it.
I thought LW/MIRI was generally pessimistic on human-level alignment, but Rob Bensinger said in this comment that “If we had AGI that were merely as aligned as a human, I think that would immediately eliminate nearly all of the world’s existential risk,” which was an update for me.
So I tend to see brain reverse engineering as a much higher priority than it would otherwise deserve, both for inspiring artificial empathy/altruism and for shortening the timeframe until uploading.
I tend to see brain reverse engineering as a much higher priority than it would otherwise deserve, both for inspiring artificial empathy/altruism and for shortening the timeframe until uploading
My take is that the neocortex (and other bits) is running a quasi-general-purpose learning algorithm, and the hypothalamus and brainstem are “steering” that learning algorithm by sending multiple reward signals and other supervisory signals. (The latter are also doing lots of other species-specific instinct stuff that doesn’t interact with the learning algorithms, like regulating heart rate.)
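To make the “steering” picture concrete, here’s a toy Python sketch of what I mean (my own illustration, not a model of actual brain circuitry; the class names and reward channels are made up): a hard-wired steering module emits a few innate reward signals, and a generic learner adjusts its behavior using only those signals.

```python
# Toy sketch: a hypothetical "steering subsystem" (hypothalamus/brainstem analogue)
# sending multiple reward signals to a quasi-general-purpose "learning subsystem"
# (neocortex analogue). All names and reward channels are illustrative only.

import random

class SteeringSubsystem:
    """Hard-coded circuitry: evaluates the current state and emits reward signals."""
    def rewards(self, state):
        # A few innate, special-purpose reward channels, e.g. food and pain.
        return {
            "food": 1.0 if state["ate_food"] else 0.0,
            "pain": -1.0 if state["got_hurt"] else 0.0,
        }

class LearningSubsystem:
    """Generic learner: adjusts action preferences from whatever rewards it receives."""
    def __init__(self, actions, lr=0.1):
        self.values = {a: 0.0 for a in actions}
        self.lr = lr

    def choose(self):
        # Simple epsilon-greedy choice over learned action values.
        if random.random() < 0.1:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, action, reward_signals):
        # The learner is "steered": it only sees the combined reward,
        # not the innate meaning of each channel.
        total = sum(reward_signals.values())
        self.values[action] += self.lr * (total - self.values[action])

# Wiring: the steering subsystem never learns; the learning subsystem has no
# innate goals of its own -- its motivations come entirely from the reward channels.
steering = SteeringSubsystem()
learner = LearningSubsystem(actions=["forage", "touch_fire"])
for _ in range(100):
    action = learner.choose()
    state = {"ate_food": action == "forage", "got_hurt": action == "touch_fire"}
    learner.update(action, steering.rewards(state))
```

The point of the sketch is that swapping out the steering module’s reward channels completely changes what the learner ends up wanting, without touching the learning algorithm itself.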
So if we reverse-engineer the neocortex learning algorithm first, before learning anything new about the hypothalamus & brainstem, I think that we’d wind up with a recipe for making an AGI with radically alien motivations, but we still wouldn’t know how to make an AGI with human-like empathy / altruism.
I think there’s circuitry somewhere in the hypothalamus & brainstem that works in conjunction with the learning algorithms to create social instincts, and I’m strongly in favor of figuring out how those circuits work, and that’s one of the things that I’m trying to do myself. :-)
Yes, I concur. The cortex seems to get 90% or more of the attention in neuroscience, but those smaller, more ancient central brain structures probably have more of the innate complexity relevant to the learning machinery. That’s on my reading list (along with some of your brain articles a friend recommended).