Sure, sounds fun, here goes:
Brain-like AGI safety
This agenda is primarily useful in the scenario where (1) human intelligence is mostly powered by legible learning algorithms in the brain (especially the neocortex), (2) people will eventually reverse-engineer or reinvent these learning algorithms and thus build HLMI, and (3) this happens before anyone builds HLMI by any other path.
The agenda aims to increase the chance that high-level machine intelligence (HLMI) is aligned.
I think there are many possible paths from “we don’t know how to reliably control the motivations of an HLMI” to “existential catastrophe”, and I don’t have to take a stand on which ones are more or less likely.
The specific outcome I’m aiming for is either (A) an architecture plan / training plan / whatever that enables programmers to reliably set the HLMI’s motivation to be and stay in a particular intended direction, or to align with human ethics, or whatever (e.g., avoiding failure modes like value drift, wireheading, or conscious suffering AIs), or (B) a demonstration that no such architecture is possible, in which case we can at least discourage this line of capabilities research or encourage alternatives (i.e. differential technology development).
If promising architectures are found, obviously I hope they’ll be competitive with unaligned architectures, but it’s too soon to know.
With regard to timelines, the utility of this agenda depends mostly on the extent to which the HLMI development path leads to neocortex-like algorithms; the timeline per se (i.e., how soon researchers reach that destination) is less important.
I think of this as a cousin of Prosaic AGI research. Prosaic AGI research asks “what if AGI is like the most impressive ML algorithms of today?”; I ask “what if AGI is like the neocortex?” I think both agendas are valuable for contingency-planning purposes. Obviously I have my own opinions about the relative probabilities of those two contingencies (and it could also be “neither of the above”), but I’m not sure that’s very decision-relevant; we should just do both. :-P
Nice! A couple of things that this comment pointed out for me:
Calendar time is not always (and perhaps often isn’t) the most useful way to talk about timelines. It can be more useful to talk about different development paths, or economic growth, if that’s more relevant to how tractable the research is.
An agenda doesn’t necessarily have to argue that its assumptions are more likely, because we may have enough resources to get worthwhile expected returns on multiple approaches.
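To spell out the “worthwhile expected returns on multiple approaches” point, here’s a toy back-of-the-envelope sketch (the numbers and the simple diminishing-returns model are invented purely for illustration, not estimates about the actual field): a less-probable contingency can still be the better marginal bet if it’s sufficiently neglected.

```python
# Toy back-of-the-envelope sketch. All numbers and the 1/(n+1) diminishing-returns
# model are made up for illustration; they are not estimates about the real field.

def marginal_expected_impact(p_scenario: float, current_researchers: int) -> float:
    """Crude expected impact of adding one researcher to an agenda:
    probability the scenario matters, times a diminishing-returns factor
    that shrinks as the area gets more crowded."""
    return p_scenario / (current_researchers + 1)

# Hypothetical allocation question: where does one extra researcher help more?
crowded_agenda = marginal_expected_impact(p_scenario=0.6, current_researchers=100)
neglected_agenda = marginal_expected_impact(p_scenario=0.2, current_researchers=5)

print(f"crowded agenda:   {crowded_agenda:.4f}")    # ~0.0059
print(f"neglected agenda: {neglected_agenda:.4f}")  # ~0.0333
```

Obviously the real calculus would also involve tractability, personal fit, and so on; the sketch just shows why “less likely” doesn’t settle the allocation question by itself.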
Something that’s unclear here: are you excited about this approach because you think brain-like AGI will be easier to align? Or is it more about the relative probabilities / neglectedness / your fit?
are you excited about this approach because you think brain-like AGI will be easier to align?
I don’t think it’s obvious that “we should do extra safety research that bets on a future wherein AGI safety winds up being easy”. If anything it seems backwards. Well, tractability cuts one way, importance cuts the other way, and “informing what we should do vis-à-vis differential technology development” is a bit unclear. I do know one person who works on brain-like AGI capabilities on the theory that brain-like AGI would be easier to align. I’m not endorsing that, but at least there’s an internal logic there.
(FWIW, my hunch is that brain-like AGI would be better / less bad for safety than the “risks from learned optimization” scenario, albeit with low confidence. How brain-like AGI compares to other scenarios (GPT-N or whatever), I dunno.)
Instead, I’m motivated to work on this because of relative probabilities and neglectedness.
I like this comment, though I don’t have a clear-eyed view of what sort of research makes (A) or (B) more likely. Is there a concrete agenda here (either that you could link to, or in your head), or is the work more in the exploratory phase?
You could read all my posts, but maybe a better bet is to wait a month or two; I’m in the middle of compiling everything into a (hopefully) nice series of blog posts that lays out everything I know so far.
I don’t really know how to do (B) except “keep trying to do (A), and failing, and maybe the blockers will become more apparent”.