In order to understand superintelligence, we should first characterise what we mean by intelligence. Legg’s well-known definition identifies intelligence as the ability to do well on a broad range of cognitive tasks.[1] However, this combines two attributes which I want to keep separate for the purposes of this report: the ability to understand how to perform a task, and the motivation to actually apply that ability to do well at the task. So I’ll define intelligence as the former, which is more in line with common usage, and discuss the latter in the next section.
I like this split into two components, mostly because it fits with my intuition that goal-directedness (which I assume is the second component) is separate from competence in principle. Looking only at behavior, there's probably a minimal level of competence necessary to detect goal-directedness. But if I remember correctly, you defend a definition of goal-directedness that also depends on internal structure, so that might not be an issue here.
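For concreteness, my memory of Legg and Hutter's formalization of that definition is something like this (with the "motivation" baked into the reward term):

$$\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}$$

where $E$ is a set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward that policy $\pi$ obtains in $\mu$. Since $V^{\pi}_{\mu}$ only measures reward actually achieved, the formula lumps together "understanding how to do well" and "actually trying to do well", which is exactly the conflation the post is pulling apart.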
Because of the ease and usefulness of duplicating an AGI, I think that collective AGIs should be our default expectation for how superintelligence will be deployed.
I am okay with assuming collective AGIs instead of single AGIs, but what does that change in terms of technical AI safety?
Even a superintelligent AGI would have a hard time significantly improving its cognition by modifying its neural weights directly; it seems analogous to making a human more intelligent via brain surgery (albeit with much more precise tools than we have today).
Although I agree with your general point that self-modification will probably come out of self-retraining, I don't think I agree with the quoted paragraph. The main difference I see is that an AI built from, let's say, neural networks has access to every single neuron it is made of. It might not be able to study all of them at once, but that's still a big difference from measurements of a functioning brain, which are far less precise AFAIK. I think this entails that before AGI, ML researchers will get further in understanding how neural networks work than neuroscientists will for brains, and once AGI arrives, the AI can take the lead.
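To make the "access to every neuron" point concrete: in any standard deep learning framework you can already enumerate and read every parameter of the network exactly, which has no analogue for measurements of a living brain. A minimal PyTorch sketch (the toy model here is just an illustration, not anything from the post):

```python
import torch.nn as nn

# Stand-in for whatever network the AI is "made of".
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Every weight is directly readable (and writable) with perfect precision.
total = 0
for name, param in model.named_parameters():
    total += param.numel()
    print(f"{name}: shape={tuple(param.shape)}, mean={param.data.mean().item():.4f}")
print(f"total parameters: {total}")
```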
So it’s probably more accurate to think about self-modification as the process of an AGI modifying its high-level architecture or training regime, then putting itself through significantly more training. This is very similar to how we create new AIs today, except with humans playing a much smaller role.
It also feels very similar to how humans systematically improve at something: make a study or practice plan, and then train according to it.
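A toy sketch of that "edit the high-level recipe, retrain, keep what works" loop; build_model, train, and evaluate are hypothetical stand-ins I made up for illustration, not anything from the post:

```python
import copy
import random

# Toy stand-ins, nothing here is a real library:
def build_model(config):
    return dict(config)       # pretend the "model" is just its recipe

def train(model, steps):
    return model               # pretend training happened

def evaluate(model):
    # Fake score: rewards capacity and a sane learning rate, plus noise.
    return model["num_layers"] - 1000 * abs(model["learning_rate"] - 1e-3) + random.random()

def self_modify(config, generations=5):
    """Self-modification as: edit the high-level recipe, retrain, keep what helps."""
    best = evaluate(train(build_model(config), config["train_steps"]))
    for _ in range(generations):
        candidate = copy.deepcopy(config)
        # Propose a change to the high-level architecture or training regime...
        candidate["num_layers"] += random.choice([0, 1])
        candidate["learning_rate"] *= random.choice([0.5, 1.0, 2.0])
        candidate["train_steps"] = int(candidate["train_steps"] * 1.5)
        # ...then put the modified system through significantly more training.
        score = evaluate(train(build_model(candidate), candidate["train_steps"]))
        if score > best:
            config, best = candidate, score
    return config, best

print(self_modify({"num_layers": 12, "learning_rate": 1e-3, "train_steps": 1000}))
```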