Against hard barriers of this kind, you can point to arguments like “positing hard barriers of this kind requires saying that there are some very small differences in intelligence that make the crucial difference between being able vs. unable to do the task in principle. Otherwise, e.g., if a sufficient number of IQ 100 agents with sufficient time can do anything that an IQ 101 agent can do, and a sufficient number of IQ 101 agents with sufficient time can do anything an IQ 102 agent can do, etc, then by transitivity you end up saying that a sufficient number of IQ 100 agents with sufficient time can do anything an IQ 1000 agent can do. So to block this sort of transition, there needs to be at least one specific point where the relevant transition gets blocked, such that e.g. there is something that an IQ X agent can do that no number of IQ X-minus-epsilon agents can do. And can epsilon really make that much of a difference?”
Here’s an analogy, maybe. A sufficient number of 4yo’s could pick up any weight that a 5yo could pick up; a sufficient number of 5yo’s could pick up any weight that a 6yo could pick up … a sufficient number of national-champion weightlifters could pick up any weight that a world-record weightlifter could pick up.
So does it follow that a sufficient number of 4yo’s can pick up any weight that a world-record weightlifter could pick up? No! The problem is, the weight isn’t very big. So you can’t get a group of 50 4yo’s to simultaneously contribute to picking it up. There’s just no room for them to all hold onto it.
So here’s a model. There are nonzero returns to more agents working together to do a task, if they can all be usefully employed. But there are also rapidly-increasing coordination costs, and/or limitations to one’s ability to split a task into subtasks.
In the human world, you can’t notice a connection between two aspects of a problem unless those two aspects are simultaneously in a single person’s head. Thus, for hard problems, you can split them up a bit, with skill and luck, but not too much, and it generally requires that the people working on the subproblems have heavily-overlapping understandings of what’s going on (or that the manager who split up the problem in the first place has a really solid understanding of both subproblems such that they can be confident that it’s a clean split). See also: interfaces as scarce resources.
Joe’s argument here would actually be locally valid if we changed:
a sufficient number of IQ 100 agents with sufficient time can do anything that an IQ 101 agent can do
to:
a sufficient number of IQ 100 agents with sufficient time can do anything that some number of IQ 101 agents can do eventually
We can see why this change matters by applying it to your analogy. If we change:
A sufficient number of 4yo’s could pick up any weight that a 5yo could pick up
to
A sufficient number of 4yo’s could pick up any weight that some number of 5yo’s could pick up
Then we can see where the issue comes in: while a sufficiently large team of 4yo’s can always out-lift a single 5yo, there is some number of 5yo’s that can lift a weight no number of 4yo’s can lift.
If we fix the local validity issue in Joe’s argument like this, it is easier to see where issues might crop up.
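To spell the distinction out (my notation, not anything from Joe’s post): write $[n \times k]$ for a team of $n$ IQ-$k$ agents, and $A \succeq B$ for “given enough time, team $A$ can do anything team $B$ can do”. The original premise is

$$\forall k\; \exists n:\; [n \times k] \succeq [1 \times (k+1)],$$

while the premise the transitivity argument actually needs is

$$\forall k\; \forall m\; \exists n:\; [n \times k] \succeq [m \times (k+1)].$$

The second version chains by induction (since $\succeq$ is transitive), so it does give some $n$ with $[n \times 100] \succeq [1 \times 1000]$. The first version does not chain, because after one step you need to stand in for a whole team of the smarter agents, not just a single one. The weightlifting analogy is exactly a case where the first premise holds and the second fails.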
The memory and coordination issues are a reason why this sort of approach doesn’t work as well for humans, but in the context of Carlsmith’s post, AIs probably have much lower coordination costs and can be assumed to have similar memory capacities (at least within a given training run).
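To make the coordination-cost model from the quoted passage a bit more concrete, and to see why lower coordination costs matter so much, here is a toy illustration (my own construction, purely for intuition; none of the numbers mean anything):

```python
def effective_output(n_workers, per_worker_output=1.0, parallel_slots=50, coord_coeff=0.01):
    """Toy model: output grows with the number of usefully employed workers,
    but coordination overhead grows superlinearly with total team size."""
    usefully_employed = min(n_workers, parallel_slots)   # limited ability to split the task
    coordination_cost = coord_coeff * n_workers ** 1.5   # superlinear coordination overhead
    return max(usefully_employed * per_worker_output - coordination_cost, 0.0)

for n in (10, 100, 1000):
    # High coordination coefficient (humans) vs. a much lower one (copies of an AI).
    print(n, effective_output(n, coord_coeff=0.01), effective_output(n, coord_coeff=0.0001))
```

With the high coefficient, output collapses as the team grows; with the low one, it stays pinned near the cap set by how far the task can be split, which is why I treat coordination, but only to a lesser extent interfaces, as the constraint that mostly goes away for AIs.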
And here, a very useful empirical rule is that there’s an exponential drop-off in achievement as you get less intelligent, but there aren’t hard intelligence barriers to learning something; there’s just less and less chance of learning it within a given time period.
This is what motivates some hope for the scalable oversight agenda: as long as you keep each capability gain small enough, or you have enough trusted AI labor (probably both in reality), you can use trusted dumber labor to create, in limited quantities, a smarter AI system that is also trusted, then copy the new, smarter AI to research the alignment problem, and repeat.
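A minimal sketch of the loop that paragraph describes, where every function and threshold is a hypothetical stand-in rather than any real training or auditing procedure:

```python
from dataclasses import dataclass

@dataclass
class Model:
    capability: float
    trusted: bool

def train_successor(overseer: Model, capability_gain: float) -> Model:
    """Stand-in for using the current trusted labor to train a slightly more capable successor."""
    return Model(capability=overseer.capability + capability_gain, trusted=False)

def passes_oversight(overseer: Model, candidate: Model, max_gap: float) -> bool:
    """Toy criterion: oversight is assumed to work only while the capability gap stays small."""
    return candidate.capability - overseer.capability <= max_gap

def bootstrap(initial_trusted: Model, target_capability: float, step: float, max_gap: float) -> Model:
    """Iterate: train a slightly smarter model, vet it with (many copies of) the current
    trusted generation, promote it to trusted, and repeat until the target is reached."""
    trusted = initial_trusted
    while trusted.capability < target_capability:
        candidate = train_successor(trusted, capability_gain=step)
        if not passes_oversight(trusted, candidate, max_gap):
            raise RuntimeError("capability jump too large for the current overseers to vet")
        candidate.trusted = True  # promoted only after passing oversight
        trusted = candidate       # copies of this model become the new trusted labor
    return trusted

print(bootstrap(Model(capability=1.0, trusted=True), target_capability=5.0, step=0.5, max_gap=1.0))
```

The whole bet sits in the oversight check: the hope is that keeping each step small (or throwing enough trusted copies at the audit) keeps that check meaningful at every rung.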
So unlike the 4-year-olds vs. weightlifting case, where the 4-year-olds have the problem of both being less intelligent than adults and being unable to coordinate, here we only have the intelligence issue. And we know pretty well how that scales in many scientific fields: more intelligence basically always helps, but there are no hard cutoffs or hard barriers to doing well (modulo memory capacity issues, which are a real problem, but we can usually assume that time, not memory, is the bottleneck to alignment via scalable oversight, especially in an intelligence explosion).
See these quotes from Carl Shulman here:
Yeah. In science the association with things like scientific output, prizes, things like that, there’s a strong correlation and it seems like an exponential effect. It’s not a binary drop-off. There would be levels at which people cannot learn the relevant fields, they can’t keep the skills in mind faster than they forget them. It’s not a divide where there’s Einstein and the group that is 10 times as populous as that just can’t do it. Or the group that’s 100 times as populous as that suddenly can’t do it. The ability to do the things earlier with less evidence and such falls off at a faster rate in Mathematics and theoretical Physics and such than in most fields.
Yes, people would have discovered general relativity just from the overwhelming data and other people would have done it after Einstein.
No, that intuition is not necessarily correct. Machine learning certainly is an area that rewards ability but it’s also a field where empirics and engineering have been enormously influential. If you’re drawing the correlations compared to theoretical physics and pure mathematics, I think you’ll find a lower correlation with cognitive ability. Creating neural lie detectors that work involves generating hypotheses about new ways to do it and new ways to try and train AI systems to successfully classify the cases. The processes of generating the data sets of creating AIs doing their best to put forward truths versus falsehoods, to put forward software that is legit versus that has a trojan in it are experimental paradigms and in these experimental paradigms you can try different things that work. You can use different ways to generate hypotheses and you can follow an incremental experimental path. We’re less able to do that in the case of alignment and superintelligence because we’re considering having to do things on a very short timeline and it’s a case where really big failures are irrecoverable. If the AI starts rooting the servers and subverting the methods that we would use to keep it in check we may not be able to recover from that. We’re then less able to do the experimental procedures. But we can still do those in the weaker contexts where an error is less likely to be irrecoverable and then try and generalize and expand and build on that forward.
So I expect coordination, and to a lesser extent interfaces, to be slack constraints for AIs by default (at least without AI control measures), compared to humans.
Here’s the source for all of the Carl Shulman quotes above:
https://www.lesswrong.com/posts/BdPjLDG3PBjZLd5QY/carl-shulman-on-dwarkesh-podcast-june-2023#Can_we_detect_deception_