Thanks for the contribution. I think it’s really important to keep brainstorming new alignment/safety approaches. We haven’t had enough different people working on alignment for long enough to be confident we’ve already thought of every idea. This one, for instance, is completely new to me.
I put this in the class of ideas “start over on AGI and take a safer approach”. I tend to think there’s little chance that these ideas will be useful in our world, where progress toward AGI is going fast enough that it would be quite difficult to stop. But who knows.
It strikes me that it would be easier to just slow down an AI running on standard computers. I guess part of the idea is that you’d make it impossible for the AI to proliferate to other hardware or to bump up its run rate. It could ultimately write or create a version of itself that could run on traditional hardware, but that would be a large roadblock to takeover.
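For concreteness, here’s a minimal sketch of what “capping the run rate” could look like purely in software on standard hardware. Everything in it is hypothetical: `generate_next_token` and `model.step` stand in for whatever the actual inference step is, and the budget numbers are arbitrary.

```python
import time

def generate_next_token(model, context):
    # Hypothetical stand-in for a single inference step of the model.
    return model.step(context)

def rate_limited_generation(model, context, max_tokens_per_minute=10, max_tokens=100):
    """Run inference, but never exceed a fixed token budget per wall-clock minute."""
    min_interval = 60.0 / max_tokens_per_minute  # seconds that must pass between tokens
    output = []
    for _ in range(max_tokens):
        start = time.monotonic()
        token = generate_next_token(model, context + output)
        output.append(token)
        # Sleep off whatever time is left in this token's interval.
        elapsed = time.monotonic() - start
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
    return output
```

The obvious weakness, which I take to be the argument for dedicated slow hardware instead, is that this cap lives in software the AI might eventually modify or route around.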
I do concede that in theory it should be possible to do this on standard computers, and that it might actually be a good way to test this hypothesis out and gather empirical data today.
Where I’m not so sure is whether even “slow” standard computers think slower than humans do. In other words, imagine some future AI architecture that is orders of magnitude more power- and compute-efficient. It may not be entirely unreasonable to suppose that we could get an algorithmic/architectural innovation that would enable GPT-4-level performance on old legacy hardware (e.g., from the ’80s). Indeed, at the unit level, we have calculators from the ’60s that can out-multiply the fastest mental mathematicians today.
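To put a rough number on the gap, here’s a back-of-the-envelope calculation. All the figures are loose assumptions on my part (a ~1T-parameter dense model, the standard ~2 FLOPs per parameter per generated token, and a late-’80s CPU doing on the order of 10^7 operations per second), not measurements:

```python
# Back-of-the-envelope: how far is 1980s hardware from running a GPT-4-scale model?
# All numbers below are rough assumptions, not measurements.

params = 1e12                      # assume ~1T parameters for a GPT-4-scale dense model
flops_per_token = 2 * params       # ~2 FLOPs per parameter per generated token
legacy_ops_per_sec = 1e7           # generous estimate for a late-1980s CPU

seconds_per_token = flops_per_token / legacy_ops_per_sec
print(f"~{seconds_per_token:.0e} seconds per token (~{seconds_per_token / 86400:.0f} days)")
# => roughly 2e5 seconds, i.e. a couple of days per token, on these assumptions
```

On those assumptions you land at days per token, so the hypothetical innovation would have to buy roughly five to six orders of magnitude of efficiency just to reach ~1 token/sec, before even asking whether the weights fit in memory. That’s the size of the bet behind “GPT-4 on ’80s hardware”.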
I thought of using virtualization/simulation to emulate a slow sandbox environment for the AI to run in, but as you mention, it does lower the barrier for an “escape”, creating risk that might be unacceptable when dealing with ASI.
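On the virtualization point, one crude way to emulate “slow hardware” without touching the model code at all is to duty-cycle the process from the outside, the way tools like cpulimit do: let it run for a short slice of each period, then suspend it for the rest. A minimal POSIX-only sketch, with the child command as a hypothetical placeholder:

```python
import os
import signal
import subprocess
import time

def run_throttled(cmd, duty_cycle=0.01, period=1.0):
    """Run `cmd`, but keep it suspended for all but `duty_cycle` of each `period` seconds."""
    proc = subprocess.Popen(cmd)
    try:
        while proc.poll() is None:
            os.kill(proc.pid, signal.SIGCONT)   # let it run briefly...
            time.sleep(period * duty_cycle)
            os.kill(proc.pid, signal.SIGSTOP)   # ...then freeze it for the rest of the period
            time.sleep(period * (1 - duty_cycle))
    finally:
        if proc.poll() is None:
            proc.kill()

# Hypothetical usage: slow an inference process down to ~1% of real time.
# run_throttled(["python", "run_inference.py"], duty_cycle=0.01)
```

Of course this only throttles CPU time, not memory or I/O, and your concern still applies in full: any software-level throttle is itself part of the attack surface for an escape.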
With that said, being able to simulate this on current hardware does suggest a possible way to test the whole hypothesis today. For example, we could take two different models: Model A, with a higher parameter count and Elo rating, running in a slower environment, and Model B, with fewer parameters and a lower Elo rating, running in a faster environment, and see whether Model B is able to sufficiently supervise Model A on some task that would be difficult (or impossible) if both were running at the same speed.
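A sketch of what that experiment could look like, with everything hypothetical (the `ask` and `judge` helpers, the task object, and the specific speed ratio are all placeholders, not an actual harness):

```python
import time

def ask(model, prompt, tokens_per_second):
    """Query a model while enforcing a wall-clock speed for its output."""
    answer = model.generate(prompt)                      # hypothetical inference call
    time.sleep(len(answer.tokens) / tokens_per_second)   # simulate slow/fast hardware
    return answer

def supervision_trial(strong_model, weak_model, task):
    # Model A: stronger (higher Elo), but throttled hard.
    proposal = ask(strong_model, task.prompt, tokens_per_second=0.5)
    # Model B: weaker (lower Elo), but fast enough to run many checks
    # in the wall-clock time Model A needed for one answer.
    critiques = [
        ask(weak_model, f"Check step {i} of: {proposal.text}", tokens_per_second=50)
        for i in range(task.num_steps)
    ]
    # Did the fast-but-weak supervisor catch the slow-but-strong model's errors?
    return task.judge(proposal, critiques)
```

The interesting comparison is then the same pairing with the speed handicap removed: if Model B’s supervision only succeeds when it gets a large wall-clock advantage, that’s (weak) empirical evidence for the slow-hardware idea.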