I’ll respond to this more later, but first I’d like to clarify a narrow point about when building superintelligence is acceptable.
Claim 4: If we succeed in controlling “transformatively useful AI”, then we may be able to stop the race toward superintelligence and get AI labs or governments to agree to not build superintelligence until it can be controlled (not intuitive; non-technical arguments needed).
I think that purely black-box AI control is very unlikely to scale to superintelligence, but there are other plausible routes to safety (alignment, and various approaches which make use of model internals but which aren’t well described as alignment).
So, the condition is more like “when we can be confident in the safety of superintelligence”, and there are a variety of routes here. Separately, if we don’t have an absolutely airtight argument for safety, we would ideally delay building superintelligence if it wouldn’t be needed to solve pressing problems. (And indeed, we don’t think that superintelligence is needed to solve a wide range of problems.)
What do you think about marginal superintelligences?
For example, take the task “implement an x86 CPU as gate masks”. Humans can do this task, but no single human can do it alone, so teams are forced to subdivide it inefficiently. For instance, a “CPU” that did not have distinct internal buses but was just a blob of gates, with the registers and cache right in the middle of the logic mess (or with cache lines descending from dies soldered above), would probably outperform all current designs.
This hypothetical mess of a chip design is not something humans can create, but it is a checkable artifact.
Or another task: “construct a new human kidney from cells. All measurable parameters must meet or exceed a reference kidney”. The argument is similar: humans can’t quite do this; the complexity of life support during construction is where they would fail, or human-made designs wouldn’t quite work well enough.
But again this is a checkable artifact. You don’t need superintelligence to validate that the output satisfies (or fails to satisfy) the goal.
A marginal superintelligence would be one that is context-unaware and gets assigned tasks like this. It doesn’t know whether the task is real or not.
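To make the “checkable artifact” point concrete, here is a minimal sketch of the kind of black-box validation I have in mind, in Python. The names (`candidate`, `reference`, `gen_input`, and the toy adder) are hypothetical placeholders; a real gate-mask check would run a circuit simulator or formal equivalence tools against the reference spec rather than a toy function, but the structure is the same: the checker only compares outputs against the spec and never needs to understand how the artifact was produced.

```python
import random

def differential_check(candidate, reference, gen_input, n_trials=10_000, seed=0):
    """Black-box validation of an untrusted artifact: feed identical randomly
    generated inputs to the candidate and to a trusted reference, and flag any
    divergence. The candidate's internals are never inspected."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        x = gen_input(rng)
        got, want = candidate(x), reference(x)
        if got != want:
            return False, (x, got, want)  # counterexample: input, candidate output, expected output
    return True, None

# Toy usage: check a candidate 32-bit adder against the obvious specification.
if __name__ == "__main__":
    reference_add = lambda xy: (xy[0] + xy[1]) % 2**32
    candidate_add = lambda xy: (xy[0] + xy[1]) % 2**32  # stand-in for an untrusted design's simulator
    gen = lambda rng: (rng.randrange(2**32), rng.randrange(2**32))
    ok, cex = differential_check(candidate_add, reference_add, gen)
    print("passes" if ok else f"fails on {cex}")
```

Random differential testing like this is of course not exhaustive; for something like a CPU you would also want directed tests or formal equivalence checking. But the structural point stands: the checker can be far weaker than whatever produced the artifact.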