Ah, crux: I do think the floating-point matters! Issues of precision, underflow, overflow, and NaNs bedevil model training and occasionally deployment-time behavior. By analogy, if we deploy an AGI whose idealized mathematical form is aligned, we may still be doomed, even if it's plausibly our best option in expectation.
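(To make that concrete, here's a toy NumPy sketch with made-up logits of the kind of thing I mean: the real-valued softmax formula is perfectly well-behaved, but its naive float32 realization overflows into NaNs.)

```python
import numpy as np

# Toy illustration (my own made-up numbers): the same softmax formula,
# fine over the reals, misbehaving over float32.
logits = np.array([10.0, 100.0, 1000.0], dtype=np.float32)

with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.exp(logits).sum()   # exp(100), exp(1000) overflow to inf
shifted = logits - logits.max()                     # standard max-subtraction trick
stable = np.exp(shifted) / np.exp(shifted).sum()

print(naive)   # [ 0. nan nan]  (inf / inf)
print(stable)  # [0. 0. 1.]     (mathematically the same expression, finite answer)
```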
Checkable meaning that I, or someone I trust with this, has to be able to check it! Maxwell's proposal is simple enough that I can reason through the whole thing, even over float32 rather than ℝ, but for more complex arguments I'd probably want it machine-checkable for at least the tricky numeric parts.
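(For flavor, here's my own illustrative sketch of what "machine-checkable for the tricky numeric parts" could look like; this is not Maxwell's proposal, just an example of brute-forcing a claim over every finite float32 instead of trusting the real-number argument.)

```python
import numpy as np

# Illustrative sketch: the identity sqrt(x*x) == |x| holds over the reals but
# fails in float32 wherever x*x overflows or underflows. Enumerate every finite
# float32 bit pattern in batches and count the counterexamples.
def count_float32_counterexamples(batch=2**24):
    bad = 0
    for start in range(0, 2**32, batch):
        bits = np.arange(start, start + batch, dtype=np.uint64).astype(np.uint32)
        x = bits.view(np.float32)
        x = x[np.isfinite(x)]                      # skip NaNs and infinities
        with np.errstate(over="ignore", under="ignore"):
            bad += np.count_nonzero(np.sqrt(x * x) != np.abs(x))
    return bad

print(count_float32_counterexamples())   # nonzero: the real-number proof doesn't transfer
```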