Someone anonymously suggests:
- Keep pretraining closely matching the human text distribution
- Require all rumination to be done in natural language
- Require approval before taking consequential actions
- Only deploy a model with a second adversarial monitor (a sketch of this gating pattern follows the list)
- Only deploy a model if it's "dumb" in a quantifiable way
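As a concrete illustration of the adversarial-monitor suggestion, here is a minimal sketch of the control flow it implies: the primary model's proposed action is released only if a separately trained monitor fails to flag it. Everything here is hypothetical (`query_model`, `monitor`, and the flagging heuristic are placeholder names, not any real system's API); this is a sketch of the gating pattern, not an implementation.

```python
"""Hypothetical sketch of deploying a model behind a second,
adversarial monitor. All names are placeholders."""

from dataclasses import dataclass


@dataclass
class Verdict:
    flagged: bool
    reason: str


def query_model(prompt: str) -> str:
    # Placeholder for the primary model proposing an action.
    return f"proposed action for: {prompt}"


def monitor(action: str) -> Verdict:
    # Placeholder for the adversarial monitor: in the suggestion above,
    # this would be a separately trained model rewarded for catching
    # unsafe or deceptive actions, not a keyword check.
    if "delete" in action.lower():
        return Verdict(flagged=True, reason="destructive verb detected")
    return Verdict(flagged=False, reason="")


def gated_act(prompt: str) -> str | None:
    """Release the primary model's action only if the monitor does not flag it."""
    action = query_model(prompt)
    verdict = monitor(action)
    if verdict.flagged:
        # Escalate instead of acting; this composes naturally with the
        # "require approval before consequential actions" suggestion.
        print(f"blocked: {verdict.reason}")
        return None
    return action


if __name__ == "__main__":
    print(gated_act("summarize the quarterly report"))
```

Note the design choice: the monitor sits outside the primary model's control loop and can only veto, which is what makes the pairing adversarial rather than cooperative.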
Someone anonymously suggests: