It’s a weird (and I suspect ineffective or counterproductive) limit to be sure, but the underlying idea of having somewhat arbitrary human-defined limits and being able to study how they work and don’t work seems incredibly valuable to AI safety.