I get exactly what he means, but I suspect that a lot of people are not able to decompress and unroll that into something they "grok" on a fundamental level.
Something like "a superintelligence without knowledge about itself, that never reasons about itself, without this leading to other consequences that would make it incoherent" would cut out a ton of lethality. Combine that with giving such a thing zero agency in the world, and you might actually have something that could do "things we want, but don't know how to do" without it ending us on the first critical try.
Doable in principle, but such measures would necessarily cut into the potential capabilities of such a system.
So basically a trade-off, and IMO very much worth it.
The problem is we are not doing it, and more basically, people generally do not get why it is important. Maybe it's the framing, like when EY goes "superintelligence that firmly believes 222+222=555 without this leading to other consequences that would make it incoherent".