Isn’t this similar to a Godzilla Strategy? (One AI overseeing the other.)
That variants of this approach will be useful for superintelligent AI safety: 40%.
Do you have more detailed reasoning behind such massive confidence? If so, it would probably be worth its own post.
This seems like a cute idea that might make current LLM prompt filtering a little less circumventable, but I don’t see any arguments for why this would scale to superintelligent AI. Am I missing something?