Wow, such a badly argued (aka BS) yet heavily upvoted article!
Let’s start with Myth #1: what a straw man! Rather than holding that extreme position, most researchers likely believe that, in the current environment, their safety and alignment advances are helpful to humanity in expectation (high EV). The point is that they had quite a free hand, or at least varied options, in choosing the environment where they work and publish.
Even granting your examples, a bad actor could still see positive EV in a capable system that is less obedient and more deceptive. And even if interpretability speeds up development, it would steer that development toward more transparent models; at least there is a naive chance of that.
Myth #2: I have yet to meet anybody in alignment circles who believed that. Most are quite conscious of the double-edgedness and of your sub-arguments.
This comment depicts the flaws I point to neatly and gently: https://www.lesswrong.com/posts/F2voF4pr3BfejJawL/safety-isn-t-safety-without-a-social-model-or-dispelling-the?commentId=5vB5tDpFiQDG4pqqz