I’m usually a skeptic of the usefulness of this kind of speculation but I found this a good read.
I am particularly intrigued by the suggestion of decomposability of goals.
Thanks! Yeah, I personally find it difficult to strike a balance between “most speculation of this kind is useless” and “occasionally it’s incredibly high-impact” (e.g. shapes the whole field of alignment, like the concept of deceptive alignment did).
My guess is that the work which falls into the latter category is most often threat modeling, because that’s a domain that can only be approached through speculation.