My issue with the security mindset is that there's a selection effect/bias that causes people to notice the failures of security and not its successes, even if the true evidence for success is massively larger than the evidence for failure.
Here's a quote from lc's post "POC or GTFO as a counter to alignment wordcelism" on why the security industry has massive issues with people claiming security failures that don't or can't happen:
Even if you’re right that an attack vector is unimportant and probably won’t lead to any real world consequences, in retrospect your position will be considered obvious. On the other hand, if you say that an attack vector is important, and you’re wrong, people will also forget about that in three years. So better list everything that could possibly go wrong[1], even if certain mishaps are much more likely than others, and collect oracle points when half of your failure scenarios are proven correct.
And this is why I generally dislike the security mindset: it creates incentives to report failures or bad events even when they aren't much of a concern.
Also, most of what computer security people do doesn't need to be done in ML/AI, which is another reason I'm skeptical of the security mindset.
One might reply that these are parochial matters within the computer security community, and do not bear on the hazards of AGI.
They do matter: they imply a selection effect in which people share the evidence for doom and don't notice the evidence for not-doom. And this matters because the real chance of doom may be much lower, in principle arbitrarily low, than the probabilities held by LWers and AI safety/governance organizations.
Combined with the more familiar bias toward negative news being selected for, this is one reason I think AI doom is very unlikely. It's just one piece of my argument, not the whole of it.
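To make the selection effect concrete, here's a minimal sketch, a toy simulation with made-up reporting probabilities (none of these numbers come from the post or from real security data): if failures are far more likely to be written up than quiet successes, the failure rate you infer from what actually gets reported can be several times the true rate.

```python
import random

# Toy model of the selection effect described above.
# All numbers are illustrative assumptions, not estimates from the post.
random.seed(0)

TRUE_FAILURE_RATE = 0.05   # assumed true rate of "breaks"
P_REPORT_FAILURE = 0.9     # failures almost always get written up
P_REPORT_SUCCESS = 0.1     # quiet successes rarely get written up
N_EVENTS = 100_000

reported_failures = 0
reported_total = 0
for _ in range(N_EVENTS):
    failed = random.random() < TRUE_FAILURE_RATE
    reported = random.random() < (P_REPORT_FAILURE if failed else P_REPORT_SUCCESS)
    if reported:
        reported_total += 1
        reported_failures += failed  # bool counts as 1 when True

print(f"true failure rate:     {TRUE_FAILURE_RATE:.2%}")
print(f"observed failure rate: {reported_failures / reported_total:.2%}")
# With these assumed numbers the observed rate is roughly
# 0.9*0.05 / (0.9*0.05 + 0.1*0.95) ≈ 32%, i.e. an estimate formed only
# from what gets reported overstates the true 5% risk several-fold.
```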
And I think this has already happened: cf. the whole inner misalignment/optimization daemon situation, which was tested twice. One test showed a confirmed break; the other, by Ulisse Mini, used a more realistic setup and the optimization daemon/inner misalignment went away. Very little was shared about that result, compared to the original, which almost certainly got more views.