My issue with the security mindset is that there's a selection effect/bias that causes people to notice the failures of security and not its successes, even if the true evidence for success is massively larger than the evidence for failure.
Here's a quote from lc's post "POC or GTFO as a counter to alignment wordcelism" on why the security industry has massive issues with people claiming security failures that don't or can't happen:
Even if you’re right that an attack vector is unimportant and probably won’t lead to any real world consequences, in retrospect your position will be considered obvious. On the other hand, if you say that an attack vector is important, and you’re wrong, people will also forget about that in three years. So better list everything that could possibly go wrong[1], even if certain mishaps are much more likely than others, and collect oracle points when half of your failure scenarios are proven correct.
And this is why I generally dislike the security mindset: it creates incentives to report failures or bad events even when they aren't much of a concern.
Also, most of what computer security people do doesn't need to be done in ML/AI, which is another reason I'm skeptical of the security mindset.
One might reply that these are parochial matters within the computer security community, and do not bear on the hazards of AGI.
They do matter: they imply a selection effect in which people share the evidence for doom and don't notice the evidence for not-doom. And this matters because the real chance of doom may be much lower, in principle arbitrarily low, than the probabilities held by LWers and AI safety/governance organizations.
Combined with the more familiar bias toward negative news being selected for, this is one reason I think AI doom is very unlikely. It's just one piece of my argument, not the whole of it.
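To make the selection effect concrete, here's a minimal sketch, a toy simulation with made-up reporting probabilities (none of these numbers come from the post or from real security data): if failures are far more likely to be written up than quiet successes, the failure rate you infer from what actually gets reported can be several times the true rate.

```python
import random

# Toy model of the selection effect described above.
# All numbers are illustrative assumptions, not estimates from the post.
random.seed(0)

TRUE_FAILURE_RATE = 0.05   # assumed true rate of "breaks"
P_REPORT_FAILURE = 0.9     # failures almost always get written up
P_REPORT_SUCCESS = 0.1     # quiet successes rarely get written up
N_EVENTS = 100_000

reported_failures = 0
reported_total = 0
for _ in range(N_EVENTS):
    failed = random.random() < TRUE_FAILURE_RATE
    reported = random.random() < (P_REPORT_FAILURE if failed else P_REPORT_SUCCESS)
    if reported:
        reported_total += 1
        reported_failures += failed  # bool counts as 1 when True

print(f"true failure rate:     {TRUE_FAILURE_RATE:.2%}")
print(f"observed failure rate: {reported_failures / reported_total:.2%}")
# With these assumed numbers the observed rate is roughly
# 0.9*0.05 / (0.9*0.05 + 0.1*0.95) ≈ 32%, i.e. an estimate formed only
# from what gets reported overstates the true 5% risk several-fold.
```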
And I think this has already happened: cf. the whole inner misalignment/optimization daemon situation, which was tested twice. One test showed a confirmed break; the other, by Ulisse Mini, used a more realistic setup and the optimization daemon/inner misalignment went away. Very little was shared about that result, compared to the original, which almost certainly got more views.