The old rebuttals I’m familiar with are Gwern’s and Eliezer’s and Luke’s. Newer responses might also include things like Richard Ngo’s AGI safety from first principles or Joe Carlsmith’s report on power-seeking AIs. (Risk is disjunctive; there are a lot of different ways that reality could turn out worse than Holden-2012 expected.) Obviously Holden himself changed his mind; I vaguely recall that he wrote something about why, but I can’t immediately find it.
I’m not sure that’s accurate. His blog posts are getting cross-posted from his account, but that could also be the work of an LW administrator (with his permission).
I think my rebuttal still basically stands, and my predictions have been borne out: the many promises that autonomous drones would never be made fully autonomous collapsed within years. We apparently may have fully autonomous drones killing people now in Libya, and the US DoD has walked back its promises that humans would always authorize actions and now merely wants some principles like being ‘equitable’ or ‘traceable’. (How very comforting. I’m glad we’re building equity in our murderbots.) I’d be lying if I said I was even a little surprised that the promises didn’t last a decade before collapsing under the pressures that make tool AIs want to be agent AIs.
I don’t think too many people are still going around saying “ah, but what if we simply didn’t let the AIs do things, just like we never let them do things with drones? problem solved!” So these days, I would emphasize more what we’ve learned about the very slippery and unprincipled line between tool AIs and agent AIs due to scaling and self-supervised learning, given GPT-3 etc. Agency increasingly looks like Turing-completeness or weird machines or vulnerable insecure software: the default, difficult to keep from leaking into any system of interesting intelligence or capabilities, and not something special that must be hand-engineered in and can be assumed absent if you didn’t work hard at it.