In the section about systemic risk, you should have referred to Moloch, which has been discussed recently in the context of AI risk by Boeree and Schmachtenberger, and by Forrest Landry (1, 2), among many others, and which can be loosely associated with Christiano’s “first scenario of AI failure”. Moloch has a tag on LW where something along these lines is also discussed.
Re: “destructive cult/malign actor risk”—I absolutely agree.
The usual answer to this from folks at labs like OpenAI or Anthropic is that, at first, RLHF and other techniques will help prevent malign actors from applying A(G)I towards destructive ends, and that later we will also ask the AI to help us solve this problem in a fundamental way (or, indeed, to prove to us that Yudkowsky’s policy proposal, “shut down all AGI and focus only on narrow biomedical AI”, is the safest thing to do).
That’s why the push by various people and groups for open-sourcing AI is suicidal, especially open-sourcing checkpoints and non-RLHF’ed versions, because such open-source AI could easily be used by malign actors to develop weapons and destructive plans.
The typical counter-argument from these “open-source maximalists” is handwaving like “if everyone has a powerful AI in their pocket, no single group could really harm the world much; the attack–defence balance will stay roughly where it is now”. This argument is very weak because it’s easily countered: for example, if one AI creates a super-virus (ultra-fast-spreading and ultra-deadly), another AI, even a superintelligent one, couldn’t do much to stop its spread and prevent a massive pandemic.
Moloch is not, in my opinion, the correct analogy to use for systemic problems.
Moloch absolves humans of guilt for systemic problems by handwaving them away as self-interest.
But systemic problems usually stem from things that are both bad when done in aggregate and bad for us as individuals. That’s why my analogy used a button that would give one nothing of actual value, just something we, from our blind position, would think is of value.
I agree with the rest of your comment, but it’s trivial to replicate the work by Anthropic or OpenAI in a focused direction. It’s already been done.
Their main edge is the RLHF datasets they have, but those are good for safety (arguably, as you point out) and for capabilities only insofar as the models are interacting with humans who haven’t learned to use such systems.
So we do, and likely will, live in the hardest of worlds, where it’s all open source.
“Should have” referred to Moloch is much too strong. Certainly it’s valid to refer to, and it’s a solid connection to make, given that that’s the name the concept has been given. But I think mentioning it in the comments as a contribution, like you did, is actually valid, and not everyone has to know all the custom words folks use. “Folks here have called that Moloch” seems fine. Strong downvote for this.
I do believe that authors should themselves do the legwork of connecting their frames with frames previously made by other people, to save readers the disproportionately greater cognitive effort of connecting the concepts in their heads and to prevent misinterpretations. In academia, this is called “citing prior work”. Citing zero prior work is bad style and is rightly shunned, à la Wolfram.
See what I previously wrote; in my opinion, you should make an effort to read rather than pattern-match to existing concepts.
My new comment applies in general: notice that I mentioned “misinterpretations”. If I misinterpreted it originally, it probably means that many other people did too, and to increase the percentage of people who interpret your text correctly, you would have done better to include a paragraph like “Note that this idea is distinct from Moloch, because …” or “This idea is a spin on some earlier ideas, …”.
I maintain that “readers should read better and decipher and interpret what I’ve written correctly, and if they fail, so much the worse for them” is a bad attitude and strategy for academic and philosophical writing (even though it’s widespread in different guises).
Well, then I agree with you completely. This is why I’ve never written anything I intend to publish in an academic setting, nor anything I’d consider pure philosophy.