I have difficulty judging how likely that is, but the odds will improve if semi-wise humans keep getting input from their increasingly wise AGIs.
What an insightful post!
I think we’re on the same page here, positing that AGI could actually help to improve alignment if we give it that task. I really like that one of your fundamental instructions is to ask about potential issues with alignment.
And on the topic of dishing out tasks, I agree that pushing the industry toward Instruction Following is an ideal path, and I think there will be a great deal of consumer demand for this sort of product. A friend of mine has mentioned this as the no-brainer approach to AI safety, and even as a reason why AI safety isn’t actually that big a deal… I realise you’re not making this claim in the same way.
My concern is that the industry will ultimately follow demand: as AI becomes more multi-faceted and capable, the market for digital companions, assistants and creative partners will incentivise the production of more human-like, more self-motivated agents (sovereign AGI) that generate ideas, art and conversation autonomously, even spontaneously.
Some will want a two-way partnership rather than a master-slave relationship. This market will incentivise more self-training, self-play, even an analogue to dreaming / day-dreaming (all without a human in the loop, or HITL). Whichever company enables this process for AI will gain market share in these areas. So, while Instruction Following AI will be safe, it won’t necessarily satisfy consumer demand in the way that a more self-motivated, and therefore less corrigible, AI would.
But I agree with you that moving forward in a piecemeal fashion, with the control of an IF and DWIMAC approach, gives us the best opportunity to learn and adapt. The concern about sovereign AGI probably needs to be addressed through governance (enforcing HITL, enforcing a controlled pace of development, and staying vigilant about the runaway potential of self-motivated agents), but it does also bring Value Alignment back into the picture. I think you do a great job of outlining how ideal an IF development path is, which should make everyone suspicious if development starts moving in a different direction.
Do you think it will be possible to create an AGI that is fundamentally Instruction Following yet still satisfies the demand for human-like interaction that part of the market will have?
I apologise if you’ve already addressed this question in some way I’ve not recognised; there were a lot of very interesting links in your post, and I can’t be entirely sure I grokked all of them adequately.
Thanks for your comments; I look forward to reading more of your work.