Excellent! I think that’s a clear and compelling description of the AI alignment problem, particularly in combination with your cartoon images. I think this is worth sharing as an easy intro to the concept.
I’m curious—how did you produce the wonderful images? I can draw a little, and I’d like to be able to illustrate like you did here, whether that involves AI or some other process.
FWIW, I agree that understanding humanity’s alignment challenges is conceptually an extension of the AI alignment problem. But I think it’s commonly termed “coordination” in LW discourse, if you want to see what people have written about that problem here. Moloch is the other term of art for thorny coordination/competition problems.
Hi Seth,
Thanks for your kind words. It’s funny, I think I naturally write in a longer, more convoluted style, but I have worked hard to make my writing accessible and short, so it’s nice to know the effort pays off.
The cartoons are drawn with an Apple Pencil on an iPad Pro using Procreate (the studio pen is great for cartooning, if you’re really interested). I set up a big canvas 1000px wide and about 5000px high, then draw all of them top to bottom. Then I export to Photoshop, crop, and export to PNG with a transparent background so that whatever colour the page is shows through. The ones I’ve used here on LW are screenshots from the blog itself, as the image backgrounds don’t work well on white or black (the only options here; my site is generally a pastel blue). I’ve explained in another post why I’m keeping my crappy drawings in the face of the generative AI revolution.
Thanks for the extra info around terms like “coordination”, good to know. I actually mention Moloch in part 2 and have written a series on Moloch. Funny that you use the word “thorny”, as the cartoon characters I use for that series are called “Thorny Devils” (Moloch horridus).
The Moloch series is great; once again, nice work on the introductory materials. I’ll send people there before the lengthy Scott Alexander post.
I just published a post related to your societal alignment problem. It’s on instruction-following AGI, and how likely it is that even AGI will remain under human control. That really places an emphasis on the societal alignment problem. It’s also about why alignment thinkers haven’t thought about this as much as they should.
https://www.lesswrong.com/posts/7NvKrqoQgJkZJmcuD/instruction-following-agi-is-easier-and-more-likely-than
I have difficulty judging how likely that is, but the odds will improve if semi-wise humans keep getting input from their increasingly wise AGIs.
What an insightful post!
I think we’re on the same page here, positing that AGI could actually help to improve alignment—if we give it that task. I really like one of your fundamental instructions being to ask about potential issues with alignment.
And on the topic of dishing out tasks, I agree that pushing the industry toward Instruction Following is an ideal path, and I think there will be a great deal of consumer demand for this sort of product. A friend of mine has mentioned this as the no-brainer approach to AI safety and even a reason why AI safety isn’t actually that big a deal… I realise you’re not making this claim in the same way.
My concern regarding this is that the industry is ultimately going to follow demand, and as AI becomes more multi-faceted and capable, the market for digital companions, assistants and creative partners will incentivise the production of more human-like, more self-motivated agents (sovereign AGI) that generate ideas, art and conversation autonomously, even spontaneously.
Some users will want a two-way partnership, rather than master-slave. This market will incentivise more self-training, self-play, even an analogue to dreaming / day-dreaming (all without a human in the loop, or HITL). Whatever company enables this process for AI will gain market share in these areas. So, while Instruction Following AI will be safe, it won’t necessarily satisfy consumer demand in the way that a more self-motivated and therefore less-corrigible AI would.
But I agree with you that moving forward in a piecemeal fashion with the control of an IF and DWIMAC approach gives us the best opportunity to learn and adapt. The concern about sovereign AGI probably needs to be addressed through governance (enforcing HITL, enforcing a controlled pace of development, and being vigilant about the run-away potential of self-motivated agents), but it does also bring Value Alignment back into the picture. I think you do a great job of outlining how ideal an IF development path is, which should make everyone suspicious if development starts moving in a different direction.
Do you think it will be possible to create an AGI that is fundamentally Instruction Following but could still satisfy the demand for human-like interaction that part of the market will have?
I apologise if you’ve already addressed this question in some way I’ve not recognised. There were a lot of very interesting links in your post, and I can’t be entirely sure I grokked all of them adequately.
Thanks for your comments, I look forward to reading more of your work.