Things I learned/changed my mind about thanks to your reply:
1) Good tools allow experimentation, which yields insights that can (unpredictably) lead to big advances in AI research.
o1 is an example: an insight discovered by someone playing around (chain-of-thought prompting) made its way into a model’s weights roughly four years later by informing its training.
2) The resolution of a capabilities overhang can be seen as a type of bad event that is preventable.
This is a crux in my opinion:
It is bad for cyborg tools to be broadly available, because that’ll help {people trying to build the kind of AI that’d kill everyone} more than it’ll help {people trying to save the world}.
I need to look more into the specifics of AI research and alignment work, and into what kind of help a powerful UI actually provides, and hopefully write a post some day.
(But my intuition is that the fact that cyborg tools help both capabilities and alignment is bad, and that whether I open-source code or not shouldn’t hinge on narrowing down this ratio; it should overwhelmingly favor alignment research.)
Cheers.
@Tamsin Leake
I’ve written a post about my thoughts related to this, but I haven’t gone specifically into whether UI tools help alignment or capabilities more. It kind of touches on “sharing vs keeping secret” in a general way, but not head-on enough that I can just write a tl;dr here, and not along the threads we started here. Except maybe “broader discussion/sharing/enhanced cognition gives more coordination, but risks world-ending discoveries being found before coordination saves us” (not a direct quote).
But I found this too difficult to think about, and feeling like I had to reply here first was blocking me from digging into other subjects and developing my ideas, so I just went ahead with the post.
https://www.lesswrong.com/posts/GtZ5NM9nvnddnCGGr/ai-alignment-via-civilizational-cognitive-updates