(edit: thank you for your comment! I genuinely appreciate it.)
"""I think (not sure!) the damage from people/orgs/states going “wow, AI is powerful, I will try to build some” is larger than the upside of people/orgs/states going “wow, AI is powerful, I should be scared of it”."""
^^ Why wouldn’t people seeing a cool cyborg tool just lead to more cyborg tools? As opposed to the black boxes that big tech has been building?
I agree that in general, cyborg tools increase hype about the black boxes and will accelerate timelines. But they still reduce discourse lag. And part of what’s bad about accelerating timelines is that you don’t have time to talk to people and build institutions, and reducing discourse lag would help with that.
"""By not informing the public that AI is indeed powerful, awareness of that fact is disproportionately allocated to people who will choose to think hard about it on their own, and thus that knowledge is more likely to be in reasonabler hands (for example they’d also be more likely to think “hmm maybe I shouldn’t build unaligned powerful AI”)."""
^^ You make 3 assumptions that I disagree with:
1) Only reasonable people who think hard about AI safety will understand the power of cyborgs
2) You imply a cyborg tool is a “powerful unaligned AI”, it’s not, it’s a tool to improve bandwidth and throughput between any existing AI (which remains untouched by cyborg research) and the human
3) That people won’t eventually find out. One obvious way is that a weak superintelligence will just build it for them. (I should’ve made this explicit, that I believe that capabilities overhang is temporary, that inevitably “the dam will burst”, that then humanity will face a level of power they’re unaware of and didn’t get a chance to coordinate against. (And again, why assume it would be in the hands of the good guys?))
^^ Why wouldn’t people seeing a cool cyborg tool just lead to more cyborg tools? As opposed to the black boxes that big tech has been building?
You imply a cyborg tool is a “powerful unaligned AI”, it’s not, it’s a tool to improve bandwidth and throughput between any existing AI (which remains untouched by cyborg research) and the human
I was making a more general argument that applies mainly to powerful AI but also to all other things that might help one build powerful AI (such as: insights about AI, cyborg tools, etc). These things-that-help have the downside that someone could use them to build powerful but unaligned AI, which is ultimately the thing we want to delay / reduce-the-probability-of. Whether the downside is bad enough that making them public/popular is net bad is the thing that’s uncertain, but I lean towards yes, it is net bad.
I believe that:
It is bad for cyborg tools to be broadly available because that’ll help {people trying to build the kind of AI that’d kill everyone} more than they’ll {help people trying to save the world}.
It is bad for insights about AI to spread, for the same reason.
It is bad for LLM assistants to be broadly available for the same reason.
Only reasonable people who think hard about AI safety will understand the power of cyborgs
I don’t think I’m particularly relying on that assumption? I don’t understand what made it sound like I think this.
In any case, I’m not making strict “only X are Y” or “all X are Y” statements; I’m making quantitative “X are disproportionately more Y” statements.
That people won’t eventually find out.
I believe that capabilities overhang is temporary, that inevitably “the dam will burst”
Well, yes. And at that point the world is much more doomed; the world has to be saved ahead of that. To increase the probability that we have time to save the world before people find out, we want to buy time. I agree it’s inevitable, but it can be delayed. Making tools and insights broadly available hastens the bursting of the dam, which is bad; containing them delays the bursting of the dam, which is good.
Things I learned/changed my mind about thanks to your reply:
1) Good tools allow experimentation, which yields insights that can (unpredictably) lead to big advancements in AI research.
o1 is an example, where basically an insight discovered by someone playing around (Chain Of Thought) made its way into a model’s weights 4 (ish?) years later by informing its training.
2) Capabilities overhang getting resolved can be seen as a type of bad event that is preventable.
This is a crux in my opinion:
It is bad for cyborg tools to be broadly available because that’ll help {people trying to build the kind of AI that’d kill everyone} more than they’ll {help people trying to save the world}.
I need to look more into the specifics of AI research and of alignment work and what kind of help a powerful UI actually provides, and hopefully write a post some day.
(But my intuition is that the fact that cyborg tools help both capabilities and alignment is bad, and that whether I open-source code or not shouldn’t hinge on narrowing down this ratio; it should overwhelmingly favor alignment research.)
Cheers.
@Tamsin Leake
I’ve written a post about my thoughts related to this, but I haven’t gone specifically into whether UI tools help alignment or capabilities more. It kind of touches on “sharing vs keeping secret” in a general way, but not head-on such that I can just write a tldr here, and not along the threads we started here. Except maybe “broader discussion/sharing/enhanced cognition gives more coordination but risks world-ending discoveries being found before coordination saves us”—not a direct quote.
But I found it too difficult to think about, and it (feeling like I have to reply here first) was blocking me from digging into other subjects and developing my ideas, so I just went on with it.
https://www.lesswrong.com/posts/GtZ5NM9nvnddnCGGr/ai-alignment-via-civilizational-cognitive-updates