(draft) Cyborg software should be open (?)
TLDR:
- Building powerful human-AI collaboration tools in the open removes capabilities overhang, which reduces discourse lag, which reduces x-risk.
- Alignment work is philosophy/writing/thinking-heavy, capabilities work is coding-heavy. Cyborg tools are more for the former than the latter, and great coding tools already exist.
Only goal is to reduce x-risk
Given this, “safety concerns” like “but what if someone uses your app and discovers jailbreaks or hacks somebody?” are not actually a problem. (They may even be net positive, since they “update the discourse” on unknown dangers; more on this later.)
Specifics of capabilities work versus alignment work
(tldr: cyborg tools help alignment more than capabilities, because it’s for reading/writing, not coding)
Capabilities work
Very empirical: you write code and run experiments, actual PyTorch code on lots and lots of GPUs.
(One piece of evidence is that Sholto and Trenton in the Dwarkesh podcast describe AI research as very empirical)
Alignment work
More about reading and writing on LessWrong and Substack, thinking about things conceptually, convincing people of the dangers of AGI, etc.
Some of it looks more like general AI research, for example mechanistic interpretability.
(Concrete/pragmatic note: my tool helps me a lot with absorbing material; next up are https://www.narrowpath.co/ and https://www.thecompendium.ai/. In normal UIs you can just dump in a document and ask for a summary, but there are more sophisticated ways both to parse an AI’s response and to feed it prompts. For example: ask it for a list of title/text pairs formatted as JSON, then parse that with code and load it into a widget of titled text blocks, which is easier to browse. My tool can currently do this.)
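The title/text-pairs idea above can be sketched in a few lines. This is a hypothetical illustration, not the tool's actual code; the prompt wording and the `parse_sections` helper are invented for the example.

```python
import json

# Hypothetical instruction appended to a prompt, asking for structured
# output instead of free-form prose.
PROMPT_SUFFIX = (
    "Summarize the document as a JSON list of objects, "
    'each with "title" and "text" fields.'
)

def parse_sections(response_text):
    """Parse a model response containing a JSON list of title/text pairs."""
    sections = json.loads(response_text)
    return [(s["title"], s["text"]) for s in sections]

# An example of the kind of response a model might return:
response = '[{"title": "Overview", "text": "..."}, {"title": "Risks", "text": "..."}]'
for title, text in parse_sections(response):
    print(f"== {title} ==")  # each pair could populate one collapsible widget
```

Each parsed pair can then back one titled, collapsible block in the UI instead of one undifferentiated wall of summary text.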
Cyborg tools
- prompt engineering (pulling from multiple text blocks within the app, templates, customizable/programmable “surfaces” for writing, dynamically loading sections of prompts via arbitrary code execution)
- all the benefits of normal note-taking/management systems + heavy customizability
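The “dynamically loading sections of prompts via arbitrary code execution” bullet can be made concrete with a toy sketch. Every name here (`TEMPLATE`, `load_context`) is invented for illustration; the point is only that user code, not static text, fills the prompt.

```python
# Hypothetical prompt "surface": {placeholders} are filled by arbitrary
# user-supplied code right before the prompt is sent.

TEMPLATE = "Context:\n{context}\n\nTask: {task}"

def load_context():
    # User-supplied code: could read files, query a note database, etc.
    return "three earlier notes pasted here"

prompt = TEMPLATE.format(context=load_context(), task="summarize the notes")
```

Swapping `load_context` for a different function changes what gets pulled into the prompt without touching the template itself.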
Generally reading and writing, non-agentic stuff; many people have already written about this. My point is that there is little overlap with capabilities research.
You can essentially take it from:
type prompt → read response
to:
type text in widgets; your code converts the widgets into a messages object for the API call → the token stream is handled by customizable code → more code runs after the response is done
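The three-stage pipeline above can be sketched as plain Python. All names here are hypothetical (a real tool would call an actual chat API and render tokens in a UI); the sketch only shows where user-written code plugs in.

```python
# Minimal sketch of the customizable pipeline: widgets -> messages ->
# streamed tokens -> post-response hook. All functions are stand-ins
# that a user of the tool could replace with their own code.

def widgets_to_messages(widgets):
    """User code: convert UI text blocks into a messages object."""
    return [{"role": "user", "content": "\n\n".join(widgets)}]

def on_token(token, buffer):
    """User code: handle each streamed token (e.g. render it live)."""
    buffer.append(token)

def on_done(buffer):
    """User code: run after the response completes."""
    return "".join(buffer)

def run_pipeline(widgets, token_stream):
    messages = widgets_to_messages(widgets)  # step 1: build the API payload
    buffer = []
    for token in token_stream:               # step 2: stream handled by user code
        on_token(token, buffer)
    return on_done(buffer)                   # step 3: post-response hook

# Stand-in for a real token stream from an API:
result = run_pipeline(["note A", "note B"], ["Hel", "lo"])
```

Because each stage is just a function, any one of them can be swapped out: a different `widgets_to_messages` changes how prompts are assembled, a different `on_done` changes how responses are parsed.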
Capabilities overhang means discourse lags behind reality
To the extent that the potential abilities of a human-AI team are locked behind good UIs (which I think they are, by a huge amount, but I won’t get into that here), the worse the existing, well-known UIs are, the further the discourse about a given level of model capability lags behind the actual potential of that model.
(in short: discourse lag = model capability × (amplification(best possible UI) − amplification(current UI)))
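As a toy function (the numbers below are made up, purely to show the shape of the claim):

```python
def discourse_lag(model_capability, amp_best_ui, amp_current_ui):
    """Toy version of the formula above: lag scales with raw capability
    and with the amplification gap between the best possible UI and the
    UI people actually use."""
    return model_capability * (amp_best_ui - amp_current_ui)

# Made-up numbers: a capability-10 model, best UI amplifies 3x, current UI 1x.
# discourse_lag(10, 3, 1) -> 20
# If current tooling already matches the best possible, the lag vanishes:
# discourse_lag(10, 3, 3) -> 0
```

The takeaway: the lag grows with model capability, so the same UI gap becomes more costly as models get stronger.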
This is bad for basically everything that involves talking to other people about the dangers of AI.
Which includes: affecting policy, getting funding, telling people their tech is dangerous and they should feel bad, creating institutions and cooperation among people that don’t want to die—note that these activities are a huge portion of many people’s work in AI alignment.
(The only world in which a capability overhang is not a dam waiting to burst is one where I’m the only person who will ever design really good AI tooling, and where, if I just keep quiet, nobody will ever build such things. But I’m not that smart, nor that good a programmer (though a pretty good Tkinter GUI builder).)
This is a fork in the road
If cyborgism reduces x-risk then this tech should be developed as much and as soon as possible, which you do by sharing, open-sourcing and discussing.
If it doesn’t, I don’t have to go through the massive effort of open-sourcing it (testing, documenting, probably a lot of rewriting), which I really don’t feel like doing anyway.
Also I can only ask people for donations (aka “funding”) if I can actually show people what my idea is and what I’ve got so far.
Claude’s feedback challenging my assumptions
I shared an earlier draft of this with Claude (Sonnet 3.5), and it argued against an underlying assumption; I couldn’t find a good counterargument. I’ll quote it directly since it wrote so well:
Me:
capability overhang means people think models are less powerful than they are, which means it’s harder to get people to take x-risk seriously, which means you can corral fewer researchers
Claude:
(...)
RESPONSE: This implies a specific causal chain: Better UI → Demonstrates true capabilities → Researchers update beliefs → More alignment work
But consider:
1. Researchers might update differently than expected (e.g. “oh cool, let’s build more powerful stuff!”)
2. Demonstration of capabilities could accelerate both alignment AND capabilities work
3. The bottleneck might not be belief—many researchers who fully believe in x-risk still work on capabilities
(...)
RESPONSE: ”People don’t want humanity to go extinct” seems obviously true yet doesn’t predict behavior well. Consider:
- Most people believe in climate change but don’t significantly alter behavior
- People smoke despite knowing it kills
- Engineers kept working on nuclear weapons post-Manhattan Project
Humans seem to:
1. Discount large future risks
2. Compartmentalize (“someone else will solve it”)
3. Follow local incentives (career, status, interesting problems) over global outcomes
The end.