Cognitive workspaces, not end-to-end algorithms
It’s summer 2020 and GPT-3 is superhuman in many ways and probably could take over the world, but only when its alien goo flows through the right substrate, is channeled into the right alchemical chambers, then forked and rejoined and reshaped in the right way. And somehow, nobody can build this.
Robustness via pauseability
Normal agent-systems (like what you design with LangChain) are brittle in that you’re never sure if they’ll do the job. A more robust system is one where you can stop execution when it’s going wrong, and edit a previous output, or edit the code that generated it, and retry.
Neat things like the `$do` command of Conjecture’s Tactics don’t solve this. They allow the system in which they’re implemented to speed up any given cognitive task, simply by making it less likely that you have to edit and retry the steps along the way. `$do` turns the task of `prompt + object shape --> object` into a single “step” that runs flawlessly (except right now, since it’s bugged), rather than a bunch of things you have to do manually.
The same goes for how strong a model you use. Increasing model strength should be (given a good surrounding system) a trade of token cost for retry count, tinker time, and number of intermediate steps, as opposed to enabling qualitatively different tasks.
(At least not at every model upgrade step. There are (probably) some things that you can’t do with GPT-2, that you can do with GPT-4, no matter how powerful your UI and workflows are.)
A cool way of seeing it is that you turn the time dimension, which is a single point (0-D) in end-to-end algorithms, into a 1-D line, where each point along the path contains a pocket dimension you can go into (meaning inspect, pause, edit, retry).
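To make that concrete, here is a minimal sketch of a pauseable chain of model calls. It’s Python, and `call_model` is a made-up stand-in for whatever completion API you use, so treat it as illustrative only:

```python
# A minimal sketch of a pauseable chain of model calls. Everything here is
# hypothetical: `call_model` stands in for whatever completion API you use.

def call_model(prompt: str) -> str:
    # Replace with a real API call.
    return "«model output for: " + prompt[:40] + "...»"

def run_steps(prompts: list[str]) -> list[str]:
    """Run a chain of prompts, pausing after each step so you can accept the
    output, edit it by hand, retry the call, or stop the run entirely."""
    outputs: list[str] = []
    for i, prompt in enumerate(prompts):
        context = "\n".join(outputs)          # earlier outputs feed later steps
        result = call_model(context + "\n" + prompt)
        while True:
            print(f"--- step {i} ---\n{result}")
            choice = input("[a]ccept / [e]dit / [r]etry / [q]uit: ").strip().lower()
            if choice == "a":
                outputs.append(result)
                break
            if choice == "e":
                result = input("replacement text: ")
            elif choice == "r":
                result = call_model(context + "\n" + prompt)
            elif choice == "q":
                return outputs                # nothing downstream has run yet
    return outputs
```

The only important property is that control comes back to you after every call, instead of only at the end of the run.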
Practical example—feel free to skip—messily written somewhat obvious point
I’ll start with a simple task and a naive solution, then show the problems that come up and what they imply about the engineering challenges involved in making a “cognitive workspace”.
I personally learned these ideas by doing exactly this, by trying to build apps that help me, because *surely*, a general intelligence that is basically superhuman (I thought this even with GPT-3) can completely transform my life and the world! Right?! No.
I believe that such problems are fundamental to tasks where you’re trying to get AI to help you, and that that’s why an interactive, UI-based approach is the right one, as opposed to an end-to-end-like approach.
Task: you have an article but none of your paragraphs have titles (shoutout Paul Graham) and you want AI to give it titles.
1) A strong model solves it in one shot, returning the full article.
1 -- problem) Okay, fine, that can work. But what if it’s a really long article and you don’t want the AI wasting 95% of its tokens on regurgitating paragraphs, or you’re very particular about your titles and you don’t wanna retry more than 5 times?
2) You build a simple script that splits the article into paragraphs, prompts a GPT-3 instance for a title for each one, recombines everything and outputs your full article, and even allows a variable where you can put paragraph indices that the GPT-3 caller will ignore. That solves paragraph regurgitation; you selectively regenerate titles you don’t like yet, so iteration cost is much lower; and with 2-3 lines of code you can even have custom prompts for certain paragraphs. (There’s a rough sketch of such a script at the end of this list.)
2 -- problem) For one, you actually had to write the script; you’ve already created a scaffold for the alien superintelligence. (Even iteratively sampling from the token distribution and appending it to the old context is a scaffold around an alien mind—but anyway.) And every time you load up a new article you need to manually write out the indices of paragraphs to ignore, plus any other little parameters you’ve added to the script. Maybe you want a whole UI for it that contains the text, where you can toggle which paragraphs are ignored with a mouse or keyboard event, and which takes you to paragraph-specific custom prompts. Or you create a mini scripting language where you can add an `$ignore$` line, or a `$custom prompt| <prompt>$` line, which the parser stores, uses for AI calls, and deletes from the final article when it outputs it with the new titles. And then maybe you want to analyze the effects of certain prompts, or store outputs across retries—etc.
3 and beyond) This just goes on and on as the scope of the task expands. Say you’re not titling paragraphs, but doing something of the shape “I write something, code turns it into something, AI helps with some of the parts, code turns it back into the format it was originally in”—which describes a lot of tasks, including writing code. Now you have a way broader thing that a script may not be enough for, and that even a paragraph-titler UI is not enough for. And why shouldn’t the task scope expand? We want AI to solve all problems and create utopia, so why can’t they? They’re smarter than me and almost everyone I know (most of the time, anyway), so they should be like 1000x more useful. Even if they’re not *that* smart, they’re still 100x faster at reading and writing, and ~1 million times more knowledgeable.
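Here’s roughly what that step-2 script could look like. This is only a sketch of the idea: `call_model`, the prompt template, and the parameter names are all made up for illustration.

```python
# A rough sketch of the step-2 script described above. `call_model` is a
# stand-in for a real completion call; the ignore set and per-paragraph
# custom prompts are the "2-3 extra lines of code" mentioned above.

DEFAULT_PROMPT = "Write a short title for this paragraph:\n\n{paragraph}\n\nTitle:"

def call_model(prompt: str) -> str:
    # Replace with a real API call.
    return "«generated title»"

def title_article(article: str,
                  ignore: set[int] | None = None,
                  custom_prompts: dict[int, str] | None = None) -> str:
    """Add a title above every paragraph, skipping the indices in `ignore`."""
    ignore = ignore or set()
    custom_prompts = custom_prompts or {}
    paragraphs = [p for p in article.split("\n\n") if p.strip()]
    out = []
    for i, para in enumerate(paragraphs):
        if i in ignore:
            out.append(para)                  # leave this one untouched
            continue
        template = custom_prompts.get(i, DEFAULT_PROMPT)
        title = call_model(template.format(paragraph=para)).strip()
        out.append(title + "\n\n" + para)
    return "\n\n".join(out)

# Keep titles you already like by putting their paragraph indices in `ignore`:
# print(title_article(open("article.txt").read(), ignore={0, 3}))
```

The `ignore` set and `custom_prompts` dict are exactly the kind of little parameters that, in practice, you end up wanting to edit through a UI rather than in the source.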
The task scope implied by AIs that are about as smart as humans (roughly true of GPT-3, definitely true of Claude 3.5) is vastly smaller than their actual task scope. I think it’s because we have skill issues, not because we need to build 100-billion-dollar compute clusters powered by nuclear reactors. (Some say it’s because AIs don’t have “real” intelligence, but I don’t wanna get into that—because it’s just dumb.)
Programmable UI solves UI design
The space of cognitive task types is really, really big, and so is the space of possible UIs that you can build to help you do those tasks. You can’t figure out a priori what things should look like. However, if you have something that’s easily reconfigurable (programmatically reconfigurable—because you have way more power as a code-writer than as a clicker-of-UI-elements), you can iterate on its design while using it.
This relates to gradient descent, to decentralized computation, and to the thing that gives free markets their power. It’s roughly that instead of solving something directly, you can create a system that is reconfigurable, which then reaches a solution over time as it gets used and edited bit by bit across many iterations.
A given UI tool that lets AI help you do tasks is a solution which we are unable to find directly. A meta-program that lets you experiment with UIs is the system that lets you find a solution over time.
(In cybernetic terms, the system consists of the user, the program itself, any UIs inside of it that you create, and any AI instances that you consult to help you think—and now LessWrong is involved too because this is helping me think)
This “meta-UI” being easy to program and understand is not a given at all. In my experience almost all the effort goes into making functions findable and easy to understand, making data possible to analyze (since, for anything where `str(data)` is larger than the terminal window (+ some manual scrolling), you can’t simply `print(data)` to see what it looks like), and into making behaviors (like hotkey bindings) and function parameters editable on the fly.
Because idk how I’m gonna use something when I start writing it, and I don’t wanna think about it that hard to begin with. But IF it turns out to be neat, I want to be able to iterate on it, or put it on a different hotkey, or combine it with some existing event. Or if I don’t need it yet, I need to be able to find it later on. And the broader the tool gets and the more things I add to it, the messier it gets, so internal search and inspection tools become more and more important—which I also don’t know a priori how to design.
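For concreteness, the kind of plumbing I’m describing looks roughly like the sketch below. Every name in it is hypothetical, and a real version would live inside whatever event loop the tool uses:

```python
# Schematic sketch of the plumbing described above, with made-up names:
# a registry that makes functions findable, a preview helper so large
# objects don't flood the terminal, and hotkeys rebindable while running.

import shutil
from typing import Callable

REGISTRY: dict[str, Callable] = {}
HOTKEYS: dict[str, str] = {}          # key -> registered function name

def register(name: str):
    """Decorator: add a function to the registry so it can be found later."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

def find(query: str) -> list[str]:
    """Internal search: registered functions whose name or docstring mention the query."""
    return [n for n, f in REGISTRY.items()
            if query in n or query in (f.__doc__ or "")]

def preview(data, max_lines: int = 20) -> str:
    """Show only as much of str(data) as roughly fits on screen."""
    width = shutil.get_terminal_size().columns
    lines = str(data).splitlines()
    shown = [line[:width] for line in lines[:max_lines]]
    if len(lines) > max_lines:
        shown.append(f"... ({len(lines) - max_lines} more lines)")
    return "\n".join(shown)

def bind(key: str, name: str):
    """Rebind a hotkey on the fly; the event loop looks up HOTKEYS at press
    time, so no restart is needed."""
    HOTKEYS[key] = name
```

Nothing here is clever; the point is that search (`find`), inspection (`preview`), and rebinding (`bind`) are operations you can run from inside the tool while it is running.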
The F-86 wasn’t as good at individual actions, but it could transition between them faster than the MiG-15. (Jocko Willink talking about OODA loops, paraphrase)
This is analogous to how end-to-end algorithms, LLM agents, and things optimized for the tech demo are “impressive single actions”, but not as good for long-term tasks.
This shortform: https://www.lesswrong.com/posts/MCBQ5B5TnDn9edAEa/atillayasar-s-shortform?commentId=QCmaJxDHz2fygbLgj was spawned from what was initially a side-note in the above one.