If you’re having trouble coming up with tasks for ‘artificial intelligence too cheap to meter’, it could be because you are having trouble coming up with tasks for intelligence period. Just because something is highly useful doesn’t mean you can immediately make use of it in your current local optimum; you may need to seriously reorganize your life and workflows before any kind of intelligence could be useful.
There is a good post on the front page right now about exactly this: https://www.lesswrong.com/posts/7L8ZwMJkhLXjSa7tD/the-great-data-integration-schlep Most of the examples in it do not actually depend on the details of ‘AI’ vs employee vs contractor vs API vs… - the organization is organized to defeat the improvement. It doesn’t matter whether it’s a data scientist or an AI reading the data if there is some employee whose career depends on that data not being read and who is sabotaging it, or some department defending its fief. (I usually call this concept “automation as colonization wave”: many major technologies of undoubted enormous value, such as steam or the Internet or teleconferencing/remote-working, take a long time to have massive effects because you have everyone stuck in local optima and potentially outright sabotaging any integration of the Big New Thing, and potentially have to create entirely new organizations and painfully liquidate the old ones through decades of bleeding.) There are few valuable “AI-shaped holes” because we’ve organized everything to minimize the damage from lacking AI to fill those holes, as it were: if there were some sort of organization which had naturally large LLM-shaped holes where filling them would massively increase the organization’s output… it would’ve gone extinct long ago and been replaced by ones with human-shaped holes instead, because humans were all you could get. (This is why LLM uses are pretty ridiculous right now as a % of GDP—oh wow, it can do a slightly better job of spellchecking my emails? I can have it write some code for me? Not exactly a new regime of hyperbolic global economic growth.)
So one thing you could try, if you are struggling to spend $1000/month usefully on artificial intelligence, is to instead experiment by committing to spend $1000/month on natural intelligence. That is, look into hiring a remote worker / assistant / secretary, an intern, or something else of that ilk. They are, by definition, a flexible multimodal generally-intelligent human-level neural net capable of tool use and agency, an ‘ANI’ if you will. (And if you mentally ignore that $1000/month because it’s an experiment, you can treat it as ‘natural intelligence too cheap to meter’, just regarding it as a sunk cost.) An outsourced human fills a very similar hole to the one an AI could fill, so it removes the distracting factor of AI and simply asks, ‘are there any large, valuable, genuinely-moving-the-needle outsourced-human-shaped holes in your life?’ There probably are not! Then it’s no surprise if you can’t plug the holes which don’t exist with any AI, present or future.
(If this is still too confusing, you can try treating yourself as a remote worker and roleplay as them by sending yourself emails and trying to pretend you have amnesia as you write a reply and avoid doing anything a remote worker could not do, like edit files on your computer, and charging yourself an appropriate hourly rate, terminating at $1000 cumulative.)
If you find you cannot make good use of your hired naturally intelligent neural net, then that fully explains your difficulty of coming up with compelling use cases for artificially intelligent neural nets too. And if you do, you now have a clean set of things you can meaningfully try to do with AI services.
An analogous example might be the difficulties some people have in ‘being rich’ or ‘becoming a manager/learning to delegate’. If you were poor or are used to doing everything yourself, it can be difficult to spend your new money well or make any good use of your secretary or junior employees; but one would not infer from that that “money is useless” or “staff is useless”. It is simply that you need to figure out how to live your new life, and your old ways were adapted to your old life.
This can be surprisingly hard sometimes: there are many anecdotes of people who are destroyed by their newfound wealth or can’t do anything but hoard it, or who run an organization into the ground because they are unable to delegate. Even the simpler forms are hard. (On the very rare occasion I stay at a luxury hotel/cruise ship or go to a fancy restaurant, where there are lots of staff on hand to cater to your every whim, I struggle to come up with whims worth catering to, because having been raised middle-class and being used to staying in the cheapest hotels where waking up sans bed bugs is a minor victory, I mostly find anything like a ‘servant’ to be extremely alienating and stressful and don’t know how to get anything out of it. I’m sure I could do so if this became an ordinary thing, but it would still take time—I don’t just automatically know how to adjust!)
For what workflows/tasks does this ‘AI delegation paradigm’ actually work though, aside from research/experimentation with AI itself? Janus’s apparent experiments with running an AI Discord, for example, I’m sure cost a lot, but the object-level work there is AI research. If AI agents could be trusted to generate a better signal/noise ratio by delegation than by working alongside the AI (where the bottleneck is the human)... isn’t that the singularity? They’d be self-sustaining.
Thus having ‘can you delegate this to a human’ be a prerequisite test of whether one’s workflow admits of delegation at all, before trying to use AI, doesn’t make sense to me? If we could do that we’d be fooming right now.
Edit: if the point is, implicitly: “yes of course directly delegating things to AI is going to fail, but nonetheless this serves as a useful mental prompt for coming up with ways to actually use AI”, I think this re-routes to what I took as the OP’s question: what actual tasks? Tasks that aren’t things we’re doing already like chat, imagegen, or code completion, where again the bottleneck is the human and so the only way to increase spending there is to increase one’s workload. Perhaps one could say: “well there are ways to leverage even just chat more/better, such that you aren’t increasing your total hours working, but your AI spend is actually increasing”, then I’d ask: what are those ways?
> For what workflows/tasks does this ‘AI delegation paradigm’ actually work though, aside from research/experimentation with AI itself? Like Janus’s apparent experiments with running an AI discord I’m sure cost a lot, but the object level work there is AI research. If AI agents could be trusted to generate a better signal/noise ratio by delegation than by working-alongside the AI (where the bottleneck is the human)....isn’t that the singularity? They’d be self sustaining.
I’m not following your point here. You seem to have a much more elaborate idea of outsourcing than I do. Personally cost-effective outsourcing is actually quite difficult, for all the reasons I discuss under the ‘colonization wave’ rubric. A Go AI is inarguably superhuman; nevertheless, no matter how incredible Go AIs become, I would pay exactly $0 for one (or for hours of Go consultation from Lee Sedol, for that matter), because I don’t care about Go or play it. If society were reorganized to make Go playing a key skill of any refined person, closer to the Heian era than right now, my dollar value would very abruptly change. But right now? $0.
What I’m suggesting is finding things like, “email your current blog post draft to the assistant for copyediting”. Does this wind up saving time on net compared to repeatedly rereading it yourself, possibly using the standard tricks like reading it upside down or printing it out? Then this is something that potentially a LLM can help with. But if it doesn’t save you time on net (because there is too much overhead or you don’t write blog posts in the first place), then it doesn’t matter how much the LLM costs, you don’t want to pay even $0 for it.
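The ‘does it save time on net?’ test can be written down explicitly. A minimal sketch, assuming you can estimate hours spent with and without the delegate plus the overhead of handing work off; the function names and all the numbers are hypothetical placeholders, not anything measured:

```python
def net_hours_saved(hours_alone: float, hours_reviewing: float, overhead_hours: float) -> float:
    """Hours saved by delegating one task, after paying for the handoff
    (writing the request, waiting) and for reviewing the result."""
    return hours_alone - (hours_reviewing + overhead_hours)


def max_monthly_spend(net_hours: float, hourly_value: float, tasks_per_month: int) -> float:
    """The most it is rational to pay per month for the delegate.
    If delegation does not save time on net, the answer is $0,
    however cheap the delegate is."""
    if net_hours <= 0:
        return 0.0
    return net_hours * hourly_value * tasks_per_month


# Hypothetical copyediting case: 1h of rereading alone, vs 15min reviewing
# the assistant's corrections plus 15min of emailing overhead, 4 posts/month.
print(max_monthly_spend(net_hours_saved(1.0, 0.25, 0.25), 50.0, 4))  # 100.0
```

Note the asymmetry: the delegate’s price never enters the first function, which is the point about paying even $0.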
This helps illustrate the distinction between ‘capabilities’ and ‘being useful enough to me to pay $1000/month for right now’. They are not the same thing at all, and the absence of the latter only weakly implies absence of the former.
Let me give a personal concrete example. I am a writer, but I still struggle to get a lot of value out of LLMs. I couldn’t spend $1000/month on LLM calls. I currently can manage ~$50/month, between ChatGPT subscription and embeddings and highly-constrained AI formatting use (eg. converting LaTeX math to HTML/Unicode or breaking up monolithic single-paragraph abstracts into readable paragraphs), but I would struggle to double that. Why is that? Because while the LLMs are very intelligent and knowledgeable, and are often a lot better than I am at many things going beyond just programming, “automation as colonization wave” means they cannot bring that to bear in a useful way for me.
So, the last thing I wrote was a short mini-essay on why cats spontaneously bite you during petting; I argue, in line with my knocking-things-over essay and other parts of my big cat psychology essay, that it is a misdirected prey drive where you accidentally trigger it by resembling small prey animals. I had to write it all myself, and I asked several LLMs for feedback, and made a few tweaks, but they added relatively little—let’s say <5% of the value of the finished mini-essay. I’d value the mini-essay itself at maybe $100ish; I think it is likely true and cat readers will find the discussion mildly interesting and in the long run it adds value to my site to be there, but a proof of the Riemann conjecture it is not. So, the LLM advice was at best worth a few bucks.
Why so helpless? The writing is not anything special, and the specific points made appear to all be familiar to the LLMs from the vast Internet corpus. But to deliver a lot of value, the LLMs would either have to come up with the novel connection between prey drive & spontaneous biting on their own and tell someone like me, or be able to write it given a minimal prompt from me like ‘Maybe cats bite during petting bc prey drive? write pls’.* And they would have to do so while writing like me, with appropriate Wikipedia links, specific incidents like Siegfried & Roy rather than vague bloviating, Markdown output, and inserting into the appropriate place in Gwern.net. Obviously, ye olde ChatGPT web interface does not do any of that. I have to. So, by the time I have done all that, there’s not much left for the LLM to do.
Is it impossible in principle for a LLM to do that, or would they have to be immanentizing the eschaton already before they could be of genuine value to me? No, of course not. Actually, I think that even Claude-3.5 or o1-preview would probably be capable of writing that mini-essay… with appropriate reorganization of the entire workflow. Pasting in short prompts into a chatbot browser tab doesn’t cut the mustard, but it’s not hard to see what would. For example, “Siegfried & Roy” doesn’t come out of nowhere; it is in my clippings already, and a model trained on my clippings or at least with retrieval to them would easily pull it out as an example of ‘spontaneous cat biting’ and incorporate it. Writing stylistically like me is not hard: the base models already do a good job, and they would get even better if finetuned on my site and IRC logs and whatnot to refresh their memory. Finding the place in my essays where I discuss spontaneous cat biting, as with my grandmother, is also no challenge for a LLM with retrieval or long context windows. Inserting a Markdown-formatted footnote is downright trivial. Reading my cat-related writings and prioritizing the prey drive as an explanation of otherwise-mysterious domestic cat behaviors is maybe too much ‘insight’ to expect, but given a single sentence explicitly saying it, they definitely get the idea and can elaborate on it: writing, in my style, a few paragraphs that develop the idea with the relevant references I would think of, and inserting them, appropriately formatted, into the appropriate place in the Gwern.net corpus.
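To make ‘with retrieval to them would easily pull it out’ concrete, here is a toy sketch. It assumes clippings are plain-text snippets and uses crude word overlap as a stand-in for the embedding search a real setup would more likely use; the clipping texts and the `retrieve` helper are hypothetical illustrations, not anyone’s actual pipeline:

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, clippings: dict[str, str], k: int = 2) -> list[str]:
    """Return the titles of the k clippings sharing the most words with the query."""
    q = tokenize(query)

    def overlap(title: str) -> int:
        c = tokenize(clippings[title])
        return sum(min(q[w], c[w]) for w in q)

    return sorted(clippings, key=overlap, reverse=True)[:k]


# Hypothetical clippings, invented for illustration.
clippings = {
    "Siegfried & Roy": "A tiger's spontaneous bite on stage: Roy Horn was attacked by the cat without warning.",
    "Laser pointers": "Cats chase laser dots because the motion mimics small prey.",
    "Sourdough": "Notes on feeding a sourdough starter with rye flour.",
}
print(retrieve("spontaneous cat bite prey drive", clippings))  # ['Siegfried & Roy', 'Laser pointers']
```

Word overlap misses obvious matches (‘cats’ vs ‘cat’), which is exactly why embeddings are the usual choice; the sketch only shows where retrieval slots into the workflow.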
I would totally pay $100 if I could type in a single sentence like ‘spontaneous biting is prey drive!’ and 10 seconds later, up pops a diff with the current mini-essay for me to read and then edit or approve; and since I have easily 10 such insights a month, then I could easily spend $1000/month.
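The ‘up pops a diff’ step is the easy part of such a workflow. A minimal sketch using Python’s standard `difflib`, assuming a hypothetical Markdown essay and an already-generated sentence (both invented here); the model call that would produce the sentence and choose the anchor is deliberately left out:

```python
import difflib


def propose_insertion(original: str, anchor: str, addition: str, path: str = "essay.md") -> str:
    """Insert `addition` (one line, ending in a newline) after the first line
    containing `anchor`, and return a unified diff for a human to read,
    then edit or approve."""
    lines = original.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if anchor in line:
            new_lines = lines[: i + 1] + [addition] + lines[i + 1 :]
            break
    else:
        raise ValueError(f"anchor not found: {anchor!r}")
    return "".join(
        difflib.unified_diff(lines, new_lines, fromfile=f"a/{path}", tofile=f"b/{path}")
    )


# Hypothetical essay text and generated sentence, for illustration only.
essay = "## Biting\n\nCats sometimes bite mid-petting without warning.\n\n## Other\n"
print(propose_insertion(essay, "bite mid-petting", "Hypothesis: this is misdirected prey drive.\n"))
```

Keeping the human as the approver of a diff, rather than the author of the text, is what would shift the cost from hours of writing to seconds of review.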
But you can see why none of that is happening, and why you need something like my Nenex proposal before that would be feasible. The SaaS providers refuse to provide non-instruction-tuned models, and their instruction-tuned models write ChatGPTese barf I refuse to incorporate into my writing. They won’t finetune on a very large corpus, so their models don’t know all of the specific factoids I would incorporate. They won’t send their model to me, so I can’t run it locally; and I’m not sending my entire computer to them either. And they would need to train tool-use for editing a corpus of Markdown files with some of my unique extensions (like Wikipedia shortcuts).
Or look at it the other way: given all these hard constraints (and the workarounds themselves being major projects—running Llama-3-405b at home is not for the faint of heart), what would it take to make the LLM use highly valuable for this cat-biting mini-essay rather than a rounding error? Well, it would have to be superhumanly capable—it would have to be somehow so eloquent that I would prefer its writing unedited out of the box, it would have to be somehow so knowledgeable about cat psychology research that its version is better researched than mine and searching myself would be a waste of time, it would have to be so insightful about the details of cat behavior which support or contradict this thesis that I would read it and bolt upright in my chair blinking, as I exclaim “wow, that’s… actually a really good point. I never thought of that. How extremely stupid of me not to!” and resolve to always ask the LLM first in the future, etc. And obviously, we’re not at that point yet, and if we were, then things would start to look rather different (as one of the first things people would start assigning such a LLM would be the task of reorganizing workflows to unlock its true potential)...
So, writing Gwern.net mini-essays, like the one I spent an hour or two writing, is an ‘automation as colonization wave’ example. It is something that LLMs probably have the capability of doing now, which is of economic value (at least to me), and yet, is not happening now, for reasons unrelated to raw LLM capabilities and instead related to arranging the world around them to unlock those capabilities.
And you will find that if you want to use LLMs a lot, there will be many things they could clearly do, but you aren’t going to do right now because it requires reorganizing too much around them.
* I know what you’re wondering. Claude-3.5, GPT-4o, and GPT-4 o1-preview produce outputs here which are largely useless and would cost more time to edit into something usable than they’d save.
I largely don’t think we’re disagreeing? My point didn’t depend on a distinction between ‘raw’ capabilities vs ‘possible right now with enough arranging’ capabilities, and was mostly: “I don’t see what you could actually delegate right now, as opposed to operating in the normal paradigm of AI co-work the OP is already saying they do (chat, copilot, imagegen)”, and then your personal example is detailing why you couldn’t currently delegate a task. Sounds like agreement.
Also I didn’t really consider your example of:
> “email your current blog post draft to the assistant for copyediting”.
to be outside the paradigm of AI co-work the OP is already doing, even if it saves them time. Scaling up this kind of work to the point of $1k would seem pretty difficult and also outside what I took to be their question, since this amounts to “just work a lot more yourself, and thus the proportion of work you currently use AI for will go up till you hit $1k”. That’s a lot of API credits for such normal personal use.
…
But back to your example, I do question just how much of a leap of insight/connection would be necessary to write the standard Gwern mini-article. Maybe in this exact case you know there is enough latent insight/connection in your clippings/writings, and the LLM corpus, and possibly some rudimentary Wikipedia/tool use, such that your prompt providing the cherry-on-top connecting idea (‘spontaneous biting is prey drive!’) could actually produce a Gwern-approved mini-essay. You’d know the level of insight-leap for such articles better than I, but do you really think there’d be many such things within reach for very long? I’d argue an agent that could do this semi-indefinitely, rather than just clearing your backlog of maybe 20 such ideas, would be much more capable than we currently see, in terms of necessary ‘raw’ capability. But maybe I’m wrong and you regularly have ideas that sufficiently fit this pattern, where the bar to pass isn’t “be even close to as capable as Gwern”, but: “there’s enough lying around to make the final connection, just write it up in the style of Gwern”.
Like, clearly something that could actually write any Gwern article would have at least your level of capability, and would foom or something similar; it’d be self-sustaining. Instead what you’re describing is a setup where most of the insight, knowledge, and connection is already there, and is an instance of what I’d argue is a narrow band of possible tasks that could be delegated without necessitating {capability powerful enough to self-sustain and maybe foom}. I don’t think this band is very wide; there aren’t many tasks I can think of that fit this description. But I failed to think of your class of example, or eggsyntax’s example below of call center automation, so perhaps I’m simply blanking on others, and the band is wider than I thought.
But if not, then your original suggestion of, basically: “first think of what you could delegate to another human” seems a fraught starting point because the supermajority of such tasks would require capability sufficient for self sustainable ~foomy agents, but we don’t yet observe any such; our world would look very different.
I enjoyed reading this; highlights were the part on reorganization of the entire workflow, as well as the linked mini-essay on cats biting due to prey drive.
> If AI agents could be trusted to generate a better signal/noise ratio by delegation than by working-alongside the AI (where the bottleneck is the human)
They can’t typically (currently) do better on their own than working alongside a human, but a) a human can delegate a lot more tasks than they can collaborate on (and can delegate more cheaply to an AI than to another human), and b) though they’re not as good on their own they’re sometimes good enough.
Consider call centers as a central case here. Companies are finding it a profitable tradeoff to replace human call-center workers with AI even if the AI makes more mistakes, as long as it doesn’t make too many mistakes.
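The call-center tradeoff is an expected-cost comparison. A sketch with made-up per-call figures (none of these numbers come from the thread or from any real deployment): the AI wins whenever its handling cost plus its expected mistake cost is below the human’s, even with a higher error rate.

```python
def expected_cost_per_call(handling_cost: float, error_rate: float, cost_per_error: float) -> float:
    """Expected cost of one call: handling cost plus the expected damage from mistakes."""
    return handling_cost + error_rate * cost_per_error


def ai_is_profitable(human: tuple[float, float], ai: tuple[float, float], cost_per_error: float) -> bool:
    """`human` and `ai` are (handling_cost, error_rate) pairs. The AI may make more
    mistakes and still come out ahead, as long as its extra errors cost less than it saves."""
    return expected_cost_per_call(*ai, cost_per_error) < expected_cost_per_call(*human, cost_per_error)


# Hypothetical: a $5.00 human call with a 2% error rate vs a $0.50 AI call with 8%.
print(ai_is_profitable((5.00, 0.02), (0.50, 0.08), cost_per_error=40.0))   # True
print(ai_is_profitable((5.00, 0.02), (0.50, 0.08), cost_per_error=100.0))  # False
```

With these invented figures the break-even is a mistake costing $75, i.e. (5.00 - 0.50) / (0.08 - 0.02); above that, the human is cheaper. ‘Doesn’t make too many mistakes’ really means ‘mistakes aren’t too expensive relative to the savings’.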
You can post on a subreddit and get replies from real people interested in that topic, for free, in less than a day.
Is that valuable? Sometimes it is, but...not usually. How much is the median comment on reddit or facebook or youtube worth? Nothing?
In the current economy, the “average-human-level intelligence” part of employees is only valuable when you’re talking about specialists in the issue at hand, even when that issue is being a general personal assistant for an executive rather than a technical engineering problem.
If you’re having trouble coming up with tasks for ‘artificial intelligence too cheap to meter’, it could be because you are having trouble coming up with tasks for intelligence period. Just because something is highly useful doesn’t mean you can immediately make use of it in your current local optimum; you may need to seriously reorganize your life and workflows before any kind of intelligence could be useful.
There is a good post on the front page right now about exactly this: https://www.lesswrong.com/posts/7L8ZwMJkhLXjSa7tD/the-great-data-integration-schlep Most of the examples in it do not actually depend on the details of ‘AI’ vs employee vs contractor vs API vs… - the organization is organized to defeat the improvement. It doesn’t matter whether it’s a data scientist or an AI reading the data if there is some employee whose career depends on that data not being read and who is sabotaging it, or some department defending its fief. (I usually call this concept “automation as colonization wave”: many major technologies of undoubted enormous value, such as steam or the Internet or teleconferencing/remote-working, take a long time to have massive effects because you have everyone stuck in local optima and potentially outright sabotaging any integration of the Big New Thing, and potentially have to create entirely new organizations and painfully liquidate the old ones through decades of bleeding.) There are few valuable “AI-shaped holes” because we’ve organized everything to minimize the damage from lacking AI to fill those holes, as it were: if there were some sort of organization which had naturally large LLM-shaped holes where filling them would massively increase the organization’s output… it would’ve gone extinct long ago and been replaced by ones with human-shaped holes instead, because humans were all you could get. (This is why LLM uses are pretty ridiculous right now as a % of GDP—oh wow, it can do a slightly better job of spellchecking my emails? I can have it write some code for me? Not exactly a new regime of hyperbolic global economic growth.)
So one thing you could try, if you are struggling to spend $1000/month usefully on artificial intelligence, is to instead experiment by committing to spend $1000/month on natural intelligence. That is, look into hiring a remote worker / assistant / secretary, an intern, or something else of that ilk. They are, by definition, a flexible multimodal generally-intelligent human-level neural net capable of tool use and agency, an ‘ANI’ if you will. (And if you mentally ignore that $1000/month because it’s an experiment, you can treat it as ‘natural intelligence too cheap to meter’, just regarding it a sunk cost.) An outsourced human fills a very similar hole as an AI could, so it removes the distracting factor of AI and simply asks, ‘are there any large, valuable, genuinely-moving-the-needle outsourced-human-shaped holes in your life?’ There probably are not! Then it’s no surprise if you can’t plug the holes which don’t exist with any AI, present or future.
(If this is still too confusing, you can try treating yourself as a remote worker and roleplay as them by sending yourself emails and trying to pretend you have amnesia as you write a reply and avoid doing anything a remote work could not do, like edit files on your computer, and charging yourself an appropriate hourly rate, terminating at $1000 cumulative.)
If you find you cannot make good use of your hired natural intelligent neural net, then that fully explains your difficulty of coming up with compelling usecases for artificially intelligent neural nets too. And if you do, you now have a clean set of things you can meaningfully try to do with AI services.
An analogous example might be the difficulties some people have in ‘being rich’ or ‘becoming a manager/learning to delegate’. If you were poor or are used to doing everything yourself, it can be difficult to spend your new money well or make any good use of your secretary or junior employees; but one would not infer from that that “money is useless” or “staff is useless”. It is simply that you need to figure out how to live your new life, and your old ways were adapted to your old life.
This can be surprisingly hard sometimes: there are many anecdotes of people who are destroyed by their newfound wealth or can’t do anything but hoard it, or who run an organization into the ground because they are unable to delegate. Even the simpler forms are hard. (On the very rare occasion I stay at a luxury hotel/cruise ship or go to a fancy restaurant, where there is a lot of staff who are there to cater to your every whim, I struggle to come up with whims worth catering to, because having been raised middle-class and being used to staying in the cheapest hotels where waking up sans bed bugs is a minor victory, I mostly find anything like a ‘servant’ to be extremely alienating and stressful and don’t know how to get anything out of it. I’m sure I could do so if this became an ordinary thing, but it would still take time—I don’t just automatically know how to adjust!)
For what workflows/tasks does this ‘AI delegation paradigm’ actually work though, aside from research/experimentation with AI itself? Like Janus’s apparent experiments with running an AI discord I’m sure cost a lot, but the object level work there is AI research. If AI agents could be trusted to generate a better signal/noise ratio by delegation than by working-alongside the AI (where the bottleneck is the human)....isn’t that the singularity? They’d be self sustaining.
Thus having ‘can you delegate this to a human’ be a prerequisite test of whether one’s workflow admits of delegation at all, before trying to use AI, doesn’t make sense to me? If we could do that we’d be fooming right now.
Edit: if the point is, implicitly: “yes of course directly delegating things to AI is going to fail, but nonetheless this serves as a useful mental prompt for coming up with ways to actually use AI”, I think this re-routes to what I took as the OPs question: what actual tasks? Tasks that aren’t things we’re doing already like chat, imagegen, or code completion, where again the bottleneck is the human and so the only way to increase spending there is to increase one’s workload. Perhaps one could say: “well there are ways to leverage even just chat more/better, such that you aren’t increasing your total hours working, but your AI spend is actually increasing”, then I’d ask: what are those ways?
I’m not following your point here. You seem to have a much more elaborate idea of outsourcing than I do. Personally cost-effective outsourcing is actually quite difficult, for all the reasons I discuss under the ‘colonization wave’ rubric. A Go AI is inarguably superhuman; nevertheless, no matter how incredible Go AI become, I would pay exactly $0 for it (or hours of Go consultation from Lee Sedol for that matter), because I don’t care about Go or play it. If society were reorganized to make Go playing a key skill of any refined person, closer to the Heian era than right now, my dollar value would very abruptly changed. But right now? $0.
What I’m suggesting is finding things like, “email your current blog post draft to the assistant for copyediting”. Does this wind up saving time on net compared to repeatedly rereading it yourself, possibly using the standard tricks like reading it upside down or printing it out? Then this is something that potentially a LLM can help with. But if it doesn’t save you time on net (because there is too much overhead or you don’t write blog posts in the first place), then it doesn’t matter how much the LLM costs, you don’t want to pay even $0 for it.
This helps illustrate the distinction between ‘capabilities’ and ‘being useful enough to me to pay $1000/month for right now’. They are not the same thing at all, and the absence of the latter only weakly implies absence of the former.
Let me give a personal concrete example. I am a writer, but I still struggle to get a lot of value out of LLMs. I couldn’t spend $1000/month on LLM calls. I currently can manage ~$50/month, between ChatGPT subscription and embeddings and highly-constrained AI formatting use (eg. converting LaTeX math to HTML/Unicode or breaking up monolithic single-paragraph abstracts into readable paragraphs), but I would struggle to double that. Why is that? Because while the LLMs are very intelligent and knowledgeable, and are often a lot better than I am at many things going beyond just programming, “automation as colonization wave” means they cannot bring that to bear in a useful way for me.
So, the last thing I wrote was a short mini-essay on why cats spontaneously bite you during petting; I argue that, in line with my knocking-things-over essay and other parts of my big cat psychology essay, that it is a misdirected prey drive where you accidentally trigger it by resembling small prey animals. I had to write it all myself, and I asked several LLMs for feedback, and made a few tweaks, but they added relatively little—let’s say <5% of the value of the finished mini-essay. I’d value the mini-essay itself at maybe $100ish; I think it is likely true and cat readers will find the discussion mildly interesting and in the long run it adds value to my site to be there, but a proof of the Riemann conjecture it is not. So, the LLM advice was at best worth a few bucks.
Why so helpless? The writing is not anything special, and the specific points made appear to all be familiar to the LLMs from the vast Internet corpus. But to deliver a lot of value, the LLMs would either have to come up with the novel connection between prey drive & spontaneous biting on their own and tell someone like me or to be able to write it given a minimal prompt from me like ‘Maybe cats bite during petting bc prey drive? write pls’.* And they would have to do so while writing like me, with appropriate Wikipedia links, specific incidents like Siegfried & Roy rather than vague bloviating, Markdown output, and inserting into the appropriate place in Gwern.net. Obviously, ye olde ChatGPT web interface does not do any of that. I have to. So, by the time I have done all that, there’s not much left for the LLM to do.
Is it impossible in principle for a LLM to do that, or would they have to be immanetizing the eschaton already before they could be of genuine value to me? No, of course not. Actually, I think that even Claude-3.5 or o1-preview would probably be capable of writing that mini-essay… with appropriate reorganization of the entire workflow. Pasting in short prompts into a chatbot browser tab doesn’t cut the mustard, but it’s not hard to see what would. For example, “Siegfried & Roy” doesn’t come out of nowhere; it is in my clippings already, and a model trained on my clippings or at least with retrieval to them would easily pull it out as an example of ‘spontaneous cat biting’ and incorporate it. Writing stylistically like me is not hard: the base models already do a good job, and they would get even better if finetuned on my site and IRC logs and whatnot to refresh their memory. Finding the place in my essays where I discuss spontaneous cat biting, like of my grandmother, is also no challenge for a LLM with retrieval or long context windows. Inserting a Markdown-formatted footnote is downright trivial. Reading my cat-related writings and prioritizing the prey drive as an explanation of otherwise-mysterious domestic cat behaviors is maybe too much ‘insight’ to expect, but given a single sentence explicitly saying it, they definitely get the idea and can elaborate on it, by writing like me a few paragraphs elaborating the idea with the relevant references I would think of and inserting it appropriately formatted into the appropriate place in the Gwern.net corpus.
I would totally pay $100 if I could type in a single sentence like ‘spontaneous biting is prey drive!’ and 10 seconds later, up pops a diff with the current mini-essay for me to read and then edit or approve; and since I have easily 10 such insights a month, then I could easily spend $1000/month.
But you can see why none of that is happening, and why you need something like my Nenex proposal before that was feasible. The SaaS providers refuse to provide non-instruction-tuned models, which write ChatGPTese barf I refuse to incorporate into my writing. They won’t finetune on a very large corpus, so it doesn’t know all of the specific factoids I would incorporate. They won’t send their model to me, so I can’t run it locally; and I’m not sending my entire computer to them either. And they would need to train tool-use for editing a corpus of Markdown files with some of my unique extensions (like Wikipedia shortcuts).
Or look at it the other way: given all these hard constraints (and the workarounds themselves being major projects; running Llama-3-405b at home is not for the faint of heart), what would it take to make LLM use highly valuable for this cat-biting mini-essay rather than a rounding error? Well, it would have to be superhumanly capable. It would have to be somehow so eloquent that I would prefer its writing unedited out of the box; it would have to be somehow so knowledgeable about cat psychology research that its version is better research than mine, and my own searching would be a waste of time; it would have to be so insightful about the details of cat behavior which support or contradict this thesis that I would read it and bolt upright in my chair blinking, as I exclaim “wow, that’s… actually a really good point. I never thought of that. How extremely stupid of me not to!” and resolve to always ask the LLM first in the future, etc. And obviously, we’re not at that point yet, and if we were, then things would start to look rather different (as one of the first tasks people would assign such a LLM would be reorganizing workflows to unlock its true potential)...
So, writing Gwern.net mini-essays, like the one I spent an hour or two writing, is an ‘automation as colonization wave’ example. It is something that LLMs probably have the capability of doing now, which is of economic value (at least to me), and yet is not happening now, for reasons unrelated to raw LLM capabilities and related instead to arranging the world around them to unlock those capabilities.
And you will find that if you want to use LLMs a lot, there will be many things they could clearly do but that you aren’t going to have them do right now, because it requires reorganizing too much around them.
* I know what you’re wondering. Claude-3.5, GPT-4o, and o1-preview produce outputs here which are largely useless and would take more time to edit into something usable than they would save.
I largely don’t think we’re disagreeing? My point didn’t depend on a distinction between ‘raw’ capabilities vs ‘possible right now with enough arranging’ capabilities, and was mostly: “I don’t see what you could actually delegate right now, as opposed to operating in the normal paradigm of AI co-work the OP is already saying they do (chat, copilot, imagegen)”, and your personal example details why you couldn’t currently delegate a task. Sounds like agreement.
Also I didn’t really consider your example of:
> “email your current blog post draft to the assistant for copyediting”.
to be outside the paradigm of AI co-work the OP is already doing, even if it saves them time. Scaling up this kind of work to the point of $1k would be pretty difficult, and also outside what I took to be their question, since it amounts to “just work a lot more yourself, and thus the proportion of work you currently use AI for will go up till you hit $1k”. That’s a lot of API credits for such normal personal use.
…
But back to your example, I do question just how much of a leap of insight/connection would be necessary to write the standard Gwern mini-article. Maybe in this exact case you know there is enough latent insight/connection in your clippings/writings and the LLM corpus, plus possibly some rudimentary Wikipedia/tool use, that your prompt providing the cherry-on-top connecting idea (‘spontaneous biting is prey drive!’) could actually produce a Gwern-approved mini-essay. You’d know the level of insight-leap for such articles better than I, but do you really think there’d be many such things within reach for very long? I’d argue an agent that could do this semi-indefinitely, rather than just clearing your backlog of maybe 20 such ideas, would be much more capable than what we currently see, in terms of necessary ‘raw’ capability. But maybe I’m wrong and you regularly have ideas that sufficiently fit this pattern, where the bar to pass isn’t “be even close to as capable as Gwern”, but: “there’s enough lying around to make the final connection, just write it up in the style of Gwern”.
Like, clearly something that could actually write any Gwern article would have at least your level of capability, and would foom or something similar; it’d be self-sustaining. Instead, what you’re describing is a setup where most of the insight, knowledge, and connection is already there, and is an instance of what I’d argue is a narrow band of possible tasks that could be delegated without necessitating {capability powerful enough to self-sustain and maybe foom}. I don’t think this band is very wide; there aren’t many tasks I can think of that fit this description. But I failed to think of your class of example, or eggsyntax’s example below of call-center automation, so perhaps I’m simply blanking on others, and the band is wider than I thought.
But if not, then your original suggestion of, basically, “first think of what you could delegate to another human” seems a fraught starting point, because the supermajority of such tasks would require capability sufficient for self-sustaining, ~foomy agents, and we don’t yet observe any such agents; if we did, our world would look very different.
I enjoyed reading this; highlights were the part on reorganizing the entire workflow, as well as the linked mini-essay on cats biting due to prey drive.
They can’t typically (currently) do better on their own than working alongside a human, but a) a human can delegate a lot more tasks than they can collaborate on (and can delegate more cheaply to an AI than to another human), and b) though they’re not as good on their own they’re sometimes good enough.
Consider call centers as a central case here. Companies are finding it a profitable tradeoff to replace human call-center workers with AI even if the AI makes more mistakes, as long as it doesn’t make too many mistakes.
You can post on a subreddit and get replies from real people interested in that topic, for free, in less than a day.
Is that valuable? Sometimes it is, but... not usually. How much is the median comment on reddit or facebook or youtube worth? Nothing?
In the current economy, the “average-human-level intelligence” part of employees is only valuable when you’re talking about specialists in the issue at hand, even when that issue is being a general personal assistant for an executive rather than a technical engineering problem.