LLM Applications I Want To See
I’m convinced that people who are interested in large language models (LLMs) are overwhelmingly focused on general-purpose “performance” at the expense of exploring useful (or fun) applications.
As I’m working on a personal project, I’ve been learning my way around HuggingFace, which is a hosting platform, set of libraries, and almost-social-network for the open-source AI community. It’s fascinating, and worth exploring even if you’re not going to be developing foundation models from scratch yourself; if you simply want to use the latest models, build apps around them, or adapt them slightly to your own purposes, HuggingFace seems like the clear place to go.
You can look at trending models, and trending public “spaces”, aka cloud-hosted instances of models that users can test out, and get a sense of where the “energy” is. And what I see is that almost all the “energy” in LLMs is on general-purpose models, competing on general-purpose question-answering benchmarks, sometimes specialized to particular languages, or to math or coding.
“How can I get something that behaves basically like ChatGPT or Claude or Gemini, but gets fewer things wrong, and ideally requires less computing power and gets the answer faster?” is an important question, but it’s far from the only interesting one!
If I really search I can find “interesting” specialized applications like “predicts a writer’s OCEAN personality scores based on a text sample” or “uses abliteration to produce a wholly uncensored chatbot that will indeed tell you how to make a pipe bomb” but mostly…it’s general-purpose models. Not applications for specific uses that I might actually try.
And some applications seem to be eager to go to the most creepy and inhumane use cases. No, I don’t want little kids talking to a chatbot toy, especially. No, I don’t want a necklace or pair of glasses with a chatbot I can talk to. (In public? Imagine the noise pollution!) No, I certainly don’t want a bot writing emails for me!
Even the stuff I found potentially cool (an AI diary that analyzes your writing and gives you personalized advice) ended up being, in practice, so preachy that I canceled my subscription.
In the short term, of course, the most economically valuable thing to do with LLMs is duplicating human labor, so it makes sense that the priority application is autogenerated code.
But the most creative and interesting potential applications go beyond “doing things humans can already do, but cheaper” to do things that humans can’t do at all on comparable scale.
A Personalized Information Environment
To some extent, social media, search, and recommendation engines were supposed to enable us to get the “content” we want.
And mostly, to the extent that’s turned out to be a disappointment, people complain that getting exactly what you want is counterproductive — filter bubbles, superstimuli, etc.
But I find that we actually have incredibly crude tools for getting what we want.
We can follow or unfollow, block or mute people; we can upvote and downvote pieces of content and hope “the algorithm” feeds us similar results; we can mute particular words or tags.
But what we can’t do, yet, is define a “quality” we’re looking for, or a “genre” or a “vibe”, and filter by that criterion.
The old tagging systems (on Tumblr or AO3 or Delicious, or back when hashtags were used unironically on Twitter) were the closest approximation to customizable selectivity, and they’re still pretty crude.
We can do a lot better now.
Personalized Content Filter
This is a browser extension.
You teach the LLM, by highlighting and saving examples, what you consider to be “unwanted” content that you’d prefer not to see.
The model learns a classifier to sort all text in your browser into “wanted” vs “unwanted”, and shows you only the “wanted” text, leaving everything else blank.
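Here is a minimal sketch of that loop, in Python for brevity rather than as actual extension code. The `PersonalFilter` class, its `threshold` parameter, and the word-overlap scoring are all assumptions made up for illustration; a real version would presumably replace `_similarity` with a call to an LLM or an embedding model, since that is the part that would make “gestalt” matching possible.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a model's notion of meaning."""
    return set(re.findall(r"[a-z']+", text.lower()))


def _similarity(a: str, b: str) -> float:
    """Jaccard overlap between word sets (placeholder for a model score)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class PersonalFilter:
    """Hypothetical filter built only from examples the user has saved."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold  # assumed cutoff, adjustable "to taste"
        self.unwanted_examples: list[str] = []

    def save_example(self, highlighted_text: str) -> None:
        """Called when the user highlights text they'd prefer not to see."""
        self.unwanted_examples.append(highlighted_text)

    def is_unwanted(self, text: str) -> bool:
        return any(
            _similarity(text, example) >= self.threshold
            for example in self.unwanted_examples
        )

    def render(self, paragraphs: list[str]) -> list[str]:
        """Keep wanted paragraphs; leave unwanted ones blank."""
        return ["" if self.is_unwanted(p) else p for p in paragraphs]


page_filter = PersonalFilter()
page_filter.save_example("You won't believe this one weird trick doctors hate")
page = [
    "One weird trick doctors hate: you won't believe it",
    "The city council meets on Tuesday to vote on the budget.",
]
filtered = page_filter.render(page)
```

Keeping each filter as its own object, with its own saved examples, is also what would make filters easy to toggle or share.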
Muting or blocking particular people is a blunt tool, since they may produce a mix of objectionable and unobjectionable content. Muting particular words or phrases is vulnerable to context confusions[1]. And trusting some other moderator means letting them decide for you. Instead, you can teach your own personal machine a gestalt of the sort of thing you’d prefer not to see, and adjust it to taste.
You would, of course, be able to make multiple filters and toggle between them, if you want to “see the world” differently at different times.
You’d be able to share your filters, and probably some would become popular and widely used, the way Lists on Twitter/X and a few simple browser extensions like Shinigami Eyes are now.
Color-Coded Text
This is also a browser extension.
Beyond hiding unwanted text, you could do a more general kind of text classification: labeling text according to user-defined categories that the model is trained to recognize.
For instance:
right-wing text in red, left-wing text in blue.
color-coded highlighting for (predicted) humor, satire, outrage bait, commercial/promotional content
color-coded highlighting for (predicted) emotion: sad, angry, disgusted, fearful, happy, etc.
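The display side of this is simple once a label has been predicted. A small sketch, assuming the label arrives from some classifier that is not shown here; the `LABEL_COLORS` table, the `highlight` function, and the choice of a background-colored `<span>` are all hypothetical.

```python
import html
from typing import Optional

# Hypothetical user-defined mapping from predicted label to highlight color.
LABEL_COLORS = {
    "right-wing": "red",
    "left-wing": "blue",
    "outrage bait": "orange",
}


def highlight(text: str, label: Optional[str]) -> str:
    """Wrap text in a colored span if its predicted label has a color."""
    safe = html.escape(text)  # never inject page text as raw HTML
    color = LABEL_COLORS.get(label)
    if color is None:
        return safe  # unlabeled or unmapped text is left alone
    return (
        f'<span style="background-color: {color}" '
        f'title="{html.escape(label)}">{safe}</span>'
    )
```

For example, `highlight("hello", None)` returns the text unchanged, while a label found in the table wraps it in a span whose `title` shows the predicted label on hover.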
I expect it’s more difficult, but it might also be possible for the LLM to infer characteristics pertaining to the quality/validity of discussion:
non sequiturs
invalid inferences
failures of reading comprehension
There is information about what kind of text we are reading, which certainly we can detect on our own in general, but which can sneak up on us unnoticed. A “cognitive prosthetic” can potentially be helpful for keeping perspective or making prioritization easier. “Oh hey, I’ve been reading angry stuff all day, no wonder I’m getting angry myself.” Or “let me read the stuff highlighted as high-quality first.”
Fact Extraction
This could be an app.
You give it a set of resources (blog, forum, social media feed, etc) that you don’t want to actually read, and assign it to give a digest of facts (news-style, who/what/when/where concrete details) that come up in those sources.
For instance, early online discussion of COVID-19, back in January 2020, was often on sites like 4chan where racially offensive language is common. If you wanted to learn that there was a new deadly epidemic in China, you’d have to expose yourself to a lot of content most people would rather not see.
But it should be well within the capacity of modern LLMs to filter out jokes, rhetoric, and opinion commentary, and just pick out “newsworthy” claims of fact and present them relatively neutrally.
I don’t love LLM applications for “text summarization” in general, because I’m usually going to worry that something important about the original document was missed in the auto-summary. Lots of these summarization tools seem geared for people who don’t actually like to read — otherwise, why not just read the original? But summarization could become useful if it’s more like trawling for notable “signal” in very noisy (or aversive) text.
Plain Language
This is a browser extension that would translate everything into plain language, or language at a lower reading level. The equivalent of Simple English Wikipedia, but autogenerated and for everything.
I don’t find that current commercial LLMs are actually very good at this! I’m not sure how much additional engineering work would be necessary to make this work.
But it might literally save lives.
People with limited literacy or cognitive disabilities can find themselves in terrible situations when they can’t understand documents. Simplifying bureaucratic or official language so more people can understand it would be a massive public service.
Dispute Resolution and Mediation
For better or for worse, people end up using LLMs as oracles.
If you’re counting on the LLM to give you definitely correct advice or answers, that’s foolish. But if you merely want it to be about as good as asking your friends or doing a 5-minute Google search, it can be fine.
What makes an LLM special is that it combines a store of information, a natural language user interface, and a random number generator.
If you’re indecisive and you literally just need to pick an option, a simple coin flip will do; but if you feel like it might be important to incorporate some personalized context about your situation, you can just dump the text into the LLM and trust that “somehow” it’ll take that into account.
The key “oracular” function is not that the LLM needs to be definitely right but that it needs to be a neutral or impersonal source, the same way a dice roll or a pattern of bone cracks is. Two parties can commit to abiding by “whatever the oracle says” even if the oracle is in no way “intelligent” — but intelligence is certainly a bonus.
AITA For Everything
This works best as an app.
It’s inspired by r/AmITheAsshole’s model: given an interpersonal conflict, who’s the “asshole” (rude, unethical, unreasonable, etc)? It’s possible for multiple parties to be “assholes”, or for nobody to be.
The mechanism:
You enter your contacts into the app.
You can add contacts to a group “issue” you want to resolve.
Each participant in an “issue” describes, in writing, the situation as they see it, and submits it to the LLM. You cannot see other participants’ entries; only your own.
Once all descriptions have been submitted, the LLM sends everybody the same “verdict” — who, if anyone, is “the asshole”, and what should be done about the situation.
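The part of this mechanism that matters is the bookkeeping: entries stay private, nothing is judged until everyone has submitted, and everyone receives the same text. A sketch of just that part follows; the `Issue` class and `placeholder_judge` are invented names, and the judge is a stub standing in for the actual LLM call.

```python
class Issue:
    """Hypothetical 'issue': a dispute with a fixed set of participants."""

    def __init__(self, participants: list[str]):
        self.participants = set(participants)
        self._statements: dict[str, str] = {}

    def submit(self, participant: str, statement: str) -> None:
        if participant not in self.participants:
            raise ValueError(f"{participant} is not part of this issue")
        self._statements[participant] = statement

    def view(self, participant: str) -> dict[str, str]:
        """Participants can only ever see their own entry."""
        return {p: s for p, s in self._statements.items() if p == participant}

    def ready(self) -> bool:
        return set(self._statements) == self.participants

    def verdict(self, judge) -> dict[str, str]:
        """Ask the judge once, then send everyone the same verdict."""
        if not self.ready():
            raise RuntimeError("waiting for all participants to submit")
        text = judge(dict(self._statements))
        return {p: text for p in self.participants}


def placeholder_judge(statements: dict[str, str]) -> str:
    """Stand-in for the LLM; a real judge would actually read the accounts."""
    return f"Verdict based on {len(statements)} accounts: nobody is the asshole."


issue = Issue(["alice", "bob"])
issue.submit("alice", "Bob never does the dishes.")
alice_view = issue.view("alice")  # contains only Alice's own entry
issue.submit("bob", "Alice agreed I would cook instead.")
verdicts = issue.verdict(placeholder_judge)
```

Calling the judge exactly once and fanning out the result is a deliberate choice here: since LLM output can vary between calls, one call per participant could hand different people different verdicts.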
Of course this is not enforceable; nobody has to take the LLM’s advice. But nobody has to take a couples therapist’s advice either, and people still go to them.
A neutral third party who can weigh in on everyday disputes is incredibly valuable — this is what clergy often wound up doing in more religious societies — and we lack accessible, secular, private means to do this today.
Chat Moderator
This is an LLM-powered bot added to group chats (e.g. Discord, Slack).
The bot is trained to detect conversational dynamics:
persistent patterns of boundary-pushing, rudeness, “piling on”, etc
misunderstandings or “talking past each other”
evasiveness, subject changes, non sequiturs
coalitions and alliances
who is “playing high status” and “playing low status”
What do you do with this sort of information? Potentially:
give the bot power to (temporarily or permanently) ban people engaging in unwanted behavior patterns
let the bot interject when it observes an unwanted conversational dynamic
allow people to ask the bot questions about what it observes, e.g. “what do you think the coalitions or ‘sides’ in this conversation are?”
Some implementations would be very similar to human moderation, but probably more nuanced than any existing auto-moderation system; other implementations would be unsettling but potentially illuminating social experiments that might help people gain insight into how they show up socially.
The option to ask the bot to “weigh in”, like “Hey bot, did Alice avoid answering my question right there?” can create common knowledge about conversational “tactics” that are often left plausibly deniable. Plausible deniability isn’t necessarily a bad thing in general, but at its worst it enables gaslighting. A bot that can serve as a “third party” in even a private conversation, if all parties can trust it not to have a preexisting bias, can be a sort of recourse for “hey, it’s not just my imagination, right? something shady just happened there.”
Rethinking “Online”
All our mechanisms for managing digital communication were invented before we had advanced tools for analyzing and generating natural language. A lot of things we now think of as failures may deserve a second look, since they could be more tractable now that LLMs exist.
As I remember it, growing up alongside “Web 2.0” in the late 2000s and early 2010s, we were continually learning new behavioral patterns enabled by digital tools. There was a first time for ordering delivery food online, or ordering a taxi online, or filling out a personality quiz, or posting on a social media site or forum, or making a video call.
And then, for a while, that kind of stagnated. All the basic “types of things you could do” on the internet were the same ones you’d been doing five or ten years ago.
I don’t think that’s fundamentally a technological stagnation. It’s not really that we had come to the end of things that could potentially be done with CRUD apps. It might be some kind of cultural consolidation — the center of gravity moving to bigger tech companies, eyeballs moving to a handful of social media sites, etc.
But I have the sense that LLMs can restart that whole “what could I do with a computer?” diversity of possibilities. Some of those will actually rely on new AI capabilities; some will be stuff that could have been done before LLMs, but it didn’t occur to anyone to try.
What if, for instance, an LLM “decided” how to match dating profiles for compatibility?
Well, you could have done that in 2010 with a dot product between multiple-choice questionnaire responses, and OkCupid did.
But shh, never mind. Because we want nice things, and we should appreciate pixie dust (even computationally expensive pixie dust) that makes nice things seem possible.
And the ability to work with language as a truly malleable medium allows quite a bit nicer things than the ten-years-ago version would, so it’s not as fake as I’m making it sound. My point is that many nice things are fundamentally not dependent on any future advances in technical capability. You can do them with what we have now, and maybe even with what we had yesterday.
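For the record, the 2010-style matching mentioned above really does fit in a few lines. This is a generic sketch and not a claim about any dating site’s actual formula: answers are assumed to be 0-based multiple-choice indices, and `one_hot`, `compatibility`, and `choices_per_question` are names invented for the example.

```python
def one_hot(answers, choices_per_question):
    """Encode multiple-choice answers (0-based indices) as one 0/1 vector."""
    vector = []
    for answer in answers:
        slot = [0] * choices_per_question
        slot[answer] = 1
        vector.extend(slot)
    return vector


def compatibility(a, b, choices_per_question=4):
    """Dot product of one-hot answer vectors, scaled to the range 0..1."""
    if not a or len(a) != len(b):
        raise ValueError("both people must answer the same non-empty quiz")
    va = one_hot(a, choices_per_question)
    vb = one_hot(b, choices_per_question)
    dot = sum(x * y for x, y in zip(va, vb))  # counts identical answers
    return dot / len(a)


alice = [0, 2, 1, 3]
bob = [0, 2, 3, 3]
carol = [1, 0, 3, 0]

alice_bob = compatibility(alice, bob)      # 3 of 4 answers match -> 0.75
alice_carol = compatibility(alice, carol)  # no answers match -> 0.0
```

What an LLM would add on top of this is the ability to compare free-text profiles instead of fixed questionnaire answers.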
[1] E.g., you might want to mute “woke” in the political sense but not “I woke up this morning”.