I’m convinced that people who are interested in large language models (LLMs) are overwhelmingly focused on general-purpose “performance” at the expense of exploring useful (or fun) applications.
As I’m working on a personal project, I’ve been learning my way around HuggingFace, which is a hosting platform, set of libraries, and almost-social-network for the open-source AI community. It’s fascinating, and worth exploring even if you’re not going to be developing foundation models from scratch yourself; if you simply want to use the latest models, build apps around them, or adapt them slightly to your own purposes, HuggingFace seems like the clear place to go.
You can look at trending models, and trending public “spaces”, aka cloud-hosted instances of models that users can test out, and get a sense of where the “energy” is. And what I see is that almost all the “energy” in LLMs is on general-purpose models, competing on general-purpose question-answering benchmarks, sometimes specialized to particular languages, or to math or coding.
“How can I get something that behaves basically like ChatGPT or Claude or Gemini, but gets fewer things wrong, and ideally requires less computing power and gets the answer faster?” is an important question, but it’s far from the only interesting one!
If I really search I can find “interesting” specialized applications like “predicts a writer’s OCEAN personality scores based on a text sample” or “uses abliteration to produce a wholly uncensored chatbot that will indeed tell you how to make a pipe bomb” but mostly…it’s general-purpose models. Not applications for specific uses that I might actually try.
And some applications seem eager to chase the creepiest and most inhumane use cases. No, I especially don’t want little kids talking to a chatbot toy. No, I don’t want a necklace or pair of glasses with a chatbot I can talk to. (In public? Imagine the noise pollution!) No, I certainly don’t want a bot writing emails for me!
Even the stuff I found potentially cool (an AI diary that analyzes your writing and gives you personalized advice) ended up being, in practice, so preachy that I canceled my subscription.
In the short term, of course, the most economically valuable thing to do with LLMs is duplicating human labor, so it makes sense that the priority application is autogenerated code.
But the most creative and interesting potential applications go beyond “doing things humans can already do, but cheaper” to do things that humans can’t do at all on comparable scale.
A Personalized Information Environment
To some extent, social media, search, and recommendation engines were supposed to enable us to get the “content” we want.
And mostly, to the extent that’s turned out to be a disappointment, people complain that getting exactly what you want is counterproductive — filter bubbles, superstimuli, etc.
But I find that we actually have incredibly crude tools for getting what we want.
We can follow or unfollow, block or mute people; we can upvote and downvote pieces of content and hope “the algorithm” feeds us similar results; we can mute particular words or tags.
But what we can’t do, yet, is define a “quality” we’re looking for, or a “genre” or a “vibe”, and filter by that criterion.
The old tagging systems (on Tumblr or AO3 or Delicious, or back when hashtags were used unironically on Twitter) were the closest approximation to customizable selectivity, and they’re still pretty crude.
We can do a lot better now.
Personalized Content Filter
This is a browser extension.
You teach the LLM, by highlighting and saving examples, what you consider to be “unwanted” content that you’d prefer not to see.
The model learns a classifier to sort all text in your browser into “wanted” vs “unwanted”, and shows you only the “wanted” text, leaving everything else blank.
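The core of that classifier could be as simple as nearest-neighbor matching against your saved examples. Here is a minimal sketch; the bag-of-words vectors are a toy stand-in for a real sentence-embedding model, and the `PersonalFilter` class, its method names, and the similarity threshold are all invented for illustration:

```python
# Sketch of the "wanted vs unwanted" filter core. A real extension would
# embed text with a sentence-embedding model; word counts are a toy
# stand-in so the nearest-neighbor logic stays visible.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class PersonalFilter:
    """Learns 'unwanted' from highlighted examples; hides similar text."""
    def __init__(self, threshold: float = 0.5):
        self.unwanted = []          # saved example vectors
        self.threshold = threshold  # similarity above this => hide

    def save_unwanted(self, text: str):
        self.unwanted.append(embed(text))

    def is_wanted(self, text: str) -> bool:
        v = embed(text)
        return all(cosine(v, ex) < self.threshold for ex in self.unwanted)

f = PersonalFilter()
f.save_unwanted("outrage outrage clickbait you won't believe")
f.is_wanted("a calm essay about gardening")         # True: shown
f.is_wanted("outrage clickbait you won't believe")  # False: hidden
```

The browser extension would run `is_wanted` over each block of text on the page and blank out whatever fails; "adjust to taste" is just moving the threshold or saving more examples.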
Unlike muting or blocking particular people (who may produce a mix of objectionable and unobjectionable content), muting particular words or phrases (which is vulnerable to context confusions[1]), or trusting some other moderator to decide for you, this lets you teach your own personal machine a gestalt of the sort of thing you’d prefer not to see, and adjust it to taste.
You would, of course, be able to make multiple filters and toggle between them, if you want to “see the world” differently at different times.
You’d be able to share your filters, and probably some would become popular and widely used, the way Lists on Twitter/X and a few simple browser extensions like Shinigami Eyes are now.
Color-Coded Text
This is also a browser extension.
In addition to hiding unwanted text, a browser extension could label text according to user-defined, model-trained classifications.
For instance:
right-wing text in red, left-wing text in blue.
color-coded highlighting for (predicted) humor, satire, outrage bait, commercial/promotional content
color-coded highlighting for (predicted) emotion: sad, angry, disgusted, fearful, happy, etc.
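The highlighting step itself is straightforward once something has assigned labels. A sketch, with a trivial keyword lookup standing in for the model-trained classifier; the color table, cue words, and function names are all invented for illustration:

```python
# Sketch of the color-coding step: classify a passage, then wrap it in
# a colored <span>. classify_emotion() is a toy stand-in for an LLM or
# trained classifier.
EMOTION_COLORS = {
    "angry": "#ffcccc", "sad": "#ccccff", "happy": "#ccffcc",
    "fearful": "#ffe0cc", "disgusted": "#e0ccff", "neutral": "#ffffff",
}

def classify_emotion(text: str) -> str:
    # Toy stand-in for a model-based classifier.
    cues = {"furious": "angry", "weeping": "sad", "delighted": "happy"}
    for cue, label in cues.items():
        if cue in text.lower():
            return label
    return "neutral"

def highlight(text: str) -> str:
    label = classify_emotion(text)
    color = EMOTION_COLORS[label]
    return f'<span style="background:{color}" title="{label}">{text}</span>'

highlight("I am furious about this.")
# -> '<span style="background:#ffcccc" title="angry">I am furious about this.</span>'
```

An extension would apply `highlight` to each paragraph node in the page; the `title` attribute lets you hover to see why something was colored.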
I expect it’s more difficult, but it might also be possible for the LLM to infer characteristics pertaining to the quality/validity of discussion:
non sequiturs
invalid inferences
failures of reading comprehension
There is information about what kind of text we’re reading that we can certainly detect on our own in principle, but that can sneak up on us unnoticed. A “cognitive prosthetic” can potentially be helpful for keeping perspective or making prioritization easier. “Oh hey, I’ve been reading angry stuff all day, no wonder I’m getting angry myself.” Or “let me read the stuff highlighted as high-quality first.”
Fact Extraction
This could be an app.
You give it a set of resources (blog, forum, social media feed, etc.) that you don’t want to actually read, and have it produce a digest of the facts (news-style who/what/when/where concrete details) that come up in those sources.
For instance, early online discussion of COVID-19, back in January 2020, was often on sites like 4chan where racially offensive language is common. If you wanted to learn that there was a new deadly epidemic in China, you’d have to expose yourself to a lot of content most people would rather not see.
But it should be well within the capacity of modern LLMs to filter out jokes, rhetoric, and opinion commentary, and just pick out “newsworthy” claims of fact and present them relatively neutrally.
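A minimal sketch of what that extraction step might look like. The prompt wording and the `build_digest_prompt` name are assumptions, not a tested recipe; the resulting string would be sent to whatever chat-completion API you use:

```python
# Sketch of the fact-extraction step: assemble the noisy sources into
# one prompt asking only for concrete, neutral claims of fact.
def build_digest_prompt(posts: list[str]) -> str:
    joined = "\n---\n".join(posts)
    return (
        "From the posts below, extract only concrete, newsworthy claims "
        "of fact (who/what/when/where). Ignore jokes, insults, rhetoric, "
        "and opinion commentary. Present each claim neutrally, as a "
        "bullet point.\n\nPOSTS:\n" + joined
    )

prompt = build_digest_prompt([
    "lol another dumb thread",
    "hospital in Wuhan reports dozens of pneumonia cases of unknown cause",
])
```

The interesting engineering is not this template but the plumbing around it: fetching the sources on a schedule, chunking long threads, and deduplicating claims across sources.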
I don’t love LLM applications for “text summarization” in general, because I’m usually going to worry that something important about the original document was missed in the auto-summary. Lots of these summarization tools seem geared for people who don’t actually like to read — otherwise, why not just read the original? But summarization could become useful if it’s more like trawling for notable “signal” in very noisy (or aversive) text.
Plain Language
This is a browser extension that would translate everything into plain language, or language at a lower reading level. The equivalent of Simple English Wikipedia, but autogenerated and for everything.
I don’t find that current commercial LLMs are actually very good at this! I’m not sure how much additional engineering work would be necessary to make this work.
But it might literally save lives.
People with limited literacy or cognitive disabilities can find themselves in terrible situations when they can’t understand documents. Simplifying bureaucratic or official language so more people can understand it would be a massive public service.
Dispute Resolution and Mediation
For better or for worse, people end up using LLMs as oracles.
If you’re counting on the LLM to give you definitely correct advice or answers, that’s foolish. But if you merely want it to be about as good as asking your friends or doing a 5-minute Google search, it can be fine.
What makes an LLM special is that it combines a store of information, a natural language user interface, and a random number generator.
If you’re indecisive and you literally just need to pick an option, a simple coin flip will do; but if you feel like it might be important to incorporate some personalized context about your situation, you can just dump the text into the LLM and trust that “somehow” it’ll take that into account.
The key “oracular” function is not that the LLM needs to be definitely right but that it needs to be a neutral or impersonal source, the same way a dice roll or a pattern of bone cracks is. Two parties can commit to abiding by “whatever the oracle says” even if the oracle is in no way “intelligent” — but intelligence is certainly a bonus.
AITA For Everything
This works best as an app.
It’s inspired by r/AmITheAsshole’s model: given an interpersonal conflict, who’s the “asshole” (rude, unethical, unreasonable, etc)? It’s possible for multiple parties to be “assholes”, or for nobody to be.
The mechanism:
You enter your contacts into the app.
You can add contacts to a group “issue” you want to resolve.
Each participant in an “issue” describes, in writing, the situation as they see it, and submits it to the LLM. You cannot see other participants’ entries; only your own.
Once all descriptions have been submitted, the LLM sends everybody the same “verdict” — who, if anyone, is “the asshole”, and what should be done about the situation.
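The steps above could be sketched like this. The `Issue` class, its method names, and the verdict prompt are all invented for illustration; `verdict_prompt()`’s output would go to an LLM for the actual ruling:

```python
# Sketch of the AITA flow: collect each participant's private account,
# and only once everyone has submitted, assemble one shared verdict
# prompt so everybody receives the same answer.
class Issue:
    def __init__(self, participants):
        self.participants = set(participants)
        self.accounts = {}  # name -> their private description

    def submit(self, name: str, description: str):
        assert name in self.participants
        self.accounts[name] = description

    def ready(self) -> bool:
        return set(self.accounts) == self.participants

    def verdict_prompt(self) -> str:
        assert self.ready(), "waiting for all participants"
        parts = [f"{name}'s account:\n{text}"
                 for name, text in sorted(self.accounts.items())]
        return ("Given these accounts of one conflict, say who (if anyone) "
                "is being unreasonable, and what should be done:\n\n"
                + "\n\n".join(parts))

issue = Issue({"alice", "bob"})
issue.submit("alice", "Bob borrowed my car and returned it empty.")
issue.ready()  # False until bob also submits
```

The no-peeking rule falls out of the structure: accounts live only on the server side of the app, and nothing is shown to anyone until the single shared verdict comes back.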
Of course this is not enforceable; nobody has to take the LLM’s advice. But nobody has to take a couples therapist’s advice either, and people still go to them.
A neutral third party who can weigh in on everyday disputes is incredibly valuable — this is what clergy often wound up doing in more religious societies — and we lack accessible, secular, private means to do this today.
Chat Moderator
This is an LLM-powered bot added to group messaging platforms (e.g. Discord, Slack).
The bot is trained to detect conversational dynamics:
persistent patterns of boundary-pushing, rudeness, “piling on”, etc
misunderstandings or “talking past each other”
evasiveness, subject changes, non sequiturs
coalitions and alliances
What do you do with this sort of information? Potentially:
give the bot power to (temporarily or permanently) ban people engaging in unwanted behavior patterns
let the bot interject when it observes an unwanted conversational dynamic
allow people to ask the bot questions about what it observes, e.g. “What do you think the coalitions or ‘sides’ in this conversation are?”
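A sketch of the plumbing for the interjection option; a trivial repeated-sender heuristic stands in for the LLM-based dynamics detector, and all the names here are invented (in a real bot, `detect_dynamics` would be a model call over the transcript):

```python
# Sketch of the moderator loop: scan recent messages for an unwanted
# dynamic, and emit bot interjections when one is found.
from collections import Counter

def detect_dynamics(messages):
    # messages: list of (sender, text). Toy heuristic: flag a sender
    # who posts 3+ of the last 5 messages as dominating/piling on.
    senders = Counter(s for s, _ in messages[-5:])
    return [f"{s} is dominating the conversation"
            for s, n in senders.items() if n >= 3]

def moderate(messages):
    # Other options: DM a human moderator, or answer only on request.
    flags = detect_dynamics(messages)
    return [("bot", f"Noticed: {f}") for f in flags]

chat = [("alice", "no"), ("alice", "wrong"), ("bob", "why?"),
        ("alice", "just no"), ("alice", "listen")]
moderate(chat)  # -> [("bot", "Noticed: alice is dominating the conversation")]
```

Swapping the heuristic for an LLM call is what would let the same loop flag the subtler dynamics above: talking past each other, evasiveness, coalitions.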
Some implementations would be very similar to human moderation, but probably more nuanced than any existing auto-moderation system; other implementations would be unsettling but potentially illuminating social experiments that might help people gain insight into how they show up socially.
The option to ask the bot to “weigh in”, like “Hey bot, did Alice avoid answering my question right there?” can create common knowledge about conversational “tactics” that are often left plausibly deniable. Plausible deniability isn’t necessarily a bad thing in general, but at its worst it enables gaslighting. A bot that can serve as a “third party” in even a private conversation, if all parties can trust it not to have a preexisting bias, can be a sort of recourse for “hey, it’s not just my imagination, right? something shady just happened there.”
Rethinking “Online”
All our mechanisms for managing digital communication were invented before we had advanced tools for analyzing and generating natural language. A lot of things we now think of as failures might need to be revisited as more tractable now that LLMs exist.
As I remember growing up along with “Web 2.0”, back in the late 2000s and early 2010s we were continually learning new behavioral patterns enabled by digital tools. There was a first time for ordering delivery food online, or ordering a taxi online, or filling out a personality quiz, or posting on a social media site or forum, or making a video call.
And then, for a while, that kind of stagnated. All the basic “types of things you could do” on the internet were the same ones you’d been doing five or ten years ago.
I don’t think that’s fundamentally a technological stagnation. It’s not really that we had come to the end of things that could potentially be done with CRUD apps. It might be some kind of cultural consolidation — the center of gravity moving to bigger tech companies, eyeballs moving to a handful of social media sites, etc.
But I have the sense that LLMs can restart that whole “what could I do with a computer?” diversity of possibilities. Some of those will actually rely on new AI capabilities; some will be stuff that could have been done before LLMs, but it didn’t occur to anyone to try.
What if, for instance, an LLM “decided” how to match dating profiles for compatibility?
Well, you could have done that in 2010 with a dot product between multiple-choice questionnaire responses, and OkCupid did.
But shh, never mind. Because we want nice things, and we should appreciate pixie dust (even computationally expensive pixie dust) that makes nice things seem possible.
And the ability to work with language as a truly malleable medium allows quite a bit nicer things than the ten-years-ago version would, so it’s not as fake as I’m making it sound. My point is that many nice things are fundamentally not dependent on any future advances in technical capability. You can do them with what we have now, and maybe even with what we had yesterday.
[1] e.g. you might want to mute “woke” in the political sense but not “I woke up this morning”
Hi! I wrote two extensions you suggested:
- “Emotion highlighter” detecting and highlighting paragraphs with 6 basic emotions
It’s a very basic API call right now; I’ll think about improving it once I see whether anyone uses it at all and what improvements they want (more emotions / more precise highlighting / better classification?).
- “Simple English translator” converting all text on a webpage into plain English.
They use your OpenAI API key to analyze all text on a webpage once you click the extension, and OpenAI charges $5.00 / 1M input tokens as of the time of writing this comment.
You can do personalized RLHF on an LLM. Because there’s less data, you need to do stronger training per data point than big companies do. The training is still a technical issue, but supposing it becomes cheap enough, one problem is that this produces sycophants. We already see commercial LLMs that just agree with whatever you initially imply you believe.
Producing vector embeddings is, if anything, more natural for neural networks than continuing text, and search engines already use neural networks to produce document embeddings for search. It’s entirely feasible to do this for all your personal or company documents, and then search (a vector database of) them using approximate descriptions of what you want.
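A toy sketch of that retrieval loop; word-overlap vectors stand in for real embeddings, the document names and contents are made up, and a production version would use an embedding model plus a vector database:

```python
# Sketch of embedding-based document search: embed each document once,
# then rank documents by similarity to an approximate description of
# what you want.
import math
from collections import Counter

def embed(text):
    # Toy embedding: lowercase word counts (stand-in for a real model).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "q3_report": "quarterly revenue grew while hiring slowed",
    "trip_notes": "flight hotel and conference schedule for berlin",
}
index = {name: embed(text) for name, text in docs.items()}

def search(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda n: cosine(q, index[n]), reverse=True)
    return ranked[:k]

search("travel plans for the berlin conference")  # -> ["trip_notes"]
```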
Awesome ideas! These ideas are some of the things missing for LLMs to have economic impact. Companies expected them to just automate certain jobs, but that’s an all-or-nothing solution that’s never worked historically (until it eventually does, but we’re not there yet).
One idea I thought of when reading Scott Aaronson’s Reading Burden (https://scottaaronson.blog/?p=8217) is that people with interesting opinions and somewhat of a public presence have a TON of reading to do, not just to keep up with current events, but to observe people’s reactions and see the trends in ideas in response to events. Perhaps this can be improved with LLMs:
Give the model a collection of your writings and latest opinions. Have it scour online posts and their comments from your favorite sources. Each post + comments section is one input, so we need longer context. Look for opportunities to share your viewpoint. Report whether your viewpoint has already been shared or refuted, or if there are points not considered in your writings. If nothing, save yourself the effort! If something, highlight the important bits.
Might be too many LLM calls depending on the sources, obviously a retrieval stage is in order. Or that bit can be done manually, we seem pretty good at finding handfuls of interesting sounding articles, and do this anyway during procrastination.
I don’t think this claim as written is true. I learned of COVID-19 for the first time from BBC News on New Year’s Eve 2019 and followed the course of the pandemic obsessively in January/February on BBC News and some academic website whose name I’ve forgotten (I think it was affiliated with the University of Washington?) without ever going on 4chan or other similar forums.