Very interesting of you to think of it that way. It turns out this is very much in line with recent results from computational psychiatry. Basically, in depression we can study and distinguish how much of the lack of activity is due to “lack of resources to act” vs. “increased cost of action”. Both look clinically about the same, but the underlying biochemical pathways differ, so it’s (IMHO) a promising approach to shortening the time it takes a doctor to find the appropriate treatment for a given patient.
If that’s something you already know, I’m sorry; I’m short on time and wanted to get this out :)
Sharing my setup too:
Personally I’m just self-hosting a bunch of stuff:
litellm proxy, to connect to any LLM provider
langfuse for observability
faster-whisper server; the v3 turbo CTranslate2 version takes about 900 MB of VRAM and transcribes roughly 10 times faster than I speak
open-webui; as it’s connected to litellm and ollama, I avoid provider lock-in and keep all my messages on my backend instead of having some at OpenAI, some at Anthropic, etc. Additionally it supports artifacts and a bunch of other nice features. It also allows me to craft my perfect prompts, and to jailbreak when needed.
piper for TTS for now, but I plan to switch to a self-hosted fish audio.
for extra privacy I have a bunch of ollama models too. Mistral Nemo seems quite capable; otherwise a few llama3, qwen2, etc.
for embeddings, either bge-m3 or some self-hosted jina.ai models.
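For anyone curious how the pieces connect: the litellm proxy is driven by a config.yaml along these lines (the model names, IDs, and env var are illustrative, not my exact config), and everything else only ever talks to the proxy:

```yaml
model_list:
  - model_name: sonnet                 # the name clients ask for
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: nemo-local             # private, local-only path
    litellm_params:
      model: ollama/mistral-nemo
      api_base: http://localhost:11434
```

Swapping a provider then means editing this one file instead of touching every client.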
I made a bunch of scripts to pipe my microphone / speakers / clipboard / LLMs together for productivity. For example, I press shift four times, speak, then press shift again, and voilà: what I said has been turned into an Anki flashcard.
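The final step of a pipeline like that can go through the AnkiConnect add-on’s HTTP API. A minimal sketch (not my actual script; deck name, note model, and port are the AnkiConnect defaults plus my own assumptions):

```python
import json
import urllib.request

def build_anki_note(front: str, back: str, deck: str = "Default") -> dict:
    # AnkiConnect "addNote" payload (API version 6)
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": "Basic",
                "fields": {"Front": front, "Back": back},
            }
        },
    }

def push_note(payload: dict) -> None:
    # Anki must be running with the AnkiConnect add-on (default port 8765)
    req = urllib.request.Request(
        "http://127.0.0.1:8765",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

The transcript from whisper goes through the LLM to produce the front/back pair, then `push_note(build_anki_note(front, back))` files the card.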
As for providers, I mostly rely on openrouter.ai, which lets me swap between providers without issue. These last few months I’ve been using Sonnet 3.5, but I change as soon as there’s a new frontier model.
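The swapping works because OpenRouter exposes an OpenAI-compatible endpoint, so changing providers is just changing the model string in the request body. A sketch (the model slug is an example; you POST this to `https://openrouter.ai/api/v1/chat/completions` with your API key):

```python
def chat_body(model: str, prompt: str) -> dict:
    # OpenAI-compatible chat completion body; only the `model`
    # string changes when switching providers.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# e.g. chat_body("anthropic/claude-3.5-sonnet", "hello")
```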
For interacting with codebases I use aider.
So in the end all my costs come from API calls and none from subscriptions.