Personal AI Assistant Ideas
When I imagine having a personal AI assistant with approximately current levels of capability, I have a variety of ideas about what I’d like it to do for me.
Auto-background-research
I like to record myself rambling about my current ideas while walking my dog. I use an app that automatically saves a mediocre transcription of the recording. Ideally, my AI assistant would respond to a new transcription by doing background research to find academic literature related to the ideas mentioned within the transcript. That way, when I went to look at my old transcript, it would already be annotated with links to prior work done on the topic, and analysis of the overlap between my ideas and the linked literature.
Also, ideally, the transcription would be better quality. Context clues about the topic of the conversation should be taken into account when guessing at unclear words, and where a word remains quite unclear, there should be some sort of indicator of that in the text.
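A minimal sketch of what the background-research step could look like, assuming key phrases have already been extracted from the transcript by an LLM pass and using arXiv’s public query API as a stand-in for a broader literature search (the annotation format is just a placeholder):

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    def find_related_papers(phrase, max_results=5):
        # Query arXiv's public Atom feed for papers matching one key phrase.
        query = urllib.parse.urlencode({
            "search_query": f"all:{phrase}",
            "start": 0,
            "max_results": max_results,
        })
        with urllib.request.urlopen(f"http://export.arxiv.org/api/query?{query}") as resp:
            feed = ET.fromstring(resp.read())
        return [
            (entry.findtext(ATOM + "title").strip(), entry.findtext(ATOM + "id").strip())
            for entry in feed.findall(ATOM + "entry")
        ]

    def annotate_transcript(transcript_path, key_phrases):
        # Append an auto-generated "related work" section to the saved transcript.
        lines = ["", "--- Related work (auto-generated) ---"]
        for phrase in key_phrases:
            lines.append(f"Topic: {phrase}")
            for title, link in find_related_papers(phrase):
                lines.append(f"  {title} -- {link}")
        with open(transcript_path, "a") as f:
            f.write("\n".join(lines) + "\n")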
Various voice modes, and ability to issue verbal commands to switch between them
Receptive Listening Voice-mode
Also for when I want to ramble about my ideas, I’d like to have my AI assistant act as a receptive listener, just saying things like ‘that makes sense’, or ‘could you explain more about what you mean about <idea>?’. Ideally, this would feel relatively natural and not too intrusive, asking for clarification only where the logic didn’t quite follow or where I used a jargon term in a non-standard-seeming way. The conversation in this mode would be one-sided, with the AI assistant just helping to draw out my ideas, not contributing much. Occasionally it might say, ‘Do you mean <idea> in the same sense that <famous thinker> means <similar sounding idea>?’, and I could explain the similarities or differences.
Question Answering Voice-mode
Basically a straightforward version of the sort of thing Perplexity does, which is to try to find academic sources to answer a technical question. I’d want to be able to ask a question and get a verbal answer summarizing the sources, but also have a record of the conversation and the sources saved (ideally, downloaded to a folder). This would be mostly short questions from me and long responses from the model.
Discussion Voice-mode
More emphasis on analysis of my ideas, extrapolating, and pointing out strengths and weaknesses. Something like what I get when discussing ideas with Claude 3.5 Sonnet after having told it to act as a philosopher or scientist who is critically examining my ideas. This would be a balanced back-and-forth, with roughly equal-length conversational turns between me and the model.
Coding Project Collaboration
I would want to be able to describe a coding project, and as I do so, have the AI assistant attempt to implement the pieces. If I were using a computer, then an ongoing live output of the effects of the code could be displayed. If I were using voice-mode, then the feedback would be occasional updates about errors or successful runs and the corresponding outputs. I could ask for certain metrics to be reported against certain datasets or benchmarks, as well as general metrics like runtime (or iterations per second where appropriate), memory usage, and runtime complexity estimates.
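As a rough illustration of the ‘general metrics’ part, here is a minimal sketch of how a collaborator could report runtime and peak memory for a single run using only the Python standard library (the function name and report format are made up for illustration):

    import time
    import tracemalloc

    def run_with_metrics(func, *args, **kwargs):
        # Run one implementation attempt and report wall-clock time and peak
        # memory -- the kind of numbers the assistant could read back in voice mode.
        tracemalloc.start()
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        print(f"runtime: {elapsed:.3f} s, peak memory: {peak / 1e6:.1f} MB")
        return result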
Anthropic Wishlist
Anthropic is currently my favorite model supplier, in addition to being the lab I most trust and approve of with regard to AI safety and user data privacy. So, when I fantasize about what I’d like future models to be like, my focus tends to be on ways I wish I could improve Anthropic’s offering.
Most of these could be implemented as features in a wrapper app. But I’d prefer them to be implemented directly by Anthropic, so that I didn’t have to trust an additional company with my data.
In order from most desired to least desired:
Convenience feature: Ability to specify a system prompt which gets inserted at the beginning of every conversation. Mine would say something like, “Act like a rational scientist who gives logical critiques. Keep praise to a minimum, no sycophancy, no apologizing. Avoid filler statements like, ‘let me know if you have further questions.’” (A sketch of how this could look through the API appears after this list.)
Ability to check up-to-date documentation and code for publicly available code libraries. This doesn’t need to be a web search! You could have a weekly scraper check for changes to public libraries, like Python libraries (see the version-check sketch after this list). So many of the issues I run into with LLMs generating code that doesn’t work come down to outdated calls to libraries which have since been updated.
Voice Mode, with appropriate interruptions, tone-of-voice detection, and prosody matching. Basically, like what OpenAI is working on.
Academic citations. This doesn’t need to be a web search! This could just be from searching an internal archive of publicly available open access scientific literature, updated once a week or so.
Convenience feature: A button to enable ‘summarize this conversation, and start a new conversation with that summary as a linked doc’.
Ability to test out generated code to make sure it at least compiles/runs before showing it to me. This would include the ability to have a code file we were collaborating on, which got edited as the conversation went on, instead of responses intended to modify just a specific function within a code file, which I then need to copy/paste in and rerun to check whether it works. (See the smoke-test sketch after this list.)
Ability to have some kind of personalization of the model which goes deeper than a simple system prompt. Some way for me to give feedback. Some way for me to select some of my conversations to be ‘integrated’, such that the mode / tone / info from those conversations gets incorporated more into future fresh conversations. Sometimes I feel like I’ve ‘made progress’ in a conversation, gotten into a better pattern, and it’s frustrating to have to start over from scratch every time.
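For the first item on the list, something close to this is already possible through the API rather than the consumer app; a minimal sketch using the Anthropic Python SDK, with the model name and helper function chosen only for illustration:

    import anthropic

    # My default instructions; today these have to be re-pasted or sent via
    # the API rather than set once in the consumer app.
    MY_SYSTEM_PROMPT = (
        "Act like a rational scientist who gives logical critiques. "
        "Keep praise to a minimum, no sycophancy, no apologizing. "
        "Avoid filler statements like 'let me know if you have further questions.'"
    )

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def ask(user_message, model="claude-3-5-sonnet-20240620"):
        # Every fresh conversation starts with the same system prompt.
        response = client.messages.create(
            model=model,
            max_tokens=1024,
            system=MY_SYSTEM_PROMPT,
            messages=[{"role": "user", "content": user_message}],
        )
        return response.content[0].text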
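For the up-to-date-libraries item, a minimal sketch of the weekly check, assuming PyPI’s public JSON metadata endpoint as the source and that the scraper only needs to notice when a known version has fallen behind:

    import json
    import urllib.request

    def latest_pypi_version(package):
        # PyPI publishes package metadata as JSON at /pypi/<name>/json.
        url = f"https://pypi.org/pypi/{package}/json"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["info"]["version"]

    def stale_pins(pinned):
        # pinned: e.g. {"numpy": "1.24.0"} -- the versions the model "knows".
        # Returns packages whose live version differs, i.e. docs worth re-scraping.
        stale = {}
        for name, known in pinned.items():
            latest = latest_pypi_version(name)
            if latest != known:
                stale[name] = (known, latest)
        return stale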
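And for the code-testing item, the core ‘does it at least run?’ check is simple to sketch; the function name and timeout here are made up for illustration:

    import subprocess
    import sys
    import tempfile

    def smoke_test(generated_code, timeout=30):
        # Write the generated code to a temp file and check that it at least
        # runs without raising, before it is ever shown to the user.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(generated_code)
            path = f.name
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0, proc.stdout, proc.stderr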
Edit: Issues 1, 2 and 4 have been partially or completely alleviated in the latest experimental voice model. Subjectively (in <1 hour of use) there seems to be a stronger tendency to hallucinate when pressed on complex topics.
I have been attempting to use ChatGPT’s voice feature (primarily with GPT-4 and GPT-4o) to have it act as a question-answering, discussion, and receptive conversation partner (separately) for the last year. The topic is usually modern physics.
I’m not going to say that it “works well”, but maybe half the time it does work.
The 4 biggest issues that cause frustration:
1. As you allude to in your post, there doesn’t seem to be a way of interrupting the model via voice once it gets stuck in a monologue. The model will also cut you off, and sometimes it will pause mid-response before continuing. These issues seem like they could be fixed by more intelligent scaffolding.
2. An expert human conversation partner who is excellent at productive conversation can switch seamlessly between playing the role of a receptive listener, a collaborator, or an interactive tutor. To have ChatGPT play one of these roles, I usually need to spend a few minutes at the beginning of the conversation specifying how long responses should be, etc. Even after doing this, there is a strong tendency for the model to revert to giving “generic AI slop answers”. By this I mean the response begins with “You’ve touched on a fascinating observation about xyz” and then lists 3 to 5 separate ideas.
3. The model was trained on text conversations, so it will often output LaTeX equations in a manner totally inappropriate for reading aloud; the audio output is mostly incomprehensible. To work around this, I have custom instructions outlining how to verbalize equations precisely in English. This works maybe 25% of the time, and about 80% of the time once I spend a few minutes of the conversation going over the rules again.
4. When talking naturally about complicated topics, I will sometimes pause mid-sentence while thinking. Doing this causes ChatGPT to think I’ve finished talking, so I’m forced to use a series of filler words to keep my sentences going, which impedes my ability to think.
https://www.librechat.ai/ is an open-source frontend for LLM APIs. It has prompt templates, which can be triggered by typing /keyword.
https://aider.chat/ can do this and even run tests. The results are fed back to the LLM for corrections. It also uses your existing codebase and auto-applies the diffs.
In LibreChat, I sometimes tell the AI to summarize all the corrections I’ve made and then make a new template out of it.