Edit: Issues 1, 2 and 4 have been partially or completely alleviated in the latest experimental voice model. Subjectively (in <1 hour of use) there seems to be a stronger tendency to hallucinate when pressed on complex topics.
For the last year I have been trying to use ChatGPT’s voice feature (primarily with GPT-4 and GPT-4o) as a conversation partner in three separate roles: question answering, discussion, and receptive listening. The topic is usually modern physics.
I’m not going to say that it “works well”, but maybe half the time it does work.
The 4 biggest issues that cause frustration:
1. As you allude to in your post, there doesn’t seem to be a way to interrupt the model by voice once it gets stuck in a monologue. The model will also cut you off, and it sometimes pauses mid-response before continuing. These issues seem like they could be fixed by more intelligent scaffolding.
2. An expert human conversation partner can switch seamlessly between the roles of receptive listener, collaborator and interactive tutor. To have ChatGPT play one of these roles, I usually need to spend a few minutes at the start of the conversation specifying how long responses should be, etc. Even after doing this, the model strongly tends to revert to “generic AI slop answers”. By this I mean that the response begins with “You’ve touched on a fascinating observation about xyz” and then lists 3 to 5 separate ideas.
3. The model was trained on text conversations, so it often outputs LaTeX equations in a form totally inappropriate for reading out loud. The resulting audio is mostly incomprehensible. To work around this I have custom instructions outlining how to state equations verbally and precisely in English. This works maybe 25% of the time, and about 80% of the time once I spend a few minutes of the conversation going over the rules again.
4. When talking naturally about complicated topics I sometimes pause mid-sentence while thinking. This causes ChatGPT to think I’ve finished talking, so I’m forced to use a series of filler words to keep my sentences going, which impedes my ability to think.
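On the equation-reading problem above: one kind of scaffolding that might help is a deterministic step that rewrites LaTeX into speakable English before the text reaches the speech synthesizer, instead of relying on the model to follow verbal rules. Below is a minimal Python sketch of that idea. It is not my custom instructions and not anything ChatGPT is known to do; the function name `speak_latex`, the tiny vocabulary and the phrasing choices ("over", "sub", "squared") are all assumptions made up for illustration, and nested fractions are not handled.

```python
import re

# Hypothetical, deliberately tiny vocabulary (an assumption for this sketch).
SYMBOLS = {
    "hbar": "h bar",
    "Delta": "delta",
    "psi": "psi",
    "pi": "pi",
    "geq": "is greater than or equal to",
    "leq": "is less than or equal to",
}

OPERATORS = (("=", "equals"), ("+", "plus"), ("-", "minus"))


def speak_latex(expr: str) -> str:
    """Turn a simple LaTeX expression into words a speech synthesizer can read."""
    # Fractions whose arguments contain no braces: \frac{a}{b} -> "a over b".
    expr = re.sub(r"\\frac\{([^{}]*)\}\{([^{}]*)\}", r" \1 over \2 ", expr)
    # Named commands: look the name up, or fall back to the bare name.
    expr = re.sub(
        r"\\([A-Za-z]+)",
        lambda m: " " + SYMBOLS.get(m.group(1), m.group(1)) + " ",
        expr,
    )
    # Powers: special-case squares, then braced and single-character exponents.
    expr = re.sub(r"\^(?:\{2\}|2(?!\d))", " squared ", expr)
    expr = re.sub(r"\^\{([^{}]*)\}", r" to the power \1 ", expr)
    expr = re.sub(r"\^(\w)", r" to the power \1 ", expr)
    # Subscripts: braced first, then single-character.
    expr = re.sub(r"_\{([^{}]*)\}", r" sub \1 ", expr)
    expr = re.sub(r"_(\w)", r" sub \1 ", expr)
    # Basic operators become words.
    for symbol, word in OPERATORS:
        expr = expr.replace(symbol, f" {word} ")
    # Drop leftover braces and collapse runs of whitespace.
    expr = expr.replace("{", " ").replace("}", " ")
    return " ".join(expr.split())


print(speak_latex(r"\Delta x \Delta p \geq \frac{\hbar}{2}"))
# delta x delta p is greater than or equal to h bar over 2
```

A rule-based pass like this behaves the same way every time, which is the property the prompt-based workaround lacks. The cost is that every construct it should handle has to be listed by hand.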