This is awesome. I found it via searching LW for variations of “voice typing”, which I was motivated to search because I had just discovered that average conversational speed is around 3x average typing speed (~150 vs ~50 wpm, cf ChatGPT). (And reading speeds are potentially in the thousands.)
At the moment, I’m using Windows Voice Access. It’s accurate, has some nice voice commands, and gives you visual feedback while speaking. The inadequacy, for me, is the lack of immediate feedback (compared to typing) and of customisability. I’ll attempt to test your repo tomorrow.
Also, I’m surprised that stenotypers can type faster than people on regular keyboards. I had previously speculated about the benefits of making keyboards more or less modular. I thought making them more modular would face similar problems as Ithkuil: too many serial operations to build the vector you mean. English relies heavily on caching specific words for specific situations with very shallow generalisations, trading semantic reach for faster lookup times, or something. But I guess what matters most is modularity in the right places, and mainstream keyboards aren’t optimised for that at all.
I speculate that the ideal keyboard should have keys that chunk word-pieces roughly according to Zipf’s law: the Nth most common key should be such that the frequency with which you have to use it is ~1/N. I guess there should also be a term somewhere for the distance the finger has to travel to reach a key, so that the frequency of each key can be weighted by some measure of ergonomics.
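A rough sketch of how that could be scored (my own illustration; the key counts and effort numbers are made up): assume the nth most frequent chunk-key is pressed with probability ∝ 1/n, and weight each key’s frequency by an effort term for finger travel.

```python
# Hypothetical sketch: expected effort per keystroke for a chunk-keyboard whose
# n-th most frequent key is used with Zipf probability p(n) proportional to 1/n.

def zipf_weights(num_keys):
    """Normalised Zipf frequencies: p(n) proportional to 1/n."""
    raw = [1.0 / n for n in range(1, num_keys + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def layout_cost(efforts):
    """Expected effort per keystroke, given efforts[i] for the key of frequency rank i+1."""
    weights = zipf_weights(len(efforts))
    return sum(p * e for p, e in zip(weights, efforts))

# Made-up effort numbers: home row = 1, adjacent rows = 2, far reaches = 3.
print(layout_cost([1, 1, 1, 2, 2, 2, 3, 3]))  # common chunks on cheap keys -> low cost
print(layout_cost([3, 3, 2, 2, 2, 1, 1, 1]))  # common chunks on costly keys -> high cost
```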
It’s likely too late, but you will not like my repo, because the transcription only happens once you have finished the recording. You could of course do it better, but I didn’t and probably won’t.
The main reason is that I become more stupid when using ASR. My current theory is that when you are writing something down, the slowdown effect is actually beneficial because you have more cognitive resources available to compute the next thing you write. Also, the lag in voice typing is pretty horrible. I still use it to write things where I just know what to say, and saying the correct thing is easy. But if I try to do something complicated, it results in much worse text, which I am even less likely to read than things I write.
I can imagine an interesting experiment where somebody knows stenography and then uses a program that limits the input rate, i.e. there is a lock for X ms before you can enter the next chord. Then you could test how much text quality improves as you increase the rate limit.
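A minimal sketch of that lock (my own illustration, not an actual steno plugin; the class name and demo chords are made up): accept a chord only if at least X ms have passed since the last accepted one, then vary X across trials and compare the resulting text.

```python
import time

class ChordGate:
    """Accept at most one chord per lock_ms milliseconds; earlier arrivals are blocked."""

    def __init__(self, lock_ms):
        self.lock_s = lock_ms / 1000.0
        self.last_accept = float("-inf")

    def try_accept(self, chord):
        now = time.monotonic()
        if now - self.last_accept >= self.lock_s:
            self.last_accept = now
            return True   # chord goes through to the translator
        return False      # locked: typist has to wait and re-stroke

# Demo with a 300 ms lock and made-up steno chords.
gate = ChordGate(300)
for chord in ["STKPW", "HRAO", "EUFRPB"]:
    print(chord, "accepted" if gate.try_accept(chord) else "blocked")
    time.sleep(0.1)  # strokes arriving faster than the lock get blocked
```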
> My current theory is that when you are writing something down, the slowdown effect is actually beneficial because you have more cognitive resources available to compute the next thing you write.
I’ve noticed the same when voice-typing, and I considered that explanation. I don’t think it’s the main cause, however. With super-fast and accurate STT (or steno), I suspect I could learn to both think better and type faster. There’s already an almost-audible internal monologue going on while I type. Adding the processing-cost of having to translate internal-audio-code into very unnatural finger-stroke-code is surely a suboptimum. (I’m reminded of the driver who, upon noticing that he’s got a flat tire on one side, punctures the other as well. It’s a deceptive optimum; a myopic plan.)
I think there is likely no significant cognitive overhead for moving your fingers to type. I expect that is done by another part of your brain, which plays at most a secondary role in idea generation. I expect the same problem to show up in stenography when you type as fast as you generate content.
You can perform this experiment right now: instead of writing with a keyboard, write with pen and paper. When I write with pen and paper my writing improves in quality. And it seems this is because I am slower. The thing I actually end up writing down pops into my mind with such a delay that, had I been able to write faster, I would already have written down a worse, previously generated output. Potentially there are other variables influencing quality, e.g. your motor cortex is stimulated differently when using pen and paper compared to using QWERTY.
[Thoughts ↦ speech-code ↦ text-code] just seems like a convoluted/indirect learning-path. Speech has been optimised directly (although very gradually over thousands of years) to encode thoughts, whereas most orthographies are optimised to encode sounds. The symbols are optimised only via piggy-backing on the thoughts↦speech code—like training a language model indirectly via NTP (next-token prediction) on [the output of an architecturally-different language model trained via NTP on human text].
(In the conlang-orthography I aspire to make with AI-assistance, graphemes don’t try to represent sounds at all. So sort of like a logogram but much more modular & compact.)
> When I write with pen and paper my writing improves in quality. And it seems this is because I am slower.
Interesting.
Anecdote: When I think to myself without writing at all (e.g. shower, walking, waiting, lying in bed), I tend to make deeper progress on isolated idea-clusters. Whereas when I use my knowledge-network (RemNote), I often find more spontaneous + insightful connections between remote idea-clusters (e.g. evo bio, AI, economics). This is because when I write a quick note into RemNote, I heavily prioritise finding the right tags & portalling it into related concepts. Often I simply spam related concepts at the top like this:
The links are to concepts I’ve already spotted metaphors / use-cases for, or I have a hunch that one might be there. It prompts me to either review or flesh out the connections next time I visit the note.
I think these styles complement each other very well.