Note to self, write a post about the novel akrasia solutions I thought up before becoming a rationalist.
Figuring out how to want to want to do things
Personalised advertising of Things I Wanted to Want to Do
What I do when all else fails
Have you tried whiteboarding-related techniques?
I think that suddenly starting to use written media (even journals), in an environment without much or any guidance, is like pressing too hard on the gas: you’re gaining incredible power and going from zero to one on things faster than you ever have before.
Depending on their environment and what they’re interested in starting out, some people might learn (or be shown) how to steer quickly, whereas others might accumulate really lopsided optimization power and crash and burn (e.g. getting involved in tons of things at once that, upon reflection, was far too much for someone just starting out).
This seems incredibly interesting to me. Googling “White-boarding techniques” only gives me results about digitally shared idea spaces. Is this what you’re referring to? I’d love to hear more on this topic.
Maybe I could even write a sequence on this?
A potentially good way to stop low-level criminals from scamming your family and friends with a clone of your voice is to set a password that you exchange with each of them.
An extra layer of security might be to make the password offensive, an info hazard, or politically sensitive. That way, criminals with little technical expertise will have a harder time getting the password past corporate language models’ content filters.
Good luck getting the voice model to parrot a basic meth recipe!
This is not particularly useful; plenty of voice models will happily parrot absolutely anything. The important part is not letting your phrase leak: there is existing work on protocols for exchanging sentences in a way that guarantees no leakage even if someone overhears.
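The kind of no-leakage protocol alluded to can be illustrated with a standard HMAC challenge-response: the two parties agree on a secret in person once, and afterwards a fresh random challenge is answered with a keyed hash, so an eavesdropper who records a whole exchange learns nothing useful for the next one. This is a minimal sketch (the secret string and function names are hypothetical, and in practice this needs an app rather than spoken conversation), not a description of any specific published protocol:

```python
import hashlib
import hmac
import secrets

# Hypothetical secret, agreed in person and never spoken over a channel.
SHARED_SECRET = b"exchanged face to face, not over the phone"

def make_challenge() -> bytes:
    """Caller side: a fresh random nonce, useless to a scammer who replays it."""
    return secrets.token_bytes(16)

def respond(challenge: bytes, secret: bytes) -> str:
    """Callee side: prove knowledge of the secret without revealing it."""
    return hmac.new(secret, challenge, hashlib.sha256).hexdigest()

def verify(challenge: bytes, response: str, secret: bytes) -> bool:
    """Caller side: check the response in constant time."""
    expected = hmac.new(secret, challenge, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)

# One round: the secret itself never crosses the channel, only a
# one-time response bound to this particular challenge.
challenge = make_challenge()
answer = respond(challenge, SHARED_SECRET)
assert verify(challenge, answer, SHARED_SECRET)
```

Because each challenge is random, replaying an overheard answer fails the next verification, which is exactly the property a spoken, reused password lacks.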
Hmm. I don’t doubt that targeted voice-mimicking scams exist (or will soon). I don’t think memorable, reused passwords are likely to work well enough to foil them. Between forgetting (on the sender or receiver end), claimed ignorance (“Mom, I’m in jail and really need money, and I’m freaking out! No, I don’t remember what we said the password would be”), and general social hurdles (“that’s a weird thing to want”), I don’t think it’ll catch on.
Instead, I’d look to context-dependent auth (looking for more confidence when the ask is scammer-adjacent), challenge-response (remember our summer in Fiji?), 2FA (let me call the court to provide the bail), or just much more context (5 minutes of casual conversation with a friend or relative is likely hard to really fake, even if the voice is close).
But really, I recommend security mindset and understanding of authorization levels, even if authentication isn’t the main worry. Most friends, even close ones, shouldn’t be allowed to ask you to mail $500 in gift cards to a random address, even if they prove they are really themselves.
I now realize that my thinking may have been particularly brutal, and I may have skipped inferential steps.
To clarify: if someone didn’t know the password, or was reluctant to repeat it, I would end contact or request an in-person meeting.
But to further clarify, that does not make your points invalid. I think it makes them stronger. If something is weird and risky, good luck convincing people to do it.
I am currently completing psychological studies for credit in my university psych course. The entire time, all I can think is “I wonder if that detail is the one they’re using to trick me.”
I wonder how this impacts results. I can’t imagine being in a heightened state of looking out for deception has no impact.
I’ve seen a lot of claims that GPT-4o is kinda bad, and I’ve experienced that myself. This surprises me.
Now I will say something that feels like a silly idea: is it possible that having the audio/visual parts of the network cut off results in 4o’s poor reasoning? As in, the full model does some sort of audio/visual reasoning, but we don’t have the full model, so it can’t reason the way it was trained to.
If that is the case, I’d expect that when those parts are publicly released, scores on benchmarks shoot up?
Do people smarter and more informed than me have predictions about this?
Without a detailed model card for 4o it is impossible to know for sure why models drift in performance over time, but drift they do.
It is entirely possible that OpenAI started with a version of GPT-4 Turbo, parallelized processing, and performed an extensive fine-tune to improve the multi-modal capabilities.
Essentially, the model could “forget” how to complete prompts from just a week ago, because some of its “memory” was overwritten with instructions for completing multi-modal requests.
I’m confused by what you mean that GPT-4o is bad? In my experience it has been stronger than plain GPT-4, especially at more complex stuff. I do physics research and it’s the first model that can actually improve the computational efficiency of parts of my code that implement physical models. It has also become more useful for discussing my research, in the sense that it dives deeper into specialized topics, while the previous GPT-4 would just respond in a very handwavy way.
Man, I wish that were my experience. I feel like I’m constantly asking GPT-4o a question, getting a weird or bad response, then switching to 4 to finish the job.
Benchmarks are consistent with GPT-4o having different strengths than GPT-4 Turbo, though at a similar overall level: EQ-Bench is lower, MAGI-Hard is higher, and it is the best model tested for creative writing according to a Claude Opus judge, though notably worse at judging writing itself (still good for its price point).
In my experience, different strengths also mean different prompt strategies are necessary; for example, a small, highly instruction-focused model might benefit from few-shot repetition and emphasis that would just distract a more powerful OpenAI model. This might make universal custom instructions more annoying.