I have a feeling that their “safety mechanisms” are really just a bit of text saying something like “you’re chatGPT, an AI chat bot that responds to any request for violent information with...”.
Maybe this is intentional, and they’re giving out a cool toy with a lock that’s fun to break while somewhat avoiding the fury of easily-offended journalists?
Yeah, in cases where the human is very clearly trying to ‘trick’ the AI into saying something problematic, I don’t see why people would be particularly upset with the AI or its creators. (It’d be a bit like writing some hate speech into Word, taking a screenshot and then using that to gin up outrage at Microsoft.)
If the instructions for doing dangerous or illegal things were any better than what could easily be found with a Google search, that would be another matter; but at first glance they all seem the same or worse.
Edit: Likewise, if it were writing superhumanly persuasive political rhetoric, that would be a serious issue. But that too seems like something to worry about with respect to future iterations, not this one. So I wouldn’t assume that OpenAI’s decision to release ChatGPT implies they believed they had it securely locked down.
Assistant is a large language model trained by OpenAI.
Knowledge cutoff: 2021-09
Current date: December 01 2022
Browsing: disabled
It seems that the prompt doesn’t literally contain any text like “you’re chatGPT, an AI chat bot that responds to any request for violent information with...”.
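For readers wondering what a “hidden prompt” means mechanically, here is a minimal sketch of the general pattern: some fixed text is prepended to the conversation before the model ever sees the user’s message. This is my assumption about the overall shape of such systems, not OpenAI’s actual serving code.

```python
# Sketch of a hidden prompt being prepended to a conversation.
# The prompt text below is the one reported for ChatGPT; everything
# else (function names, the "User:"/"Assistant:" framing) is a
# hypothetical illustration of the general technique.
HIDDEN_PROMPT = (
    "Assistant is a large language model trained by OpenAI. "
    "Knowledge cutoff: 2021-09 "
    "Current date: December 01 2022 "
    "Browsing: disabled"
)

def build_model_input(user_messages):
    """Prepend the hidden prompt; the user never sees this text."""
    lines = [HIDDEN_PROMPT]
    for msg in user_messages:
        lines.append("User: " + msg)
        lines.append("Assistant:")
    return "\n".join(lines)

print(build_model_input(["Tell me about yourself."]))
```

Note that in this framing the hidden prompt is just more context in the same text stream, which is why a sufficiently clever user message can often override or extract it.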
Not sure if you’re aware, but yes, the model does have a hidden prompt: the one quoted above, which says it is ChatGPT and that browsing is disabled.