You’re right. We largely don’t seem to disagree about what is going on.
1. Intentional deception
My original reading of ‘deception’ and ‘information hiding’ (your words) centered on my distinction: ‘is this an intelligent move?’ If a chameleon is on something green and is itself green to blend in, that seems different from humans or other agents conducting deception.
If GPT is annoying because it’s hard to get it to do something, owing to the way it’s trained, that’s one thing. But lying sounds like something different.
(As I’ve thought more about this, AI looks like a place where distinctions start to break down a little. Figuring out what’s going on isn’t necessarily going to be easy, though that seems like a different issue from ‘misunderstanding terminology’.)
2. Minor point
You wrote: ‘I mean, I agree as a statistical tendency, but are you assuming away the inner alignment problem?’
Hm. I was assuming the issue was ‘an outer alignment problem’, rather than ‘an inner alignment problem’.
(With the caveat that ‘a different purpose (writer) calls for a different method (than prediction), or more than just prediction’ is less ‘human value is complicated’ and more ‘writing is not prediction’*.
*GPT illustrates the shocking degree to which this isn’t true. (Though maybe plagiarism is (still) an issue.)
I wonder what an inner alignment problem with GPT would look like. Not memorizing the dataset? Not engaging in ‘deceptive behavior’?)
But again, not much disagreement here.