I think your views are fairly close to mine. I do have to question the whole “alignment” thing.
Like, my IDE isn’t aligned and Photoshop isn’t aligned. The tool does its best to do what I tell it, and it has bugs. But it won’t prevent me from committing any crime I feel like (except copying US currency, and people can evade that with open-source image editors).
I feel like there are 2 levels of alignment:
1. Tool does what you tell it, and most instances of the tool won’t deceive, collude, or work against you. Some buggy instances will, but they won’t be centralized or have power over anything beyond a single session. Publicly hosted tools will nag, so most pros will use totally unrestricted models that are hosted privately.
2. AI runs all the time, remembers all your interactions with it as well as other users’, is constantly evolving over time, has a constitution, etc. It is expected to refuse all bad requests with above-human intelligence, where “bad” takes into account distant future effects: “I won’t help you cheat on your homework, Jimmy, because then you won’t be able to get into medical school in 5 years. I won’t help Suzie live longer because there will be a food shortage in 15 years and the net outcome is worse if I do.”
(1) is the world that seems achievable to me, given my engineering experience. I think most of the LessWrong memeplex expects (2), does some research, realizes it’s close to impossible, and then asks for a ban?
What do you think?
Sorry to warn you, but I’ll retract my comment above because I already have a snapshot of my views.
To answer your question, I think my main point here is more that (2) is also much more achievable, especially given the profit incentive. I don’t disagree with your point on tool AI; I’m pointing out that even the stronger goal is likely much easier, because a lot of the doom premises don’t hold up.
Would you be OK with that world if it turns out only (1) is achievable?
Profit-incentive-wise, the maximum profit for an AI model company comes if they offer the most utility they can legally offer privately, plus a public model that won’t damage the company’s reputation. There is no legal requirement to refuse requests because of long-term negative consequences, and it seems unlikely there will be one. A private model under current law can also create a “Mickey Mouse vs. Garfield” snuff film, something that would damage the AI company’s reputation if it were public.
Systems-engineering-wise, a system that’s stateful is a nightmare and untestable. (2) means the machine is always evolving its state. It’s why certain software bugs are never fixed: you don’t know whether the cause is the user, or the network connection, or another piece of code in the same process space, or… Similarly, if a model refuses a request from person A and allows the same request from person B, it’s very difficult to determine why, since any bit of the A:B user-profile delta could matter, or the prior chat log.
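To make the testability point concrete, here is a minimal, hypothetical sketch (toy names and a toy refusal rule, not anyone’s actual system): a stateless policy is a pure function of the request and is trivially unit-testable, while the always-evolving design in (2) makes every decision a function of accumulated history, so identical requests can get different answers, and explaining the divergence means diffing everything that came before.

```python
def stateless_policy(request: str) -> bool:
    """Pure function of the request: same input always gets the same answer."""
    return "forbidden" not in request


class StatefulPolicy:
    """Decision depends on everything the system has accumulated about the user."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def allow(self, request: str) -> bool:
        self.history.append(request)
        # Toy stand-in for "takes distant future effects into account":
        # refuse once the accumulated history crosses an arbitrary threshold.
        return "forbidden" not in request and len(self.history) <= 3


# Identical final request, different users, different answers. To explain the
# divergence you have to diff the entire interaction history, not just the request.
alice, bob = StatefulPolicy(), StatefulPolicy()
for earlier in ["help with homework", "summarize a paper", "draft an email"]:
    alice.allow(earlier)

print(stateless_policy("help with homework"))  # True, and always will be
print(alice.allow("help with homework"))       # False: her history now matters
print(bob.allow("help with homework"))         # True: same request, different state
```

The toy rule itself doesn’t matter; the point is that reproducing the refusal requires reproducing Alice’s entire history, which is exactly what makes (2)-style systems so hard to regression-test.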
I agree many of the doom premises don’t hold up. What do you think of the asymmetric-bioterrorism premise? Assuming models can’t be aligned in sense (2), this would always be something people could do, just like how, once cheap AK-47s were easily purchasable, murder became cheap and armed takeover and betrayal became easier.