Layperson question here; I appreciate any answers/comments that might help me understand this topic better.
When thinking about DALL·E 2 and Google’s PaLM this week, I wondered if in the future we could hypothetically have AI systems like this...
Input: “an award-winning 2-hour sci-fi drama feature film about blah blah blah”
Output: an incredible film, one that people living in the 1970s would assume was made by future humans rather than future AI, and that deserves to win Best Picture because it made them laugh and cry and was amazing
...all without having to solve the alignment problem because the system that produces the film isn’t actually intelligent in the ways that matter to AI risk.
That is, can we get crazy impressive outputs from an AI system without that AI system posing an existential risk or being aligned?
If so, what feature distinguishes AI systems that do pose existential risks and need to be aligned from those that don’t?
If not, what necessary feature of an AI system capable of producing the output above makes that system pose an existential risk and require alignment?
Yes. Deep Blue was impressive in 1997.
Generality + intelligence. Deep Blue was domain-specific. Your laptop computer is perfectly general but has little intelligence.
None. Neither poses an existential risk.
The relevant tradeoff to consider is the cost of prediction versus the cost of influence. As long as the cost of predicting an “impressive output” is much lower than the cost of influencing the world such that an easy-to-generate output is considered impressive, it’s possible to generate the impressive output without risking misalignment, by bounding the system’s optimization power below the power required to influence the world.
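The tradeoff above can be sketched as a toy inequality. This is purely illustrative: the function name, the numbers, and the safety margin are all invented for the sketch, not taken from any actual proposal.

```python
# Toy sketch of the prediction-cost vs. influence-cost tradeoff.
# All names and numbers here are illustrative assumptions.

def is_safely_bounded(cost_of_prediction: float, cost_of_influence: float) -> bool:
    """True when predicting the impressive output is much cheaper than
    influencing the world so that an easy output counts as impressive."""
    SAFETY_MARGIN = 10.0  # assumed margin, not from the source
    return cost_of_prediction * SAFETY_MARGIN <= cost_of_influence

# Weather forecaster: prediction is cheap; changing the weather is expensive.
print(is_safely_bounded(cost_of_prediction=1.0, cost_of_influence=1000.0))  # True

# Persuasive-essay bot: producing the output *is* influencing the reader,
# so the two costs collapse together.
print(is_safely_bounded(cost_of_prediction=1.0, cost_of_influence=1.0))  # False
```

The point of the sketch is only that “safe impressiveness” lives in the gap between the two costs: when that gap closes, as in the essay-bot case, bounding optimization power no longer buys you anything.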
So you can expect an impressive AI that predicts the weather but isn’t allowed to, e.g., participate in prediction markets on the weather or charter flights to seed clouds to cause rain, without needing to worry about alignment. But don’t expect alignment-irrelevance from a bot aimed at writing persuasive philosophical essays, nor from an AI aimed at predicting the behavior of the stock market conditional on the trades it tells you to make, nor from an AI aimed at predicting the best time to show you an ad for the AI’s highest-paying company.