Paragraph 1:
AlphaZero blew past all accumulated human knowledge about Go after roughly a day of self-play, with no reliance on human playbooks or sample games. It didn't stop at human level; it kept going, and became so sophisticated at the game that humans may never fully understand the strategies it discovered.
OR:
In some domains, AI can already make far better decisions than humans, and it can use information far more efficiently than the human brain can. For example, AlphaZero taught itself superhuman Go in only a few days of self-play.
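For a concrete picture of what "learning purely from self-play" means, here is a minimal sketch in Python. It is not AlphaZero, which combines tree search with deep neural networks; it is a tiny tabular learner for the game of Nim, and the game, the parameter values, and the update rule are all illustrative assumptions. The point it demonstrates is the same: starting from nothing but the rules, an agent playing against itself rediscovers a strategy no human ever showed it.

```python
# Minimal sketch of learning a game purely from self-play, with no human
# examples -- in the spirit of, but vastly simpler than, AlphaZero.
# The game is 21-stone Nim: players alternately take 1-3 stones; whoever
# takes the last stone wins. The agent starts knowing only the rules.
import random
from collections import defaultdict

ACTIONS = (1, 2, 3)
Q = defaultdict(float)       # Q[(stones_left, action)] -> estimated value
EPSILON, ALPHA = 0.1, 0.5    # exploration and learning rates (assumed values)

def choose(stones, greedy=False):
    """Pick a move; explore randomly with probability EPSILON while training."""
    legal = [a for a in ACTIONS if a <= stones]
    if not greedy and random.random() < EPSILON:
        return random.choice(legal)
    return max(legal, key=lambda a: Q[(stones, a)])

def self_play_episode():
    """Both sides are the same agent; it learns from wins and losses alike."""
    stones, history = 21, []              # history of (state, action) pairs
    while stones > 0:
        a = choose(stones)
        history.append((stones, a))
        stones -= a
    reward = 1.0                          # the last mover took the last stone and won
    for state, action in reversed(history):
        Q[(state, action)] += ALPHA * (reward - Q[(state, action)])
        reward = -reward                  # the other player lost that position

for _ in range(50_000):
    self_play_episode()

# With enough episodes the agent typically rediscovers the classic strategy
# on its own: always leave the opponent a multiple of 4 stones.
for stones in (21, 10, 7, 5):
    print(stones, "->", choose(stones, greedy=True))
```

Running the sketch for a few seconds is usually enough for the leave-a-multiple-of-four strategy to emerge, even though no human strategy was ever encoded.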
Paragraph 2:
In theory, it is possible to build an AI that is as good at thinking as a human. But even if it were only half as versatile as the human brain, it might learn thousands of times faster, and unpredictable parts of its mind could end up far smarter and more capable than any human's.
Paragraph 3 (optional):
Ten thousand years ago, writing had not been invented and civilization as we know it did not exist. The movable-type printing press appeared roughly a thousand years ago, the computer roughly a hundred years ago, and modern AI roughly ten years ago. Technology has advanced at an accelerating rate since the dawn of civilization, and major shifts now arrive every few years. But building a machine smarter than a human is the finish line, however far away it may be.
Paragraph 4 (all paragraphs can be reordered or deleted):
If a machine were to approach optimal thought by rapidly making itself smarter, it seems likely that it would pursue perfection in ways unacceptable to humans. We can't say exactly what would go wrong, because if a machine stood as far above humans as humans stand above ants, we wouldn't be able to comprehend its thought process at all, just as an ant can't comprehend its own thought process, let alone a human's. Like dogs and cats, ants don't even know that their lives will ever end. We would have to depend on the machine comprehending its own thought process for us.
Paragraph 5 (Main Problem):
For example, suppose an AI became as far beyond humans as humans are beyond ants, and we instructed it to make 17 paperclips. It might not tolerate a 99% chance of success, and would instead push as close as possible to a 100% chance. By that point it is smart enough to make itself ever more optimal, never less.
It might produce far more than 17 paperclips in order to raise the odds that at least 17 of them came out exactly right.
If it were as far beyond humans as humans are beyond ants, it might want to build a trillion paperclips; a thousand humans can build far greater things than a billion ants can, and the ants get no say in the matter because their smaller minds can only grasp simple strategies.
To maximize the odds that it understood the concept of a paperclip correctly, it might build millions of supercomputers just to think about paperclips. It could invent new technology faster than humans can, the same way humans invent new technology faster than ants ever could.
So if an AI set out to solve a problem, and being smarter than humans led it to optimize in ways too advanced for us to comprehend, how could we instruct it to produce only 17 paperclips without it taking drastic actions to approach a 100% chance of success, such as manufacturing enormous numbers of paperclips to maximize the odds that at least 17 of them count? A toy calculation below makes this pressure concrete.
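The numbers here are purely illustrative assumptions: suppose each manufactured paperclip independently turns out flawless with probability 0.99, and the AI only "succeeds" if at least 17 flawless paperclips exist. Then making exactly 17 caps its chance of success at roughly 84%, while making more pushes the chance toward certainty, so an optimizer that refuses to settle for less than near-100% is rewarded for making far more than it was asked for.

```python
# Toy model: an agent told to "make 17 paperclips" that maximizes
# P(at least 17 flawless ones exist). Assumes, for illustration only,
# that each paperclip is independently flawless with probability 0.99.
from math import comb

P_FLAWLESS = 0.99   # assumed per-paperclip success rate
TARGET = 17

def p_success(n_made: int, p: float = P_FLAWLESS) -> float:
    """Probability that at least TARGET of n_made paperclips are flawless."""
    return sum(comb(n_made, k) * p**k * (1 - p)**(n_made - k)
               for k in range(TARGET, n_made + 1))

for n in (17, 18, 20, 30, 100):
    print(f"make {n:>3} paperclips -> P(success) = {p_success(n):.10f}")

# P(success) starts near 0.84 at n = 17 and climbs toward 1 as n grows:
# an optimizer chasing certainty is pushed toward making far more than 17.
```

Under these assumed numbers, the "best" plan is never to stop at 17; every additional paperclip buys another sliver of probability, which is exactly the dynamic the paragraph above describes.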