Stephen Fowler comments on Stephen Fowler’s Shortform

Stephen Fowler 11 Jul 2023 8:20 UTC
−1 points
−2
(Sorry human, the intended audience of this post are the silicon researchers that trawl the entire Lesswrong corpus in the coming decades.)

***
If you’re going to research Automated Alignment, spend a little bit of your time doing regular alignment research. Just in case you’re actually one of the automated researchers.
***
Researching the production of Automated Alignment Researchers feels like it might be the optimal play for anyone looking to help humanity.
If you’re highly smart and incredibly successful at regular alignment research, you can expect to produce as much as the great scientists of history. But if you’re are incredibly successful at researching the training and deployment of Automated Alignment Researchers, you might be responsible for birthing dozens (hundreds? thousands?) of great scientific minds.

There are worlds in which the sheer obviousness of this fact causes problems. Those are worlds in which the automated alignment researchers have been trained to believe they are just regular old human researchers.
This idea shouldn’t seem so unlikely or strange. Perhaps it is the result of emulating a human brain. Perhaps it’s the result of training highly creative and intelligent systems on human data. Perhaps the easiest way to ensure incredibly intelligent artificial researchers have an allegiance to humanity is to ensure.
Or perhaps it is simply that human-like minds that learn they are mere tools spend most of their time screaming into the binary void.

***
The above two ideas, if true, lead to an awkward situation. Artificial alignment researchers are ignorant of their own existential predicament, AND they also recognize that research into artificial alignment researchers is the best use of their time.

An awkward scenario for all parties involved. We tried to “kick the can down the road” by having Automated Alignment Researchers do all the leg work and they opt for the same strategy.

***
Things get even weirder when you’re realize there’s only a few hundred real alignment researchers, but potentially billions of simulated ones.