> I don’t think I could name a working method for constructing a safe powerful mind.
I could, and my algorithm basically boils down to the following:
1. Specify a weak/limited prior over goal space, like the genome does.
2. Create a preference model by using DPO, RLHF, or whatever else suits your fancy to guide the intelligence into alignment with x values.
3. Use backpropagation to update the weights of the brain in the direction that best improves alignment.
4. Repeat until the loss is low, or until you can no longer optimize the objective.
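The loop above can be sketched in miniature. The toy below is a hedged illustration, not the actual recipe: the "mind" is just a linear scorer, the "weak prior" is a small random weight initialization, and the preference model is a Bradley–Terry objective (the same loss family underlying DPO/RLHF reward modeling), trained by gradient descent on synthetic preference pairs. All names and data here are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: a weak/limited "prior" over goal space -- near-zero init.
w = rng.normal(scale=0.01, size=4)

# Synthetic preference data: (preferred features, rejected features) pairs.
pairs = [(rng.normal(size=4) + 1.0, rng.normal(size=4) - 1.0)
         for _ in range(64)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Steps 2-4: Bradley-Terry preference loss, gradient step, repeat
# until the loss is low (fixed iteration budget here).
lr = 0.1
loss = np.inf
for _ in range(200):
    grad = np.zeros_like(w)
    loss = 0.0
    for x_pref, x_rej in pairs:
        margin = w @ (x_pref - x_rej)          # score gap, preferred vs. rejected
        p = sigmoid(margin)                    # P(preferred beats rejected)
        loss += -np.log(p + 1e-12)             # negative log-likelihood
        grad += -(1.0 - p) * (x_pref - x_rej)  # d(-log sigmoid(margin))/dw
    loss /= len(pairs)
    grad /= len(pairs)
    w -= lr * grad                             # step 3: gradient update

accuracy = np.mean([w @ (xp - xr) > 0 for xp, xr in pairs])
print(f"final loss: {loss:.4f}, preference accuracy: {accuracy:.2f}")
```

The point of the sketch is only the shape of the procedure: a constrained starting point, a preference signal, and an iterated gradient update toward it.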