I don’t think I could name a working method for constructing a safe powerful mind. What I want to say is more like: if you want to deconfuse some core AI x-risk problems, you should deconfuse your basic reasons for worry and your core frames first; otherwise you’re building on air.
I could, and my algorithm basically boils down to the following (a rough sketch in code follows the list):
1. Specify a weak/limited prior over goal space, like the genome does.
2. Create a preference model using DPO, RLHF, or whatever else suits your fancy to guide the intelligence into alignment with the desired values.
3. Use backpropagation to update the brain's weights in the direction that reduces the alignment loss.
4. Repeat until the loss is low, or until you can no longer make progress on the objective.
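To make the loop concrete, here is a minimal sketch of what steps 2–4 might look like with a DPO-style objective. Everything in it is illustrative rather than a claim about a real implementation: the `policy` and `reference` objects, their `logp` method, and the `preference_batches` iterable are hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO objective: push the policy to assign relatively more probability
    # to the preferred response than to the rejected one, measured against
    # a frozen reference model so the update stays near the starting prior.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

def preference_finetune(policy, reference, preference_batches, epochs=3, lr=1e-5):
    # `policy` and `reference` are assumed to be nn.Module-like objects with a
    # hypothetical .logp() method returning per-example sequence log-probs.
    optimizer = torch.optim.AdamW(policy.parameters(), lr=lr)
    for _ in range(epochs):                      # "repeat until the loss is low"
        for batch in preference_batches:         # each batch: (chosen, rejected) pairs
            with torch.no_grad():                # reference model stays frozen
                ref_chosen = reference.logp(batch.chosen)
                ref_rejected = reference.logp(batch.rejected)
            loss = dpo_loss(policy.logp(batch.chosen),
                            policy.logp(batch.rejected),
                            ref_chosen, ref_rejected)
            loss.backward()                      # backprop step from the list above
            optimizer.step()
            optimizer.zero_grad()
    return policy
```

In this sketch the frozen reference model also does double duty as the "weak/limited prior" of step 1: the DPO loss penalizes drifting far from it, which is the closest analogue the sketch has to the genome's role.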