How would you become confident that a UANFSI approach was NFSI?
Idk, that could be part of finding heuristic arguments for desirable properties of what a UANFSI converges to. Possibly it’s easier to provide probabilistic convergence guarantees for systems that don’t do FSI, so this would already give some implicit evidence. But we could also just say that it’s fine if FSI happens as long as we have heuristic convergence arguments (like, UANFSI is just allowing for a broader class of algorithms, which might make stuff easier). Though I mostly don’t expect we’d get FSI alignment through this indirect alignment path from UANFSI, but rather that we’d get an NFSI AI if we get some probabilistic convergence guarantees.
(Also, I haven’t thought much about it at all. As I said, I’m trying KANSI for now.)
I think there are some deeper insights around inner optimization that you are missing that would make you more pessimistic here. “Unknown Algorithm” to me means that we don’t know how to rule out the possibility of inner agents which have opinions about recursive self-improvement. Part of it is that we can’t just think about what it “converges to” (convergence time will be too long for interesting learning systems).
Hm, interesting. I mean, I’d imagine that if we get good heuristic guarantees for a system, it would basically mean that all the not-perfectly-aligned subsystems/subsearches are limited and contained enough that they won’t be able to engage in RSI. But maybe I misunderstand your point? (Like, maybe you have a specific reason to believe that it would be very hard to reliably predict that a subsystem is contained enough to not engage in RSI?)
(I think inner alignment is very hard, and humans are currently not (nearly?) competent enough to figure out how to design training setups within two decades. Like, to be able to get good heuristic guarantees, I think we’d need to at least figure out something sorta like the steering subsystem which tries to align the human brain, only better, because it’s not good enough for smart humans I’d say. (Though Steven Byrnes’ agenda is perhaps a UANFSI approach that might have sorta a shot, because it might open up possibilities of studying in more detail how values form in humans. Though it’s a central example of what I was imagining when I coined the term.))