I’m definitely happier. I never had all that much faith in governance (except for a few months approximately a year ago), and in particular I expect it would fall on its face even in the “alien tiger” world. Though governance in the “alien tiger” world is definitely easier, p(alignment) deltas are the dominant factor.
Never mind scalable oversight, the ability to load in natural abstractions before you load in a utility function seems very important for most of the pieces of hope I have — both in alignment-by-default worlds, and in worlds where alignment comes from more complicated, human-designed clever training/post-training mechanisms.
In the RL world I think my only hope would be in infra-Bayesian physicalism. [edit: oh yeah, and some shard theory stuff, but that mostly stays constant between the two worlds].
Though of course in the RL world maybe there’d be more effort spent on coming up with agency-then-knowledge alignment schemes.