It’s looking like the values of humans are far, far simpler than a lot of the evopsych literature and Yudkowsky suggested.
I’ve missed this. Any particular link to get me started reading about this update? Shard theory seems to imply complex values in individual humans, though certainly less fragile than Yudkowsky proposed.
Note, this is outside of Shard Theory’s scope, and I wasn’t appealing to shard theory here.
The links that led me to make these updates are here:
This summary of Matthew Barnett’s post:
https://www.lesswrong.com/posts/i5kijcjFJD6bn7dwq/evaluating-the-historical-value-misspecification-argument#N9ManBfJ7ahhnqmu7
And 2 links from Beren about alignment:
https://www.beren.io/2024-05-11-Alignment-in-the-Age-of-Synthetic-Data/
https://www.beren.io/2024-05-15-Alignment-Likely-Generalizes-Further-Than-Capabilities/