A world where we’ve reliably “solved” for x-risks well enough to survive thousands of years without also having meaningfully solved “moral philosophy” is probably physically realizable, but this seems like a pretty fine needle to thread from our current position. (I think if you have a plan for solving AI x-risk that looks like “get to ~human-level AI, pump the brakes real hard, and punt on solving ASI alignment” then maybe you disagree.)
I don’t think it takes today-humans a thousand years to come up with a version of indirect normativity (or CEV, or whatever) that actually just works correctly. I’d be somewhat surprised if it took a hundred, but maybe it’s actually very tricky. A thousand just seems crazy. A million makes it sound like you’re doing something very dumb, like figuring out every shard of each human’s values by hand without knowing how to automate any of it.
To answer these questions:
One possible answer is that something like CEV does not exist, and yet alignment is still solvable for almost arbitrarily capable AI. This could well happen, and personally I think it’s the most likely outcome by default.
There are important arguments against the idea that CEV even exists or is well defined, and we shouldn’t assume that technological progress equates to progress toward your preferred philosophy:
https://www.lesswrong.com/posts/Y7gtFMi6TwFq5uFHe/some-biases-and-selection-effects-in-ai-risk-discourse#hkoGD6Gwi9YKKZ6S2
https://www.lesswrong.com/posts/SqgRtCwueovvwxpDQ/valence-series-2-valence-and-normativity#2_7_3_Possible_implications_for_AI_alignment_discourse
https://joecarlsmith.com/2021/06/21/on-the-limits-of-idealized-values
And if there isn’t a way to converge to a single morality, there might not be any principled, justifiable way to resolve disagreements between the competing philosophies/moralities either.