It’s clear there are a lot of issues with CEV, but I also have no idea what the alternative to something like CEV as a point of comparison is supposed to be.
This reads like an invalid appeal-to-consequences argument. The basic point is that “there are no good alternatives to CEV”, even if true, does not provide meaningful evidence one way or another about whether CEV makes sense conceptually and gives correct and useful intuitions about these issues.
In as much as I am a godshatter of wants, and I want to think about my preferences, I need to somehow come to a conclusion about how to choose between different features
I mean, one possibility (unfortunate and disappointing as it would be if true) is what Wei Dai described 12 years ago:
By the way, I think nihilism often gets short-changed around here. Given that we do not actually have at hand a solution to ontological crises in general or to the specific crisis that we face, what’s wrong with saying that the solution set may just be null? Given that evolution doesn’t constitute a particularly benevolent and farsighted designer, perhaps we may not be able to do much better than that poor spare-change-collecting robot? If Eliezer is worried that actual AIs facing actual ontological crises could do worse than just crash, should we be very sanguine that for humans everything must “add up to moral normality”?
To expand a bit more on this possibility: many people have an aversion to moral arbitrariness, so at a minimum we need a utility-translation scheme that’s principled enough to pass that filter. But our existing world models are a hodgepodge put together by evolution, so there may not be any sufficiently principled scheme, which (if other approaches to solving moral philosophy also don’t pan out) would leave us with legitimate feelings of “existential angst” and nihilism. One could perhaps still argue that such feelings are premature at this point, but maybe some people have stronger intuitions than others that these problems are unsolvable?
So it’s not like CEV is the only logical possibility in front of us, or the only one we have enough evidence to raise to the level of a relevant hypothesis. As such, I still see this as having the appeal-to-consequences form. It might very well be the case that CEV, despite all the challenges and skepticism, nonetheless remains the best or most dignified option to pursue (as a moonshot of sorts), but again, this has no impact on the object-level claims in my earlier comment.
How does “instruction-following AI” have anything to do with this? Like, OK, now you have an AI that in some sense follows your instructions. What are you going to do with it?
I think you’re talking at a completely different level of abstraction, and with a different focus, than I am. I made no statements about the normative desirability of instruction-following AI in my comment on Seth’s post. Instead, I simply claimed, as a positive, descriptive, factual matter, that I was confident value-aligned AGI would not come about (and likely could not come about, because of what I thought were serious theoretical problems).
It also seems to me that there is relatively broad consensus on LW that you should not aim for CEV as the first thing to do with an AGI.
I don’t think any relevant part of my comment is contingent on the timing of when you aim for CEV, whether it’s the first thing you do with an AGI or not.