That may be the crux. I’m generally of the mindset that “can’t guarantee/verify” implies “completely useless for AI safety”. Verifying that it’s safe is the whole point of AI safety research. If we were hoping to make something that just happened to be safe even though we couldn’t guarantee it beforehand or double-check afterwards, that would just be called “AI”.
It would be nice if you said this in comments in the future. This post seems pretty explicitly about the empirical question to me, and even if you don’t think the empirical question counts as AI safety research (a tenable position, though I don’t agree with it), the empirical questions are still pretty important for prioritization research, and I would like people to be able to have discussions about that.
(Partly I’m a bit frustrated at having had another long comment conversation that bottomed out in a crux that I already knew about, and I don’t know how I could have known this ahead of time, because it really sounded to me like you were attempting to answer the empirical question.)
Although it occurs to me that you might be claiming that empirically, if we fail to verify, then we’re near-definitely doomed. If so, I want to know the reasons for that belief, and how they contradict my arguments, rather than whatever it is we’re currently debating. (And also, I retract both of the paragraphs above.)
Re: the rest of your comment: I don’t in fact want to have AI systems that try to guess human “values” and then optimize that—as you said we don’t even know what “values” are. I more want AI systems that are trying to help us, in the same way that a personal assistant might help you, despite not knowing your “values”.
Sorry we wound up deep in a thread on a known crux. Mostly I just avoid timeline/prioritization/etc conversations altogether (on the margin I think it’s a bikeshed). But in this case I read the OP as wondering why safety researchers were interested in the fragility argument, more than arguing over fragility itself.
As for AIs trying to help us rather than guessing human values… I don’t really see how that circumvents the central problem? It sort-of splits off some of the nebulous, unformalized ideas which seem relevant into their own component, but we still end up with a bunch of nebulous, unformalized ideas which do not seem like the same kind of conceptual objects as “trees”. We still need notions of wanting things, of agency, etc.