Thanks for the comment, Erik (and for taking the time to read the post).
I generally agree with you re: the inner/outer alignment comment I made. But the language I used, and that others also use, continues to be vague; the working definition of inner alignment on lesswrong.com asks whether, if an optimizer is the product of an outer-aligned system, that optimizer is itself aligned. I see little difference, but I could be persuaded otherwise.
My post was meant to show that it's pretty easy to find significant holes in some of the most central concepts being researched now. This includes eclectic work, but also mainstream research, including the entire latent-knowledge approach, which seems to make significant assumptions about the relationship between human decision making (or intent) and super-human AGIs. I work a lot on this concept and hold (perhaps too many) opinions.
The tone might not have been ideal due to time limits. Sorry if that was off-putting.
I was also trying to make the point that we do not spend enough time shopping our ideas around, especially with basic-science researchers, before we launch our work. I am a bit guilty of this, and I worry a lot that I'm actually contributing to capabilities research rather than long-term AI safety. In the end, I hope for a way for AI-safety researchers and science researchers to interact more easily and develop ideas together.