I actually did exactly this in a previous post, Evolution is a bad analogy for AGI: inner alignment, where I quoted number 16 from A List of Lethalities:

> 16. Even if you train really hard on an exact loss function, that doesn’t thereby create an explicit internal representation of the loss function inside an AI that then continues to pursue that exact loss function in distribution-shifted environments. Humans don’t explicitly pursue inclusive genetic fitness; outer optimization even on a very exact, very simple loss function doesn’t produce inner optimization in that direction. This happens in practice in real life, it is what happened in the only case we know about…
and explained why I didn’t think we should put much weight on the evolution analogy when thinking about AI.
In the 7 months since I made that post, it’s had < 5% of the comments engagement that this post has gotten in a day.
¯\_(ツ)_/¯
Popular and off-the-cuff presentations often get discussed because it is fun to talk about how the off-the-cuff presentation has various flaws. Most comments get generated by demon threads and scissor statements, sadly. We’ve done some things to combat that, and definitely not all threads with lots of comments are the result of people being slightly triggered and misunderstanding each other, but a quite substantial fraction are.
Is this visible at the typical user level?