Making a thread because it seems related to the above:
Then later, it is smart enough to reflect back on that data and ask: “Were the humans pointing me towards the distinction between goodness and badness, with their training data? Or were they pointing me towards the distinction between that-which-they’d-label-goodness and that-which-they’d-label-badness, with things that look deceptively good (but are actually bad) falling into the former bin?” And to test this hypothesis, it would go back to its training data and find some example bad-but-deceptively-good-looking cases, and see that they were labeled “good”, and roll with that.
Or at least, that’s the sort of thing that happens by default.
I feel like a dynamic similar to this goes on all the time in how people use language, and things work out fine. And deep learning shows that AIs can learn common sense.
I’m reminded of this discussion, where I shared the skepticism about the faces example (though I also thought it was possible I was missing something).