That makes perfect sense, thank you. And maybe, if we’ve already got the necessary utility function, stability under self-improvement might be solvable as if it were just a really difficult maths problem. It doesn’t look that difficult to me, a priori, to change your cognitive abilities whilst keeping your goals.
AlphaZero got its giant inscrutable matrices by working from a straightforward start of ‘checkmate is good’. I can imagine something like AlphaZero designing a better AlphaZero (AlphaOne?) and handing over the clean definition of ‘checkmate is good’ and trusting its successor to work out the details better than it could itself.
I get cleverer if I use pencil and paper; that doesn’t seem to redefine what’s good when I do. And no-one stopped liking diamonds when we worked out that carbon atoms weren’t fundamental objects.
---
My point is that the necessary utility function is the hard bit. It doesn’t look anything like a maths problem to me, *and* we can’t sneak up on it iteratively with a great mass of patches until it’s good enough.
We’ve been paying a reasonable amount of attention to ‘what is good?’ for at least two thousand years, and in all that time no-one has come up with anything remotely sensible-sounding.
I would doubt that the question meant anything, if it were not that I can often say which of two possible scenarios I prefer. And I notice that other people often have the same preference.
I do think that Eliezer thinks that given the Groundhog Day version of the problem (restart every time you do something that doesn’t work out), we’d be able to pull it off.
I doubt that even that’s true. ‘Doesn’t work out’ is too nebulous.
But at this point I guess we’re talking only about Eliezer’s internal thoughts, and I have no insight there. I was attacking a direct quote from the podcast, but maybe I’m misinterpreting something that wasn’t meant to bear much weight.