That makes perfect sense, thank you. And maybe, if we’ve already got the necessary utility function, stability under self-improvement might be solvable as if it were just a really difficult maths problem. It doesn’t look that difficult to me, a priori, to change your cognitive abilities whilst keeping your goals.
AlphaZero got its giant inscrutable matrices by working from a straightforward start of ‘checkmate is good’. I can imagine something like AlphaZero designing a better AlphaZero (AlphaOne?) and handing over the clean definition of ‘checkmate is good’ and trusting its successor to work out the details better than it could itself.
I get cleverer if I use pencil and paper; that doesn’t seem to redefine what’s good when I do. And no-one stopped liking diamonds when we worked out that carbon atoms weren’t fundamental objects.
---
My point is that the necessary utility function is the hard bit. It doesn’t look anything like a maths problem to me, *and* we can’t sneak up on it iteratively with a great mass of patches until it’s good enough.
We’ve been paying a reasonable amount of attention to ‘what is good?’ for at least two thousand years, and in all that time no-one has come up with anything remotely sensible-sounding.
I would doubt that the question meant anything, if it were not that I can often say which of two possible scenarios I prefer. And I notice that other people often have the same preference.
I do think that Eliezer thinks that given the Groundhog Day version of the problem (restart every time you do something that doesn’t work out), we’d be able to pull it off.
I doubt that even that’s true. ‘Doesn’t work out’ is too nebulous.
But at this point I guess we’re talking only about Eliezer’s internal thoughts, and I have no insight there. I was attacking a direct quote from the podcast, but maybe I’m misinterpreting something that wasn’t meant to bear much weight.