From London, now living in the Santa Cruz mountains.
Paul Crowley
I’m not quite seeing how this negates my point, help me out?
Eliezer sometimes spoke of AIs as if they had a “reward channel”
But they don’t; instead, they are something a bit like “adaptation executors, not fitness maximizers”
This is potentially an interesting misprediction!
Eliezer also said that if you give the AI the goal of maximizing smiley faces, it will make tiny molecular ones
TurnTrout points out that if you ask an LLM if that would be a good thing to do, it says no
My point is that this is exactly what Eliezer would have predicted for an LLM whose reward channel was “maximize reader scores”
Our LLMs tend to produce high reader scores for a reason that’s not exactly “they’re trying to maximize their reward channel”
I don’t at all see how this difference makes a difference! Eliezer would always have predicted that an AI aimed at maximizing reader scores would have produced a response to TurnTrout’s question that maximized reader scores, so it’s silly to present them doing so as a gotcha!
In this instance the problem the AI is optimizing for isn’t “maximize smiley faces”, it’s “produce outputs that human raters give high scores to”. And it’s done well on that metric, given that the LLM isn’t powerful enough to subvert the reward channel.
I’m sad that the post doesn’t go on to say how to get matplotlib to do the right thing in each case!
I thought you wanted to sign physical things with this? How will you hash them? Otherwise, how is this different from a standard digital signature?
The difficult thing is tying the signature to the thing signed. Even if the signatures are single-use, unless the relying party sees everything you ever sign immediately, a signature can be lifted from something you signed that the relying party never saw and transferred to something you didn’t sign.
Of course this market is “Conditioning on Nonlinear bringing a lawsuit, how likely are they to win?” which is a different question.
Extracted from a Facebook comment:
I don’t think the experts are expert on this question at all. Eliezer’s train of thought essentially started with “Supposing you had a really effective AI, what would follow from that?” His thinking wasn’t at all predicated on any particular way you might build a really effective AI, and knowing a lot about how to build AI isn’t expertise on what the results are when it’s as effective as Eliezer posits. It’s like thinking you shouldn’t have an opinion on whether there will be a nuclear conflict over Kashmir unless you’re a nuclear physicist.
Thanks, that’s useful. Sad to see no Eliezer, no Nate or anyone from MIRI or having a similar perspective though :(
The lack of names on the website seems very odd.
Don’t let your firm opinion get in the way of talking to people before you act. It was Elon’s determination to act before talking to anyone that led to the creation of OpenAI, which seems to have sealed humanity’s fate.
This is explicitly the discussion the OP asked to avoid.
This is true whether we adopt my original idea that each board member keeps what they learn from these conversations entirely to themselves, or Ben’s better proposed modification that it’s confidential but can be shared with the whole board.
Perhaps this is a bad idea, but it has occurred to me that if I were a board member, I would want to quite frequently have confidential conversations with randomly selected employees.
For cryptographic security, I would use HMAC with a random key. Then to reveal, you publish both the message and the key. This allows you, for example, to securely commit to a one-character message like “Y”.
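Concretely, a minimal sketch in Python of what I have in mind (the function names are mine; only the standard-library hmac, hashlib, and secrets modules are assumed):

```python
import hmac, hashlib, secrets

def commit(message: bytes) -> tuple[bytes, bytes]:
    """Commit to a message: return (commitment, key). Publish only the commitment."""
    key = secrets.token_bytes(32)  # random 256-bit key hides even a tiny message space
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return tag, key

def reveal_ok(commitment: bytes, message: bytes, key: bytes) -> bool:
    """Check a reveal: recompute the HMAC and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(commitment, expected)

# Commit to "Y" now; later publish both the message and the key to open it.
tag, key = commit(b"Y")
assert reveal_ok(tag, b"Y", key)
```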
I sincerely doubt very many people would propose mayonnaise!
The idea is that I can do all this from my browser, including writing the code.
I’m not sure I see how this resembles what I described?
I would love a web-based tool that allowed me to enter data in a spreadsheet-like way, present it in a spreadsheet-like way, but use code to bridge the two.
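To gesture at what I mean by “use code to bridge the two”, here is a tiny Python sketch; the bridge function and the column names are purely hypothetical, just to show the shape of the tool:

```python
# Hypothetical "code in the middle": raw rows come in from a data-entry grid,
# derived rows go back out to a spreadsheet-like view.
def bridge(rows: list[dict]) -> list[dict]:
    """Turn the rows as entered into the rows the tool would display."""
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]

entered = [
    {"item": "filter", "quantity": 4, "unit_price": 15.0},
    {"item": "fan", "quantity": 1, "unit_price": 90.0},
]
print(bridge(entered))  # rendered as a second grid alongside the entry grid
```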
(I’m considering putting a second cube on top to get five more filters per fan, which would also make it quieter.)
Four more filters per fan, right?
Not being able to figure out what sort of thing humans would rate highly isn’t an alignment failure, it’s a capabilities failure, and Eliezer_2008 would never have assumed a capabilities failure in the way you’re saying he would. He is right to say that attempting to directly encode the category boundaries won’t work. It isn’t covered in this blog post, but his main proposal for alignment was always that as far as possible, you want the AI to do the work of using its capabilities to figure out what it means to optimize for human values rather than trying to directly encode those values, precisely so that capabilities can help with alignment. The trouble is that even pointing at this category is difficult—more difficult than pointing at “gets high ratings”.