From London, now living in the Santa Cruz mountains.
Paul Crowley
I’m not quite seeing how this negates my point, help me out?
Eliezer sometimes spoke of AIs as if they had a “reward channel”
But they don’t; instead, they are something a bit like “adaptation executors, not fitness maximizers”
This is potentially an interesting misprediction!
Eliezer also said that if you give the AI the goal of maximizing smiley faces, it will make tiny molecular ones
TurnTrout points out that if you ask an LLM if that would be a good thing to do, it says no
My point is that this is exactly what Eliezer would have predicted for an LLM whose reward channel was “maximize reader scores”
Our LLMs tend to produce high reader scores for a reason that’s not exactly “they’re trying to maximize their reward channel”
I don’t at all see how this difference makes a difference! Eliezer would always have predicted that an AI aimed at maximizing reader scores would have produced a response to TurnTrout’s question that maximized reader scores, so it’s silly to present them doing so as a gotcha!
In this instance the problem the AI is optimizing for isn’t “maximize smiley faces”, it’s “produce outputs that human raters give high scores to”. And it’s done well on that metric, given that the LLM isn’t powerful enough to subvert the reward channel.
I’m sad that the post doesn’t go on to say how to get matplotlib to do the right thing in each case!
I thought you wanted to sign physical things with this? How will you hash them? Otherwise, how is this different from a standard digital signature?
The difficult thing is tying the signature to the thing signed. Even if the signatures are single-use, unless the relying party sees everything you ever sign immediately, a signature can be lifted from something you signed that the relying party never saw and transferred to something you didn’t sign.
Of course this market is “Conditioning on Nonlinear bringing a lawsuit, how likely are they to win?” which is a different question.
Extracted from a Facebook comment:
I don’t think the experts are expert on this question at all. Eliezer’s train of thought essentially started with “Supposing you had a really effective AI, what would follow from that?” His thinking wasn’t at all predicated on any particular way you might build a really effective AI, and knowing a lot about how to build AI isn’t expertise on what the results are when it’s as effective as Eliezer posits. It’s like thinking you shouldn’t have an opinion on whether there will be a nuclear conflict over Kashmir unless you’re a nuclear physicist.
Thanks, that’s useful. Sad to see no Eliezer, no Nate or anyone from MIRI or having a similar perspective though :(
The lack of names on the website seems very odd.
Don’t let your firm opinion get in the way of talking to people before you act. It was Elon’s determination to act before talking to anyone that led to the creation of OpenAI, which seems to have sealed humanity’s fate.
This is explicitly the discussion the OP asked to avoid.
This is true whether we adopt my original idea that each board member keeps what they learn from these conversations entirely to themselves, or Ben’s better proposed modification that it’s confidential but can be shared with the whole board.
Perhaps this is a bad idea, but it has occurred to me that if I were a board member, I would want to quite frequently have confidential conversations with randomly selected employees.
For cryptographic security, I would use HMAC with a random key. Then to reveal, you publish both the message and the key. This allows you, for example, to securely commit to a one-character message like “Y”.
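Concretely, a minimal sketch in Python of what I have in mind (the function names are mine; only the standard-library hmac, hashlib, and secrets modules are assumed):

```python
import hmac, hashlib, secrets

def commit(message: bytes) -> tuple[bytes, bytes]:
    """Commit to a message: return (commitment, key). Publish only the commitment."""
    key = secrets.token_bytes(32)  # random 256-bit key hides even a tiny message space
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return tag, key

def reveal_ok(commitment: bytes, message: bytes, key: bytes) -> bool:
    """Check a reveal: recompute the HMAC and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(commitment, expected)

# Commit to "Y" now; later publish both the message and the key to open it.
tag, key = commit(b"Y")
assert reveal_ok(tag, b"Y", key)
```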
I sincerely doubt very many people would propose mayonnaise!
The idea is that I can do all this from my browser, including writing the code.
I’m not sure I see how this resembles what I described?
I would love a web-based tool that allowed me to enter data in a spreadsheet-like way, present it in a spreadsheet-like way, but use code to bridge the two.
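To gesture at what I mean by “use code to bridge the two”, here is a tiny Python sketch; the bridge function and the column names are purely hypothetical, just to show the shape of the tool:

```python
# Hypothetical "code in the middle": raw rows come in from a data-entry grid,
# derived rows go back out to a spreadsheet-like view.
def bridge(rows: list[dict]) -> list[dict]:
    """Turn the rows as entered into the rows the tool would display."""
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]

entered = [
    {"item": "filter", "quantity": 4, "unit_price": 15.0},
    {"item": "fan", "quantity": 1, "unit_price": 90.0},
]
print(bridge(entered))  # rendered as a second grid alongside the entry grid
```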
(I’m considering putting a second cube on top to get five more filters per fan, which would also make it quieter.)
Four more filters per fan, right?
Not being able to figure out what sort of thing humans would rate highly isn’t an alignment failure, it’s a capabilities failure, and Eliezer_2008 would never have assumed a capabilities failure in the way you’re saying he would. He is right to say that attempting to directly encode the category boundaries won’t work. It isn’t covered in this blog post, but his main proposal for alignment was always that as far as possible, you want the AI to do the work of using its capabilities to figure out what it means to optimize for human values rather than trying to directly encode those values, precisely so that capabilities can help with alignment. The trouble is that even pointing at this category is difficult—more difficult than pointing at “gets high ratings”.