Zvi

Karma: 40,630

Zvi 23 Apr 2024 19:28 UTC
3 points
0
in reply to: Chris_Leong’s comment on: On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
It is better than nothing I suppose but if they are keeping the safeties and restrictions on then it will not teach you whether it is fine to open it up.

Zvi 11 Apr 2024 20:26 UTC
8 points
4
in reply to: Soapspud’s comment on: RTFB: On the New Proposed CAIP AI Bill
My guess is that different people do it differently, and I am super weird.
For me a lot of the trick is consciously asking if I am providing good incentives, and remembering to consider what the alternative world looks like.

Zvi 11 Apr 2024 14:15 UTC
7 points
13
in reply to: trevor’s comment on: RTFB: On the New Proposed CAIP AI Bill
I don’t see this response as harsh at all? I engage in detail with the substance, note that the bill is highly thoughtful overall, offer a bunch of explicit encouragement, defend a bunch of their specific choices, and say I am very happy they offered this bill. It seems good and constructive to note where I think they are asking for too much? While noting that the right amount of ‘any given person reacting thinks you went too far in some places’ is definitely not zero.

Zvi 23 Mar 2024 15:45 UTC
5 points
0
in reply to: Edouard Harris’s comment on: On the Gladstone Report
Excellent. On the thresholds, got it, sad that I didn’t realize this, and that others didn’t either from what I saw.
I appreciate the ‘long post is long’ problem but I do think you need the warnings to be in all the places someone might see the 10^X numbers in isolation, if you don’t want this to happen, and it probably happens anyway, on the grounds of ‘yes that was technically not a proposal but of course it will be treated like one.’ And there’s some truth in that, and that you want to use examples that are what you would actually pick right now if you had to pick what to actually do (or propose).
I do think the numbers I suggest are about as low as one could realistically get until we get much stronger evidence of impending big problems.

Zvi 22 Mar 2024 12:23 UTC
6 points
0
on: What information do you share with whom?
Secrecy is the exception. Mostly no one cares about your startup idea or will remember your hazardous brainstorm, no one is going to cause you trouble, and so on, and honesty is almost always the best policy.
That doesn’t mean always tell everyone everything, but you need to know what you are worried about if you are letting this block you.
On infohazards, I think people were far too worried for far too long. The actual dangerous idea turned out to be that AGI was a dangerous idea, not any specific thing. There are exceptions, but you need a very good reason, and an even better reason if it is an individual you are talking with.
Trust in terms of ‘they won’t steal from me’ or ‘they will do what they promise’ is another question with no easy answers.
If you are planning something radical enough to actually get people’s attention (e.g. breaking laws, using violence, fraud of various kinds, etc) then you would want to be a lot more careful who you tell, but also—don’t do that?

Zvi 30 Jan 2024 23:37 UTC
2 points
0
in reply to: Ben Pace’s comment on: Monthly Roundup #14: January 2024
Sounds like a lot of it is that your scale is stingier than mine. And it makes sense that the recommendations come apart at the extreme high end, especially for older films. The ‘for the time’ here is telling.

Zvi 30 Jan 2024 0:22 UTC
2 points
0
in reply to: Ben Pace’s comment on: Monthly Roundup #14: January 2024
On my scale, if I went 1 for 7 on finding 4.0+ films in a year, then yeah I’d find that a disappointing year.
In other news, I tried out Scaruffi. I figured I’d watch the top pick. The first was Citizen Kane, which I’d already watched (5.0 so that was a good sign), so I went to the next, which was Repulsion. And… yeah, that was not a good selection method. Critics and I do NOT see eye to eye.
I also scanned their ratings of various other films, which generally seemed reasonable for films I’d seen, although with a very clear ‘look at me I am a movie critic’ bias, including one towards older films. I don’t know how to correct for that properly.

Zvi 26 Jan 2024 15:25 UTC
2 points
0
in reply to: Jiao Bu’s comment on: Monthly Roundup #14: January 2024
Real estate can definitely be a special case, because (1) you are also doing consumption, (2) it is non-recourse and you never get a margin call, which provides a lot of protection, and (3) the USG is massively subsidizing you doing that...

Zvi 26 Jan 2024 15:24 UTC
1 point
−1
in reply to: Templarrr’s comment on: Monthly Roundup #14: January 2024
There are lead times to a lot of these actions, the costs of doing so are often fixed, and there is no reason to expect the rule changes not to happen. I buy that it is efficient to do so early.
‘Greed’ I consider a non-sequitur here, the manager will profit maximize.

Zvi 26 Jan 2024 15:19 UTC
2 points
0
in reply to: Ben Pace’s comment on: Monthly Roundup #14: January 2024
I’m curious how many films you saw—having only one above 3.5 on that scale seems highly disappointing.

Zvi 23 Jan 2024 12:50 UTC
2 points
0
in reply to: Logan Zoellner’s comment on: AI #48: Exponentials in Geometry
Argument from incredulity?

Zvi 17 Jan 2024 20:30 UTC
2 points
0
in reply to: evhub’s comment on: On Anthropic’s Sleeper Agents Paper
Thanks for the notes!
As I understand that last point, you’re saying that it’s not a good point because it is false (hence my ‘if it turns out to be true’). Weird that I’ve heard the claim from multiple places in these discussions. I assumed there was some sort of ‘order matters in terms of pre-training vs. fine-tuning obviously, but there’s a phase shift in what you’re doing between them.’ I also did wonder about the whole ‘you can remove Llama-2’s fine tuning in 100 steps’ thing, since if that is true then presumably order must matter within fine tuning.
Anyone think there’s any reason to think Pope isn’t simply technically wrong here (including Pope)?

Zvi 17 Jan 2024 15:55 UTC
2 points
0
in reply to: tgb’s comment on: Medical Roundup #1
Yep, whoops, fixing.

Zvi 16 Jan 2024 14:46 UTC
LW: 8 AF: 3
2
AF
in reply to: ryan_greenblatt’s comment on: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
That seems rather loaded in the other direction. How about “The evidence suggests that if current ML systems were going to deceive us in scenarios that do not appear in our training sets, we wouldn’t be able to detect this or change them not to unless we found the conditions where it would happen.”?

Zvi 15 Jan 2024 0:41 UTC
2 points
0
in reply to: Raemon’s comment on: Announcing Balsa Research
Did you see (https://thezvi.substack.com/p/balsa-update-and-general-thank-you)? That’s the closest thing available at the moment.

Zvi 9 Jan 2024 14:21 UTC
10 points
9
on: Criticism of EA Criticism Contest
This post was, in the end, largely a failed experiment. It did win a lesser prize, and in a sense that proved its point, and I had fun doing it, but I do not think it successfully changed minds, and I don’t think it has lasting value, although someone gave it a +9 so it presumably worked for them. The core idea—that EA in particular wants ‘criticism’ but it wants it in narrow friendly ways and it discourages actual substantive challenges to its core stuff—does seem important. But also this is LW, not EA Forum. If I had to do it over again, I wouldn’t bother writing this.

Zvi 9 Jan 2024 14:16 UTC
7 points
17
on: Announcing Balsa Research
I am flattered that someone nominated this but I don’t know why. I still believe in the project, but this doesn’t match at all what I’d look to in this kind of review? The vision has changed and narrowed substantially. So this is a historical artifact of sorts, I suppose, but I don’t see why it would belong.

Zvi 9 Jan 2024 14:15 UTC
3 points
7
on: Jailbreaking ChatGPT on Release Day
I think this post did good work in its moment, but it doesn’t have that much lasting relevance and I can’t see why someone would revisit it at this point. It shouldn’t be going into any timeless best-of lists.

Zvi 9 Jan 2024 14:13 UTC
4 points
on: On Bounded Distrust
I continue to frequently refer back to my functional understanding of bounded distrust. I now try to link to ‘How To Bounded Distrust’ instead because it’s more compact, but this is I think the better full treatment for those who have the time. I’m sad this isn’t seeing more support, presumably because it isn’t centrally LW-focused enough? But to me this is a core rationalist skill not discussed enough, among its other features.