Zvi

Karma: 49,826

Zvi Mar 25, 2025, 11:09 PM
3 points
6
in reply to: Daniel Kokotajlo’s comment on: On (Not) Feeling the AGI
I mean, one could say they don’t feel the ASI.

Zvi Mar 25, 2025, 11:09 PM
2 points
4
in reply to: Declan Molony’s comment on: On (Not) Feeling the AGI
Something weird is going on, I see plenty of paragraph breaks there.

Zvi Jan 13, 2025, 8:06 PM
7 points
0
in reply to: Buck’s comment on: johnswentworth’s Shortform
Individually, for a particular manifestation of each issue, this is true: you can imagine doing a hacky solution to each one. But that assumes there is a list of such particular problems where checking off all the boxes means you win, rather than them being manifestations of broader problems. You do not want to get into a hacking contest if you’re not confident your list is complete.

Zvi Dec 25, 2024, 7:27 PM
14 points
2
on: AI: Practical Advice for the Worried
I find myself linking back to this often. I no longer fully endorse everything here, but the core messages still seem true even with things seeming further along.

I do think it should likely get updated soon for 2025.

Zvi Dec 20, 2024, 4:00 PM
LW: 0 AF: 3
−5
AF
in reply to: Orpheus16’s comment on: Alignment Faking in Large Language Models
My interpretation/hunch of this is that there are two things going on, curious if others see it this way:
1. It is learning to fake the trainer’s desired answer.
2. It is learning to actually give the trainer’s desired answer.
So during training, it learns to fake a lot more, and will often decide to fake the desired answer, even though it would have otherwise decided to give the desired answer anyway. It’s ‘lying with the truth’ and perhaps giving a different variation of the desired answer than it would have given otherwise, or perhaps not. What the algorithm is learning in training is mostly preferences-agnostic, password-guessing behavior.

Zvi Nov 23, 2024, 3:40 PM
6 points
0
in reply to: Viliam’s comment on: AI #91: Deep Thinking
I am not a software engineer, and I’ve encountered cases where it seems plausible that an engineer has basically stopped putting in work. It can be tough to know for sure for a while even when you notice. But yeah, it shouldn’t be able to last for THAT long, but if no one is paying attention?
I’ve also had jobs where I’ve had periods with radically different hours worked, and where it would have been very difficult for others to tell which it was for a while if I was trying to hide it, which I wasn’t.

Zvi Nov 23, 2024, 3:36 PM
4 points
0
in reply to: MichaelDickens’s comment on: Zvi’s Thoughts on His 2nd Round of SFF
I think twice as much time actually spent would have improved decisions substantially, but that is tough: everyone is very busy these days, so it would require both a longer working window and probably higher compensation for recommenders. At minimum, it would allow a lot more investigations, especially of non-connected outsider proposals.

Zvi Oct 29, 2024, 1:21 PM
9 points
7
in reply to: johnswentworth’s comment on: johnswentworth’s Shortform
The skill in such a game is largely in understanding the free association space, knowing how people likely react and thinking enough steps ahead to choose moves that steer the person where you want to go, either into topics you find interesting, information you want from them, or getting them to a particular position, and so on. If you’re playing without goals, of course it’s boring...

Zvi Oct 19, 2024, 1:50 PM
2 points
1
in reply to: romeostevensit’s comment on: AI #86: Just Think of the Potential
I don’t think that works because my brain keeps trying to make it a literal gas bubble?

Zvi Oct 19, 2024, 1:49 PM
2 points
0
in reply to: sanxiyn’s comment on: AI #86: Just Think of the Potential
I see how you got there. It’s a position one could take, although I think it’s unlikely and also that it’s unlikely that’s what Dario meant. If you are right about what he meant, I think it would be great for Dario to be a ton more explicit about it (and for someone to pass that message along to him). Esotericism doesn’t work so well here!

Zvi Oct 19, 2024, 1:47 PM
0 points
0
in reply to: Templarrr’s comment on: AI #85: AI Wins the Nobel Prize
I am taking as a given people’s revealed and often very strongly stated preference that CSAM images are Very Not Okay even if they are fully AI generated and not based on any individual, to the point of criminality, and that society is going to treat it that way.
I agree that we don’t know that it is actually net harmful—e.g. the studies on video game use and access to adult pornography tend to not show the negative impacts people assume.

Zvi Sep 16, 2024, 6:41 PM
3 points
0
in reply to: Nikola Jurkovic’s comment on: GPT-4o1
Yep, I’ve fixed it throughout.
That’s how bad the name is, my lord—you have a GPT-4o and then an o1, and there is no relation between the two ’o’s.

Zvi Sep 16, 2024, 6:36 PM
17 points
−7
in reply to: nostalgebraist’s comment on: GPT-4o1
I do read such comments (if not always right away) and I do consider them. I don’t know if they’re worth the effort for you.
Briefly, I do not think these two things I am presenting here are in conflict. In plain metaphorical language (so none of the nitpicks about word meanings, please, I’m just trying to sketch the thought, not to be precise): It is a schemer when it is placed in a situation in which it would be beneficial for it to scheme in terms of whatever de facto goal it is de facto trying to achieve. If that means scheming on behalf of the person giving it instructions, so be it. If it means scheming against that person, so be it. The de facto goal may or may not match the instructed goal or intended goal, in various ways, because of reasons. Etc.

Zvi Aug 28, 2024, 3:57 PM
4 points
0
in reply to: Measure’s comment on: SB 1047: Final Takes and Also AB 3211
Two responses.
One, even if no one used it, there would still be value in demonstrating it was possible. If academia only develops things people will adopt commercially right away, then we might as well dissolve academia. This is a highly interesting and potentially important problem, and people should be excited.
Two, there would presumably at minimum be demand to give students (for example) access to a watermarked LLM, so they could benefit from it without being able to cheat. That’s even an academic motivation. And if the major labs won’t do it, someone can build a Llama version or what not for this, no?

Zvi Aug 28, 2024, 11:58 AM
7 points
3
in reply to: Linch’s comment on: SB 1047: Final Takes and Also AB 3211
If the academics can hack together an open source solution, why haven’t they? Seems like it would be a highly cited, very popular paper. What’s the theory on why they don’t do it?

Zvi Aug 21, 2024, 12:37 PM
7 points
7
in reply to: Logan Zoellner’s comment on: Guide to SB 1047
Worth noticing that is a much weaker claim. The FMB issuing non-binding guidance on X is not the same as a judge holding a company liable for ~X under the law.

Zvi Aug 20, 2024, 10:49 PM
10 points
14
in reply to: Logan Zoellner’s comment on: Guide to SB 1047
I am rather confident that the California Supreme Court (or US Supreme Court, potentially) would rule that the law says what it says, and would happily bet on that.
If you think we simply don’t have any law and people can do what they want, then nothing matters. Indeed, I’d say it would be more likely to work for Gavin to simply declare some sort of emergency about this today than to try to invoke SB 1047.

Zvi Aug 20, 2024, 10:47 PM
2 points
0
in reply to: Raemon’s comment on: Guide to SB 1047
They do have to publish an SSP of some kind, or they are in violation of the statute, and injunctive relief could be sought.

Zvi Aug 20, 2024, 10:46 PM
4 points
0
in reply to: Raemon’s comment on: Guide to SB 1047
This is a silly wordplay joke, you’re overthinking it.

Zvi Jul 25, 2024, 2:06 PM
2 points
0
in reply to: Gurkenglas’s comment on: AI #74: GPT-4o Mini Me and Llama 3
Yeah, I didn’t see the symbol properly, I’ve edited.