Opinions expressed are my own and not endorsed by anyone.
Please excuse my poor reading comprehension
Formerly @ ARC Evals aka METR
Who is the new charismatic leader of prediction markets?
Hmm, I was mainly thinking of the “redistribute sex” phrasing fiasco, Slate Star Codex being contra Hanson on healthcare, Tyler Cowen being contra Hanson on the self-assessed property tax, and the brutal quote tweets. But maybe these are in fact symptoms of success and I have it partially backwards… hmm.
This is new info to me, thanks
Some Anthropic statements have suggested that sharing is hard in general.
If they said that then they are speaking nonsense IMO. Once you have your stuff set up it’s a button you click. You have to trust that the evaluator won’t leak info or soil your reputation without good cause though.
Example #999 that I cannot read
Be allowed? You’re not allowed?
Anthropic said that collaborating with METR “requir[ed] significant science and engineering support on our end”; it has not clarified why.
I can comment on this (I think without breaking NDA). I will oversimplify. They were changing their deployment system, infra, etc. We wanted uptime and throughput. It was a big pain in the ass to keep the model up (with proper access control) while they were overhauling things. Furthermore, Anthropic and METR kept changing points of contact (rapidly growing teams).
This was and is my proposal for evaluator model access: If at least 10 people at a lab can access a model then at least 1 person at METR must have access.
This is for the labs self-enforcing via public agreements.
This seems like something they would actually agree to.
If it were a law then you would replace METR with “a govt approved auditor”.
I think conformance could be greatly improved by getting labs to use a little login widget (could be a CLI) which allows e.g. METR to see access-permission changes (possibly with codenames for models and/or people). Ideally this would be very little effort for labs, and sidestepping it would be more effort than using it once it was set up.
Feedback welcome.
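The login-widget idea could be sketched as something like this. Everything here is hypothetical (the log path, the codename scheme, the grant/revoke vocabulary); a real version would need authentication and a tamper-evident log, but the point is that once it exists, recording a change is one call.

```python
# Hypothetical sketch of the "login widget": a lab-side hook that appends
# model access-permission changes to a log an external evaluator could read.
# All names and the log format are made up for illustration.
import json
import time

LOG_PATH = "access_changes.jsonl"  # hypothetical append-only log file

def record_access_change(model_codename: str, person_codename: str, action: str) -> dict:
    """Append one permission change ("grant" or "revoke") to the shared log."""
    entry = {
        "ts": time.time(),
        "model": model_codename,    # codenames so no sensitive names leak
        "person": person_codename,
        "action": action,
    }
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

def count_with_access(model_codename: str) -> int:
    """Replay the log to count people who currently hold access to a model."""
    holders = set()
    with open(LOG_PATH) as f:
        for line in f:
            e = json.loads(line)
            if e["model"] != model_codename:
                continue
            if e["action"] == "grant":
                holders.add(e["person"])
            elif e["action"] == "revoke":
                holders.discard(e["person"])
    return len(holders)
```

The evaluator could then run the `count_with_access`-style replay itself to check the "if ≥10 people at the lab, then ≥1 at METR" rule without seeing real names.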
External red-teaming is not external model evaluation. External red-teaming: several people, ~10 hours each. External model evals: experts running eval suites that took ~10,000 hours to develop.
Yes there is some awkwardness here… Red teaming could be extremely effective if structured as an open competition. Possibly more effective than orgs like METR. The problem is that this trains up tons of devs on Doing Evil With AI and probably also produces lots of really useful github repos. So I agree with you.
What am I missing?
His sister’s accusations that he blocked her from their parents’ inheritance, that he molested her when he was a young teenager, and that he got her social media accounts flagged as spam to hide the accusations.
What do you mean by “following through”? Just sending another email?
(My track record of 0% accuracy on which messages will politically snowball is holding up very well. I’m glad that sometimes people like you say things the way you say them, rather than only people like me saying things how I say them.)
I wonder if a chat loop like this would be effective at shortcutting years of confused effort in research and/or engineering. (The AI just asks the questions and the person answers.)
“what are you seeking?”
“ok how will you do it?”
“think of five different ways to do that”
“describe a consistent picture of the consequences of that”
“how could you do that in a day instead of a year”
“give me five very different alternate theories of how the underlying system works”
Questions like that can be surprisingly easy to answer. Just hard to remember to ask.
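The loop above barely needs any AI at all; as a fixed script it's a few lines. This is a minimal sketch (the function name, prompt format, and `answer_fn` hook are my own inventions) where the person's answers are just collected into a transcript:

```python
# Minimal sketch of the question loop: the "AI" only asks fixed questions
# and the person answers. Questions are taken from the list above.
QUESTIONS = [
    "What are you seeking?",
    "OK, how will you do it?",
    "Think of five different ways to do that.",
    "Describe a consistent picture of the consequences of that.",
    "How could you do that in a day instead of a year?",
    "Give me five very different alternate theories of how the underlying system works.",
]

def run_question_loop(answer_fn=input):
    """Ask each question in turn; answer_fn supplies the human's reply.

    Returns a transcript as a list of (question, answer) pairs.
    """
    transcript = []
    for q in QUESTIONS:
        transcript.append((q, answer_fn(q + "\n> ")))
    return transcript
```

Swapping `answer_fn` for a model call would turn it into the "AI asks, human answers" loop, but the hard part, as noted above, is remembering to run it at all.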
I would add one. I haven’t found a compelling thing to aim for long term. I have asked many people to describe a coherent positive future involving AI. I have heard no good answers. I have been unable to produce one myself.
Are we playing a game that has no happy endings? I hope we are not.
The acceptable tone of voice here feels like 3mm wide to me. I’m always having bad manners
I swear to never joke again sir
I assumed somebody had. Maybe everyone did haha
#onlyReadBadWriters #hansonFTW
From the frontpage:
https://www.lesswrong.com/posts/zAqqeXcau9y2yiJdi/can-we-build-a-better-public-doublecrux
https://www.lesswrong.com/posts/bkr9BozFuh7ytiwbK/my-hour-of-memoryless-lucidity
https://www.lesswrong.com/posts/ANGmJnZL2fskHX6tj/dyslucksia
Like all of them basically.
most of the value is in even figuring out how to diagram the posts
Think of it like a TLDR. There are many ways to TLDR, but any method that’s not terrible is fantastic.
The job would of course be done by a diagramming god, not a wordpleb like me
If I got double-dog-dared...
“Lo-salt” is salt blended with potassium chloride. That’s been my table salt for 5 years.
I was working on this cute math notation the other day. Curious if anybody knows a better way or if I am overcomplicating this.
Say you have z := c·x²·y, and you want m := dz/dx = 2·c·x·y to be some particular value.
Sometimes you can control x, sometimes you can control y, and you can always easily measure z. So you might use these forms of the equation:
m = 2·c·x·y = 2·z/x = 2·√(c·y·z)
It’s kind of confusing that m seems proportional to both z and √z. So here’s where the notation comes in. The above can be written as:
m(x=x, y=y, z=?) = 2·c·x·y
m(x=x, y=?, z=z) = 2·z/x
m(x=?, y=y, z=z) = 2·√(c·y·z)
Which seems a lot clearer to me.
And you could shorten it to m(x,y,?), m(x,?,z), and m(?,y,z).
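A quick numeric sanity check that the three forms really agree (the values of c, x, y below are arbitrary positives I picked for illustration):

```python
# Check the three equivalent forms of m = dz/dx for z = c·x²·y.
import math

c, x, y = 0.7, 1.3, 2.5       # arbitrary positive test values
z = c * x**2 * y              # z := c·x²·y

m_xy = 2 * c * x * y              # m(x, y, ?): from x and y
m_xz = 2 * z / x                  # m(x, ?, z): from x and z
m_yz = 2 * math.sqrt(c * y * z)   # m(?, y, z): from y and z

# All three give the same m once z = c·x²·y is substituted back in.
assert math.isclose(m_xy, m_xz)
assert math.isclose(m_xy, m_yz)
```

The apparent conflict (m ∝ z in one form, m ∝ √z in another) dissolves because x and y are not held fixed across the forms; the annotation of which variables are known is exactly what the notation records.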