Opinions expressed are my own and not endorsed by anyone.
Please excuse my poor reading comprehension
Formerly @ ARC Evals aka METR
Who is the new charismatic leader of prediction markets?
Hmm, I was mainly thinking of the “redistribute sex” phrasing fiasco, Slate Star Codex being contra Hanson on healthcare, Tyler Cowen being contra Hanson on the self-assessed property tax, and the brutal quote tweets. But maybe these are in fact symptoms of success and I have it partially backwards… hmm.
This is new info to me, thanks
Some Anthropic statements have suggested that sharing is hard in general.
If they said that then they are speaking nonsense IMO. Once you have your stuff set up it’s a button you click. You have to trust that the evaluator won’t leak info or soil your reputation without good cause though.
Example #999 that I cannot read
Be allowed? You’re not allowed?
Anthropic said that collaborating with METR “requir[ed] significant science and engineering support on our end”; it has not clarified why.
I can comment on this (I think without breaking NDA). I will oversimplify. They were changing their deployment system, infra, etc. We wanted uptime and throughput. It was a big pain in the ass to keep the model up (with proper access control) while they were overhauling things. Furthermore, Anthropic and METR kept changing points of contact (rapidly growing teams).
This was and is my proposal for evaluator model access: If at least 10 people at a lab can access a model then at least 1 person at METR must have access.
This is for the labs self-enforcing via public agreements.
This seems like something they would actually agree to.
If it were a law then you would replace METR with “a govt approved auditor”.
I think conformance could be greatly improved by getting labs to use a little login widget (could be a CLI) which allows e.g. METR to see access-permission changes (possibly with codenames for models and/or people). Ideally this would be very little effort for labs, and sidestepping it would be more effort than using it once it was set up.
Feedback welcome.
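The login-widget idea could be sketched as something like this. Everything here is hypothetical (the log path, the codename scheme, the grant/revoke vocabulary); a real version would need authentication and a tamper-evident log, but the point is that once it exists, recording a change is one call.

```python
# Hypothetical sketch of the "login widget": a lab-side hook that appends
# model access-permission changes to a log an external evaluator could read.
# All names and the log format are made up for illustration.
import json
import time

LOG_PATH = "access_changes.jsonl"  # hypothetical append-only log file

def record_access_change(model_codename: str, person_codename: str, action: str) -> dict:
    """Append one permission change ("grant" or "revoke") to the shared log."""
    entry = {
        "ts": time.time(),
        "model": model_codename,    # codenames so no sensitive names leak
        "person": person_codename,
        "action": action,
    }
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

def count_with_access(model_codename: str) -> int:
    """Replay the log to count people who currently hold access to a model."""
    holders = set()
    with open(LOG_PATH) as f:
        for line in f:
            e = json.loads(line)
            if e["model"] != model_codename:
                continue
            if e["action"] == "grant":
                holders.add(e["person"])
            elif e["action"] == "revoke":
                holders.discard(e["person"])
    return len(holders)
```

The evaluator could then run the `count_with_access`-style replay itself to check the "if ≥10 people at the lab, then ≥1 at METR" rule without seeing real names.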
External red-teaming is not external model evaluation. External red-teaming: several people, ~10 hours each. External model evals: experts running eval suites that took ~10,000 hours to develop.
Yes there is some awkwardness here… Red teaming could be extremely effective if structured as an open competition. Possibly more effective than orgs like METR. The problem is that this trains up tons of devs on Doing Evil With AI and probably also produces lots of really useful github repos. So I agree with you.
What am I missing?
His sister’s accusations that he blocked her from their parents’ inheritance, that he molested her when he was a young teenager, and that he got her social media accounts flagged as spam to hide the accusations.
What do you mean by “following through”? Just sending another email?
(My track record of 0% accuracy on which messages will politically snowball is holding up very well. I’m glad that sometimes people like you say things the way you say them, rather than only people like me saying things how I say them.)
I wonder if a chat loop like this would be effective at shortcutting years of confused effort in research and/or engineering. (The AI just asks the questions and the person answers.)
“what are you seeking?”
“ok how will you do it?”
“think of five different ways to do that”
“describe a consistent picture of the consequences of that”
“how could you do that in a day instead of a year”
“give me five very different alternate theories of how the underlying system works”
Questions like that can be surprisingly easy to answer. Just hard to remember to ask.
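The loop above barely needs any AI at all; as a fixed script it's a few lines. This is a minimal sketch (the function name, prompt format, and `answer_fn` hook are my own inventions) where the person's answers are just collected into a transcript:

```python
# Minimal sketch of the question loop: the "AI" only asks fixed questions
# and the person answers. Questions are taken from the list above.
QUESTIONS = [
    "What are you seeking?",
    "OK, how will you do it?",
    "Think of five different ways to do that.",
    "Describe a consistent picture of the consequences of that.",
    "How could you do that in a day instead of a year?",
    "Give me five very different alternate theories of how the underlying system works.",
]

def run_question_loop(answer_fn=input):
    """Ask each question in turn; answer_fn supplies the human's reply.

    Returns a transcript as a list of (question, answer) pairs.
    """
    transcript = []
    for q in QUESTIONS:
        transcript.append((q, answer_fn(q + "\n> ")))
    return transcript
```

Swapping `answer_fn` for a model call would turn it into the "AI asks, human answers" loop, but the hard part, as noted above, is remembering to run it at all.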
I would add one. I haven’t found a compelling thing to aim for long term. I have asked many people to describe a coherent positive future involving AI. I have heard no good answers. I have been unable to produce one myself.
Are we playing a game that has no happy endings? I hope we are not.
The acceptable tone of voice here feels like 3mm wide to me. I’m always having bad manners
I swear to never joke again sir
I assumed somebody had. Maybe everyone did haha
#onlyReadBadWriters #hansonFTW
From the frontpage:
https://www.lesswrong.com/posts/zAqqeXcau9y2yiJdi/can-we-build-a-better-public-doublecrux
https://www.lesswrong.com/posts/bkr9BozFuh7ytiwbK/my-hour-of-memoryless-lucidity
https://www.lesswrong.com/posts/ANGmJnZL2fskHX6tj/dyslucksia
Like all of them basically.
most of the value is in even figuring out how to diagram the posts
Think of it like a TLDR. There are many ways to TLDR, but any method that’s not terrible is fantastic.
The job would of course be done by a diagramming god, not a wordpleb like me
If I got double-dog-dared...
“Lo-salt” is salt blended with potassium chloride. That’s been my table salt for 5 years.
I was working on this cute math notation the other day. Curious if anybody knows a better way or if I am overcomplicating this.
Say you have z := c·x²·y, and you want m := dz/dx = 2·c·x·y to be some particular value.
Sometimes you can control x, sometimes you can control y, and you can always easily measure z. So you might use these forms of the equation:
m = 2·c·x·y = 2·z/x = 2·√(c·y·z)
It’s kind of confusing that m seems proportional to both z and √z. So here’s where the notation comes in. The above can be written as:
m(x=x, y=y, z=?) = 2·c·x·y
m(x=x, y=?, z=z) = 2·z/x
m(x=?, y=y, z=z) = 2·√(c·y·z)
Which seems a lot clearer to me.
And you could shorten it to m(x,y,?), m(x,?,z), and m(?,y,z).
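A quick numeric sanity check that the three forms really agree (the values of c, x, y below are arbitrary positives I picked for illustration):

```python
# Check the three equivalent forms of m = dz/dx for z = c·x²·y.
import math

c, x, y = 0.7, 1.3, 2.5       # arbitrary positive test values
z = c * x**2 * y              # z := c·x²·y

m_xy = 2 * c * x * y              # m(x, y, ?): from x and y
m_xz = 2 * z / x                  # m(x, ?, z): from x and z
m_yz = 2 * math.sqrt(c * y * z)   # m(?, y, z): from y and z

# All three give the same m once z = c·x²·y is substituted back in.
assert math.isclose(m_xy, m_xz)
assert math.isclose(m_xy, m_yz)
```

The apparent conflict (m ∝ z in one form, m ∝ √z in another) dissolves because x and y are not held fixed across the forms; the annotation of which variables are known is exactly what the notation records.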