So, apparently, I’m stupid. I could have been making money this whole time, but I was scared to ask for it
i’ve been giving a bunch of people and businesses advice on how to do their research and stuff. one of them messaged me, i was feeling tired and had so many other things to do. said my time is busy.
then thought fuck it, said if they’re ok with a $15 an hour consulting fee, we can have a call. baffled, they said yes.
then realized, oh wait, i have multiple years of experience now leading dev teams, ai research teams, organizing research hackathons and getting frontier research done.
Yeah, a friend told me this was low—I’m just scared of asking for money rn I guess.
I do see people who seem very incompetent getting paid as consultants, so I guess I can charge for more. I’ll see how much my time gets eaten by this and how much money I need. I want to buy some gpus, hopefully this can help.
I’m not trying to be derisive; in fact, I relate to you greatly. But it’s by being on the outside that I’m able to levy a few more direct criticisms:
Were you not paid for the other work that you did, leading dev teams and getting frontier research done? Those things should be a baseline on the worth of your time.
If that, have you ever tried to maximize the amount of money you can get the) other people to acknowledge your time as worth (ie, get a high salary offer)?
Separately, do you know the going rate for consultants with approximately your expertise? Or any other reference class you cna make up. Consulting can cost an incredible amount of money, and that price can be “fair” in a pretty simple sense if it averts the need to do 10s of hours of labor at high wages. It may be one of the highest leverage activities per unit time that exists as a conventional economic activity that a person can simply do.
Aside from market rates or whatever, I suggest you just try asking for unreasonable things, or more money than you feel you’re worth (think of it as an experiment, and maybe observe what happens in your mind when you flinch from this).
Do you have any emotional hangup about the prospect of trading money for labor generally, or money for anything?
Separately, do you have a hard time asserting your worth to others (or maybe just strangers) on some baseline level?
Were you not paid for the other work that you did, leading dev teams and getting frontier research done? Those things should be a baseline on the worth of your time.
This was running AI Plans, my startup, so makes sense that I wasn’t getting paid, since the same hesitancy for asking for money leads to hesitancy to do that exaggeration thing many AI Safety/EA people seem to do when making funding applications. Also, I don’t like to make the funding applications, or long applications in general.
If that, have you ever tried to maximize the amount of money you can get the) other people to acknowledge your time as worth (ie, get a high salary offer)?
I think every time I’ve asked for money, I’ve tried to ask for the lowest amount I can.
Separately, do you know the going rate for consultants with approximately your expertise? Or any other reference class you cna make up. Consulting can cost an incredible amount of money, and that price can be “fair” in a pretty simple sense if it averts the need to do 10s of hours of labor at high wages. It may be one of the highest leverage activities per unit time that exists as a conventional economic activity that a person can simply do.
I don’t know—I have a doc of stuff I’ve done that I paste into LLMs when I need to make a funding applications and stuff—just pasted it into Gemini 2.5 Pro and asked what would be a reasonable hourly fee and it said $200 to $400 an hour.
Aside from market rates or whatever, I suggest you just try asking for unreasonable things, or more money than you feel you’re worth (think of it as an experiment, and maybe observe what happens in your mind when you flinch from this).
I’ll give it a go—I’ve currently put the asking price on my call link for $50 an hour, feel nervous about actually asking for that though. I need to make a funding application for AI Plans—I can ask for money on behalf of others on the team, but asking for money to be donated so I can get a high salary feels scary. Happy to ask for a high salary for others on the team though, since I want them to get paid what they need.
Do you have any emotional hangup about the prospect of trading money for labor generally, or money for anything?
Yeah, I do. Generally, I’m used to doing a lot of free work for family and getting admonished when I ask for money. And when I did get promised money, it was either wayyy below market price or wayy late or didn’t get paid at all. General experience with family was my work not being valued even when I put in extra effort. I’m aware that’s wrong and has taught me wrong lessons, but not fully learnt the true ones yet.
I do think that $200-$400 seem like reasonable consulting rates.
I think the situations with family are complicated, because sure, there are social/cultural reasons one might be expected to do those things for family. Usually people hold those cultural norms alongside a stronger distinction between the ingroup (family) and the outgroup (all other people by default), though, so letting your impressions from that culture teach you things about how to behave in a culture with a weaker distinction might be maladaptive.
(I actually was suggesting you try asking for objectively completely unreasonable things just to look at the flinch. For example, you could ask a stranger for $100 for no reason. They would say no, but no harm would be done.)
One frame that might be useful to you is that in a way, it is imperative to at least sufficiently assert your value to others (if not overassert it the socially expected amount). An overly modest estimate is still a miscalibrated one, and people will make suboptimal decisions as a result. (Putting aside the behavior and surpluses given to other people, you are also a player in this game, and your being underallocated resources is globally suboptimal.)
Ah, I can totally relate to this. Whenever I think about asking for money, the Impostor Syndrome gets extra strong. Meanwhile, there are actual impostors out there collecting tons of money without any shame. (Though they may have better social skills, which is probably the category of skill that ultimately gets paid best.)
Another important lesson I got once, which might be useful for you at some moment: “If you double your prices, and lose half of your customers as a result, you will still get the same amount of money, but only work half as much.”
Also, speaking from my personal experience, the relation between how much / how difficult work someone wants you to do, and how much they are willing to pay you, seems completely random. One might naively expect that a job that pays more will be more difficult, but often it is the other way round.
Update—consulting went well. He said he was happy with it and got a lot of useful stuff. I was upfront with the fact that I just made up the $15 an hour and might change it, asked him what he’d be happy with, he said it’s up to me, but didn’t seem bothered at all at the price potentially changing.
I was upfront about the stuff I didn’t know and was kinda surprised at how much I was able to contribute, even knowing that I underestimate my technical knowledge because I barely know how to code.
I currently think we’re mostly interested in properties that apply at all timesteps, or at least “quickly”, as well as in the limit; rather than only in the limit. I also think it may be easier to get a limit at all by first showing quickness, in this case, but not at all sure of that.
The actual hard parts? Math probably doesn’t help much directly, unfortunately. Mathematical thinking is good. You’ll have to learn how to think in novel ways, so there’s not even a vector anyone can point you in, except for pointers with a whole lot of “dereference not included” like “figure out how to understand the fundamental forces involved in what actually determines what a mind ends up trying to do long term” (https://tsvibt.blogspot.com/2023/04/fundamental-question-what-determines.html).
This seems generally applicable. Any significant money transaction includes expectations, both legible and il-, which some participants will classify as bullshit. Those holding the expectations may believe it to be legitimately useful, or semi-legitimately necessary due to lack of perfect alignment.
If you want to specify a bit, we can probably guess at why it’s being required.
What I liked about applying for VC funding was the specific questions.
“How is this going to make money?”
“What proof do you have this is going to make money”
and it being clear the bullshit that they wanted was numbers, testimonials from paying customers, unambiguous ways the product was actually better, etc. And then standard bs about progress, security, avoiding weird wibbly wobbly talk, ‘woke’, ‘safety’, etc.
With Alignment funders, they really obviously have language they’re looking for as well, or language that makes them more and less willing to put more effort into understanding the proposal. Actually, they have it more than the VCs. But they act as if they don’t.
Have you felt this from your own experience trying to get funding, or from others, or both? Also, I’m curious what you think is their specific kind of bullshit, and if there’s things you think are real but others thought to be bullshit.
Way to go! :D. The important thing is that you’ve realized it. If you naturally already get those enquiries, you’re halfway there: people already know you and reach out to you without you having to promote your expertise. Best of luck!
:) the real money was the friends we made along the way.
I dropped out of a math MSc. at a top university in order to spend time learning about AI safety. I haven’t made a single dollar and now I’m working as a part time cashier, but that’s okay.
What use is money if you end up getting turned into paperclips?
PS: do you want to sign my open letter asking for more alignment funding?
For AI Safety funders/regranters—e.g. Open Phil, Manifund, etc:
It seems like a lot of the grants are swayed by ‘big names’ being on there. I suggest making anonymity compulsary if you want to more merit based funding, that explores wider possibilities and invests in more upcoming things.
Treat it like a Science rather than the Bragging Competition it currently is.
A Bias Pattern atm seems to be that the same people get funding, or recommended funding by the same people, leading to the number of innovators being very small, or growing much more slowly than if the process was anonymised.
Also, ask people seeking funding to make specific, unambiguous, easily falsiable predictions of positive outcomes from their work. And track and follow up on this!
It’s an interesting idea, but the track records of the grantees are important information, right? And if the track record includes, say, a previous paper that the funder has already read, then you can’t submit the paper with author names redacted.
Also, ask people seeking funding to make specific, unambiguous, easily falsiable predictions of positive outcomes from their work. And track and follow up on this!
Wouldn’t it be better for the funder to just say “if I’m going to fund Group X for Y months / years of work, I should see what X actually accomplished in the last Y months / years, and assume it will be vaguely similar”? And if Group X has no comparable past experience, then fine, but that equally means that you have no basis for believing their predictions right now.
Also, what if someone predicts that they’ll do A, but then realizes it would be better if they did B? Two possibilities are: (1) You the funder trust their judgment. Then you shouldn’t be putting even minor mental barriers in the way of their pivoting. Pivoting is hard and very good and important! (2) You the funder don’t particular trust the recipient’s judgment, you were only funding it because you wanted that specific deliverable. But then the normal procedure is that the funder and recipient work together to determine the deliverables that the funder wants and that the recipient is able to provide. Like, if I’m funding someone to build a database of AI safety papers, then I wouldn’t ask them to “make falsifiable predictions about the outcomes from their work”, instead I would negotiate a contract with them that says they’re gonna build the database. Right? I mean, I guess you could call that a falsifiable prediction, of sorts, but it’s a funny way to talk about it.
Cultivation story, but instead of cultivation, it’s a post AGI story in a world that’s mostly a utopia. But, there are AGI overlords, which are basically benevolent.
There’s a very stubborn young man, born in the classical sense (though without any problems like ageing disease, serious injuries, sickness, etc that people used to have—and without his mother having any of the screaming pain that childbirth used to have, or risk of life), who hates the state of power imbalance.
He doesnt want the Gods to just give him power (intelligence) - he wants to find the intelligence algorithms himself, with his peers, find the True Algorithm of Intelligence and Surpass the Gods. Even while the Gods are constant observers. He wants to do what the confused people around him think to be impossible.
His neighbours dont understand why. His cousin, who lives in the techno-hive doesn’t understand why—though he thinks that he does, from a lot of data and background on similar figures before and a large understanding of brains and intelligence. The boy’s cousin’s understanding is close, but despite coming close to a minima, he arrives at the wrong one, that just seems to explain what he’s understood from his observations.
Some of the confused people around him think that surely anything he can find, the Gods would have found ages ago—and even if he finds something new, surely they’ll learn it from observing him and just do it much much faster—he could just ask them to uplift him and they’d do it, this is a bit of a waste of time (even though everyone lives as long as they want)
Thinking about judgement criteria for the coming ai safety evals hackathon (https://lu.ma/xjkxqcya ) These are the things that need to be judged: 1. Is the benchmark actually measuring alignment (the real, scale, if we dont get this fully right right we die, problem) 2. Is the way of Deceiving the benchmark to get high scores actually deception, or have they somehow done alignment?
Both of these things need: - a strong deep learning & ml background (ideally, muliple influential papers where they’re one of the main authors/co-authors, or doing ai research at a significant lab, or they have, in the last 4 years) - a good understanding of what the real alignment problem actually means—can judge this by looking at their papers, activity on lesswrong, alignmentforum, blog, etc - a good understanding of evals/benchmarks (1 great or two pretty good papers/repos/works on this, ideally for alignment)
The mind uploading stuff seems to be a way to justify being ok with dying, imo, and digging ones head into the sand, pretending that if something talks a bit like you, it is you.
If a friend can very accurately do an impression of me and continues to do so for a week, while wearing makeup to look like me, I have not ‘uploaded’ myself into them. And I still wouldn’t want to die, just because there’s someone who is doing an extremely good impression of myself.
Your future biological brain is also doing some sort of impression of a continuation of the present you. It’s not going to be doing an optimal job of it, for any nontrivial notion of what that should mean.
Status quo is one difference, but I don’t see any other prior principles that point to the future biological brain being a (morally) better way of running a human mind forward than using other kinds of implementations of the mind’s algorithm. If we apply a variant of the reversal test to this, a civilization of functionally human uploads should have a reason to become biological, but I don’t think there is a currently known clear reason to prefer that change.
A tree doesn’t simulate a meaningful algorithm, so the analogy would be chopping it down being approximately just as good.
When talking about running algorithms, I’m not making claims about identity or preserving-the-original in some other sense, as I don’t see how these things are morally important, necessarily (I can’t rule out that they might be, on reflection, but currently I don’t see it). What I’m saying is that a biological brain doesn’t have an advantage at the task of running the algorithms of a human mind well, for any sensible notion of running them well. We currently entrust this task to the biological brain, because there is no other choice, and because it’s always been like this. But I don’t see a moral argument there.
prob not gonna be relatable for most folk, but i’m so fucking burnt out on how stupid it is to get funding in ai safety. the average ‘ai safety funder’ does more to accelerate funding for capabilities than safety, in huge part because what they look for is Credentials and In-Group Status, rather than actual merit. And the worst fucking thing is how much they lie to themselves and pretend that the 3 things they funded that weren’t completely in group, mean that they actually aren’t biased in that way.
At least some VCs are more honest that they want to be leeches and make money off of you.
Who or what is the “average AI safety funder”? Is it a private individual, a small specialized organization, a larger organization supporting many causes, an AI think tank for which safety is part of a capabilities program...?
I asked because I’m pretty sure that I’m being badly wasted (i.e. I could be making much more substantial contributions to AI safety), but I very rarely apply for support, so I thought I’d ask for information about the funding landscape from someone who has been exploring it.
And by the way, your brainchild AI-Plans is a pretty cool resource. I can see it being useful for e.g. a frontier AI organization which thinks they have an alignment plan, but wants to check the literature to know what other ideas are out there.
I asked because I’m pretty sure that I’m being badly wasted (i.e. I could be making much more substantial contributions to AI safety),
I think this is the case for most in AI Safety rn
And by the way, your brainchild AI-Plans is a pretty cool resource. I can see it being useful for e.g. a frontier AI organization which thinks they have an alignment plan, but wants to check the literature to know what other ideas are out there.
Thanks! Doing a bunch of stuff atm, to make it easier to use and a larger userbase.
ok, options. - Review of 108 ai alignment plans - write-up of Beyond Distribution—planned benchmark for alignment evals beyond a models distribution, send to the quant who just joined the team who wants to make it - get familiar with the TPUs I just got access to - run hhh and it’s variants, testing the idea behind Beyond Distribution, maybe make a guide on itr— continue improving site design
- fill out the form i said i was going to fill out and send today - make progress on cross coders—would prob need to get familiar with those tpus - writeup of ai-plans, the goal, the team, what we’re doing, what we’ve done, etc - writeup of the karma/voting system - the video on how to do backprop by hand - tutorial on how to train an sae
think Beyond Distribution writeup. he’s waiting and i feel bad.
btw, thoughts on this for ‘the alignment problem’? ”A robust, generalizable, scalable, method to make an AI model which will do set [A] of things as much as it can and not do set [B] of things as much as it can, where you can freely change [A] and [B]”
Freely changing an AGIs goals is corrigibility, which is a huge advantage if you can get it. See Max Harms’ corrigibility sequence and my “instruction-following AGI is easier....”
The question is how a reliably get such a thing. Goalcrafting is one part of the problem, and I agree that those are good goals; the other and larger part is technical alignment, getting those desired goals to really work that way in the particular first AGI we get.
I’d say you’re addressing the question of goalcrafting or selecting alignment targets.
I think you’ve got the right answer for technical alignment goals; but the question remains of what human would control that AGI. See my “if we solve alignment, do we all die anyway” for the problems with that scenario.
Spoiler alert; we do all die anyway if really selfish people get control of AGIs. And selfish people tend to work harder at getting power.
But I do think your goal defintion is a good alignment target for the technical work. I don’t think there’s a better one. I do prefer instruction following or corriginlilty by the definitions in the posts I linked above because they’re less rigid, but they’re both very similar to your definition.
I pretty much agree. I prefer rigid definitions because they’re less ambiguous to test and more robust to deception. And this field has a lot of deception.
We’ve run two 150+ Alignment Evaluations Hackathons, that were 1 week long. Multiple teams continuing their work and submitting to NeurIPS. Had multiple quants, Wall Street ML researcher, an AMD engineer, PhDs, etc taking part. Hosting a Research Fellowship soon, on the Hard Part of AI Alignment. Actually directly trying to get values into the model in a way that will robustly scale to an AGI that does things that we want and not things we don’t want. I’ve read 120+ Alignment Plans—the vast majority don’t even try to solve the hard part of alignment, let alone doing a decent job of it. I’m confident I can get 200+ signups, with 50+ talented people working on solving the hard part of AI Alignment. Would like help with funding for this. Funding would go towards wages for myself and other organizers.
Organizers include Ana, currently working at ML4Good, whose thesis was in preference optimization—Ana is a great communicator and helped run the second hackthon. And multiple other talented people.
What info other than this is needed for a funding application? This and a call should really be enough info, imo.
in general, when it comes to things which are the ‘hard part of alignment’, is the crux ``` a flawless method of ensuring the AI system is pointed at and will always continue to be pointed at good things ``` ? the key part being flawless—and that seeming to need a mathematical proof?
### Limitations of HHH and other Static Dataset benchmarks
A Static Dataset is a dataset which will not grow or change—it will remain the same. Static dataset type benchmarks are inherently limited in what information they will tell us about a model. This is especially the case when we care about AI Alignment and want to measure how ‘aligned’ the AI is.
### Purpose of AI Alignment Benchmarks
When measuring AI Alignment, our aim is to find out exactly how close the model is to being the ultimate ‘aligned’ model that we’re seeking—a model whose preferences are compatible with ours, in a way that will empower humanity, not harm or disempower it.
### Difficulties of Designing AI Alignment Benchmarks What preferences those are, could be a significant part of the alignment problem. This means that we will need to frequently make sure we know what preferences we’re trying to measure for and re-determine if these are the correct ones to be aiming for.
### Key Properties of Aligned Models
These preferences must be both robustly and faithfully held by the model: Robustness: - They will be preserved over unlimited iterations of the model, without deterioration or deprioritization. - They will be robust to external attacks, manipulations, damage, etc of the model. Faithfulness: - The model ‘believes in’, ‘values’ or ‘holds to be true and important’ the preferences that we care about . - It doesn’t just store the preferences as information of equal priority to any other piece of information, e.g. how many cats are in Paris—but it holds them as its own, actual preferences.
I’d like some feedback on my theory of impact for my currently chosen research path
**End goal**: Reduce x-risk from AI and risk of human disempowerment. for x-risk: - solving AI alignment—very important, - knowing exactly how well we’re doing in alignment, exactly how close we are to solving it, how much is left, etc seems important. - how well different methods work, - which companies are making progress in this, which aren’t, which are acting like they’re making progress vs actually making progress, etc —put all on a graph, see who’s actually making the line go up
- Also, a way that others can use to measure how good their alignment method/idea is, easily so there’s actually a target and a progress bar for alignment—seems like it’d make alignment research a lot easier and improve the funding space—and the space as a whole. Improving the quality and quantity of research.
- Currently, it’s mostly a mixture of vibe checks, occasional benchmarks that test a few models, jailbreaks, etc - all almost exclusively on the end models as a whole—which have many, many differences that could be contributing to the differences in the different ’alignment measurements’ by having a method that keeps things controlled as much as possible and just purely measures the different post training methods, this seems like a much better way to know how we’re doing in alignment and how to prioritize research, funding, governence, etc
On Goodharting the Line—will also make it modular, so that people can add their own benchmarks, and highlight people who redteam different alignment benchmarks.
What is the proposed research path and its theory of impact? It’s not clear from reading your note / generally seems too abstract to really offer any feedback
I think this is a really good opportunity to work on a topic you might not normally work on, with people you might not normally work with, and have a big impact:
https://lu.ma/sjd7r89v
I’m running the event because I think this is something really valuable and underdone.
The Sequences highly praise Jaynes and recommend reading his work directly.
The Sequences aren’t trying to be a replacement, they’re trying to be a pop sci intro to the style of thinking. An easier on-ramp.
If Jaynes already seems exciting and comprehensible to you, read that instead of the Sequences on probability.
So, apparently, I’m stupid. I could have been making money this whole time, but I was scared to ask for it
i’ve been giving a bunch of people and businesses advice on how to do their research and stuff. one of them messaged me, i was feeling tired and had so many other things to do. said my time is busy.
then thought fuck it, said if they’re ok with a $15 an hour consulting fee, we can have a call. baffled, they said yes.
then realized, oh wait, i have multiple years of experience now leading dev teams, ai research teams, organizing research hackathons and getting frontier research done.
wtf
Yes, you can ask for a lot more than that :)
Yeah, a friend told me this was low—I’m just scared of asking for money rn I guess.
I do see people who seem very incompetent getting paid as consultants, so I guess I can charge for more. I’ll see how much my time gets eaten by this and how much money I need. I want to buy some gpus, hopefully this can help.
I’m not trying to be derisive; in fact, I relate to you greatly. But it’s by being on the outside that I’m able to levy a few more direct criticisms:
Were you not paid for the other work that you did, leading dev teams and getting frontier research done? Those things should be a baseline on the worth of your time.
If that, have you ever tried to maximize the amount of money you can get the) other people to acknowledge your time as worth (ie, get a high salary offer)?
Separately, do you know the going rate for consultants with approximately your expertise? Or any other reference class you cna make up. Consulting can cost an incredible amount of money, and that price can be “fair” in a pretty simple sense if it averts the need to do 10s of hours of labor at high wages. It may be one of the highest leverage activities per unit time that exists as a conventional economic activity that a person can simply do.
Aside from market rates or whatever, I suggest you just try asking for unreasonable things, or more money than you feel you’re worth (think of it as an experiment, and maybe observe what happens in your mind when you flinch from this).
Do you have any emotional hangup about the prospect of trading money for labor generally, or money for anything?
Separately, do you have a hard time asserting your worth to others (or maybe just strangers) on some baseline level?
This was running AI Plans, my startup, so makes sense that I wasn’t getting paid, since the same hesitancy for asking for money leads to hesitancy to do that exaggeration thing many AI Safety/EA people seem to do when making funding applications. Also, I don’t like to make the funding applications, or long applications in general.
I think every time I’ve asked for money, I’ve tried to ask for the lowest amount I can.
I don’t know—I have a doc of stuff I’ve done that I paste into LLMs when I need to make a funding applications and stuff—just pasted it into Gemini 2.5 Pro and asked what would be a reasonable hourly fee and it said $200 to $400 an hour.
I’ll give it a go—I’ve currently put the asking price on my call link for $50 an hour, feel nervous about actually asking for that though. I need to make a funding application for AI Plans—I can ask for money on behalf of others on the team, but asking for money to be donated so I can get a high salary feels scary. Happy to ask for a high salary for others on the team though, since I want them to get paid what they need.
Yeah, I do. Generally, I’m used to doing a lot of free work for family and getting admonished when I ask for money. And when I did get promised money, it was either wayyy below market price or wayy late or didn’t get paid at all. General experience with family was my work not being valued even when I put in extra effort. I’m aware that’s wrong and has taught me wrong lessons, but not fully learnt the true ones yet.
I do think that $200-$400 seem like reasonable consulting rates.
I think the situations with family are complicated, because sure, there are social/cultural reasons one might be expected to do those things for family. Usually people hold those cultural norms alongside a stronger distinction between the ingroup (family) and the outgroup (all other people by default), though, so letting your impressions from that culture teach you things about how to behave in a culture with a weaker distinction might be maladaptive.
(I actually was suggesting you try asking for objectively completely unreasonable things just to look at the flinch. For example, you could ask a stranger for $100 for no reason. They would say no, but no harm would be done.)
One frame that might be useful to you is that in a way, it is imperative to at least sufficiently assert your value to others (if not overassert it the socially expected amount). An overly modest estimate is still a miscalibrated one, and people will make suboptimal decisions as a result. (Putting aside the behavior and surpluses given to other people, you are also a player in this game, and your being underallocated resources is globally suboptimal.)
Ah, I can totally relate to this. Whenever I think about asking for money, the Impostor Syndrome gets extra strong. Meanwhile, there are actual impostors out there collecting tons of money without any shame. (Though they may have better social skills, which is probably the category of skill that ultimately gets paid best.)
Another important lesson I got once, which might be useful for you at some moment: “If you double your prices, and lose half of your customers as a result, you will still get the same amount of money, but only work half as much.”
Also, speaking from my personal experience, the relation between how much / how difficult work someone wants you to do, and how much they are willing to pay you, seems completely random. One might naively expect that a job that pays more will be more difficult, but often it is the other way round.
Update—consulting went well. He said he was happy with it and got a lot of useful stuff. I was upfront with the fact that I just made up the $15 an hour and might change it, asked him what he’d be happy with, he said it’s up to me, but didn’t seem bothered at all at the price potentially changing.
I was upfront about the stuff I didn’t know and was kinda surprised at how much I was able to contribute, even knowing that I underestimate my technical knowledge because I barely know how to code.
if someone who’s v good at math wants to do some agent foundations stuff to directly tackle the hard part of alignement, what should they do?
If they’re talented, look for a way to search over search processes without incurring the unbounded loss that would result by default.
If they’re educated, skim the existing MIRI work and see if any results can be stolen from their own field.
I currently think we’re mostly interested in properties that apply at all timesteps, or at least “quickly”, as well as in the limit; rather than only in the limit. I also think it may be easier to get a limit at all by first showing quickness, in this case, but not at all sure of that.
The actual hard parts? Math probably doesn’t help much directly, unfortunately. Mathematical thinking is good. You’ll have to learn how to think in novel ways, so there’s not even a vector anyone can point you in, except for pointers with a whole lot of “dereference not included” like “figure out how to understand the fundamental forces involved in what actually determines what a mind ends up trying to do long term” (https://tsvibt.blogspot.com/2023/04/fundamental-question-what-determines.html).
Some of the problems: https://tsvibt.blogspot.com/2023/03/the-fraught-voyage-of-aligned-novelty.html A meta-philosophy discussion of what might work: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html
If you are capable of meaningfully pushing capabilities forward and doing literally anything else, that’s already pretty helpful.
They should think about what to do.
Trying to put together a better explainer for the hard part of alignment, while not having a good math background https://docs.google.com/document/d/1ePSNT1XR2qOpq8POSADKXtqxguK9hSx_uACR8l0tDGE/edit?usp=sharing
Please give feedback!
it’s so unnecessarily hard to get funding in alignment.
they say ‘Don’t Bullshit’ but what that actually means is ‘Only do our specific kind of bullshit’.
and they don’t specify because they want to pretend that they don’t have their own bullshit
This seems generally applicable. Any significant money transaction includes expectations, both legible and il-, which some participants will classify as bullshit. Those holding the expectations may believe it to be legitimately useful, or semi-legitimately necessary due to lack of perfect alignment.
If you want to specify a bit, we can probably guess at why it’s being required.
What I liked about applying for VC funding was the specific questions.
“How is this going to make money?”
“What proof do you have this is going to make money”
and it being clear the bullshit that they wanted was numbers, testimonials from paying customers, unambiguous ways the product was actually better, etc. And then standard bs about progress, security, avoiding weird wibbly wobbly talk, ‘woke’, ‘safety’, etc.
With Alignment funders, they really obviously have language they’re looking for as well, or language that makes them more and less willing to put more effort into understanding the proposal. Actually, they have it more than the VCs. But they act as if they don’t.
Have you felt this from your own experience trying to get funding, or from others, or both? Also, I’m curious what you think is their specific kind of bullshit, and if there’s things you think are real but others thought to be bullshit.
Both. Not sure, its something like lesswrong/EA speak mixed with the VC speak.
If I knew the specific bs, I’d be better at making successful applications and less intensely frustrated.
i earnt more from working at a call center for about 3 months than i have in 2+ years of working in ai safety.
And i’ve worked much harder in this than I did at the call center
Way to go! :D. The important thing is that you’ve realized it. If you naturally already get those enquiries, you’re halfway there: people already know you and reach out to you without you having to promote your expertise. Best of luck!
:) the real money was the friends we made along the way.
I dropped out of a math MSc. at a top university in order to spend time learning about AI safety. I haven’t made a single dollar and now I’m working as a part time cashier, but that’s okay.
What use is money if you end up getting turned into paperclips?
PS: do you want to sign my open letter asking for more alignment funding?
For AI Safety funders/regranters—e.g. Open Phil, Manifund, etc:
It seems like a lot of the grants are swayed by ‘big names’ being on there. I suggest making anonymity compulsary if you want to more merit based funding, that explores wider possibilities and invests in more upcoming things.
Treat it like a Science rather than the Bragging Competition it currently is.
A Bias Pattern atm seems to be that the same people get funding, or recommended funding by the same people, leading to the number of innovators being very small, or growing much more slowly than if the process was anonymised.
Also, ask people seeking funding to make specific, unambiguous, easily falsiable predictions of positive outcomes from their work. And track and follow up on this!
It’s an interesting idea, but the track records of the grantees are important information, right? And if the track record includes, say, a previous paper that the funder has already read, then you can’t submit the paper with author names redacted.
Wouldn’t it be better for the funder to just say “if I’m going to fund Group X for Y months / years of work, I should see what X actually accomplished in the last Y months / years, and assume it will be vaguely similar”? And if Group X has no comparable past experience, then fine, but that equally means that you have no basis for believing their predictions right now.
Also, what if someone predicts that they’ll do A, but then realizes it would be better if they did B? Two possibilities are: (1) You the funder trust their judgment. Then you shouldn’t be putting even minor mental barriers in the way of their pivoting. Pivoting is hard and very good and important! (2) You the funder don’t particular trust the recipient’s judgment, you were only funding it because you wanted that specific deliverable. But then the normal procedure is that the funder and recipient work together to determine the deliverables that the funder wants and that the recipient is able to provide. Like, if I’m funding someone to build a database of AI safety papers, then I wouldn’t ask them to “make falsifiable predictions about the outcomes from their work”, instead I would negotiate a contract with them that says they’re gonna build the database. Right? I mean, I guess you could call that a falsifiable prediction, of sorts, but it’s a funny way to talk about it.
AIgainst the Gods
Cultivation story, but instead of cultivation, it’s a post AGI story in a world that’s mostly a utopia. But, there are AGI overlords, which are basically benevolent.
There’s a very stubborn young man, born in the classical sense (though without any problems like ageing disease, serious injuries, sickness, etc that people used to have—and without his mother having any of the screaming pain that childbirth used to have, or risk of life), who hates the state of power imbalance.
He doesnt want the Gods to just give him power (intelligence) - he wants to find the intelligence algorithms himself, with his peers, find the True Algorithm of Intelligence and Surpass the Gods. Even while the Gods are constant observers. He wants to do what the confused people around him think to be impossible.
His neighbours dont understand why. His cousin, who lives in the techno-hive doesn’t understand why—though he thinks that he does, from a lot of data and background on similar figures before and a large understanding of brains and intelligence. The boy’s cousin’s understanding is close, but despite coming close to a minima, he arrives at the wrong one, that just seems to explain what he’s understood from his observations.
to be clear, instead of cultivating Qi, it’s RSI
and trying to learn to do it faster than the Gods are
gods being the AGIs
Some of the confused people around him think that surely anything he can find, the Gods would have found ages ago—and even if he finds something new, surely they’ll learn it from observing him and just do it much much faster—he could just ask them to uplift him and they’d do it, this is a bit of a waste of time (even though everyone lives as long as they want)
this might basically be me, but I’m not sure how exactly to change for the better. theorizing seems to take time and money which i don’t have.
Thinking about judgement criteria for the coming ai safety evals hackathon (https://lu.ma/xjkxqcya )
These are the things that need to be judged:
1. Is the benchmark actually measuring alignment (the real, scale, if we dont get this fully right right we die, problem)
2. Is the way of Deceiving the benchmark to get high scores actually deception, or have they somehow done alignment?
Both of these things need:
- a strong deep learning & ml background (ideally, muliple influential papers where they’re one of the main authors/co-authors, or doing ai research at a significant lab, or they have, in the last 4 years)
- a good understanding of what the real alignment problem actually means—can judge this by looking at their papers, activity on lesswrong, alignmentforum, blog, etc
- a good understanding of evals/benchmarks (1 great or two pretty good papers/repos/works on this, ideally for alignment)
Do these seem loose? Strict? Off base?
The mind uploading stuff seems to be a way to justify being ok with dying, imo, and digging ones head into the sand, pretending that if something talks a bit like you, it is you.
If a friend can very accurately do an impression of me and continues to do so for a week, while wearing makeup to look like me, I have not ‘uploaded’ myself into them. And I still wouldn’t want to die, just because there’s someone who is doing an extremely good impression of myself.
Your future biological brain is also doing some sort of impression of a continuation of the present you. It’s not going to be doing an optimal job of it, for any nontrivial notion of what that should mean.
My future biological brain actually is a continuation of my current biological brain, in a way that an upload isn’t.
You seem to be saying:-
Identity does persist over time.
There is no basis for identity other than resemblance.
An upload has a similar level of resemblance to a future brain.just, so it’s good enough.
It neither 1 not 2 is a fact.
That’s like saying a future version of a tree is doing an impression of a continuation of the previous tree.
I don’t understand how the difference isn’t clear here.
Status quo is one difference, but I don’t see any other prior principles that point to the future biological brain being a (morally) better way of running a human mind forward than using other kinds of implementations of the mind’s algorithm. If we apply a variant of the reversal test to this, a civilization of functionally human uploads should have a reason to become biological, but I don’t think there is a currently known clear reason to prefer that change.
The objection is about what, if anything, counts as identity as a matter of fact.
If I take a tree, and I create a computer simulation of that tree, the simulation will not be a way of running the original tree forward at all.
A tree doesn’t simulate a meaningful algorithm, so the analogy would be chopping it down being approximately just as good.
When talking about running algorithms, I’m not making claims about identity or preserving-the-original in some other sense, as I don’t see how these things are morally important, necessarily (I can’t rule out that they might be, on reflection, but currently I don’t see it). What I’m saying is that a biological brain doesn’t have an advantage at the task of running the algorithms of a human mind well, for any sensible notion of running them well. We currently entrust this task to the biological brain, because there is no other choice, and because it’s always been like this. But I don’t see a moral argument there.
prob not gonna be relatable for most folk, but i’m so fucking burnt out on how stupid it is to get funding in ai safety. the average ‘ai safety funder’ does more to accelerate funding for capabilities than safety, in huge part because what they look for is Credentials and In-Group Status, rather than actual merit.
And the worst fucking thing is how much they lie to themselves and pretend that the 3 things they funded that weren’t completely in group, mean that they actually aren’t biased in that way.
At least some VCs are more honest that they want to be leeches and make money off of you.
Who or what is the “average AI safety funder”? Is it a private individual, a small specialized organization, a larger organization supporting many causes, an AI think tank for which safety is part of a capabilities program...?
all of the above, then averaged :p
I asked because I’m pretty sure that I’m being badly wasted (i.e. I could be making much more substantial contributions to AI safety), but I very rarely apply for support, so I thought I’d ask for information about the funding landscape from someone who has been exploring it.
And by the way, your brainchild AI-Plans is a pretty cool resource. I can see it being useful for e.g. a frontier AI organization which thinks they have an alignment plan, but wants to check the literature to know what other ideas are out there.
I think this is the case for most in AI Safety rn
Thanks! Doing a bunch of stuff atm, to make it easier to use and a larger userbase.
the average ai safety funder does more to accelerate capabilities than they do safety, in part due to credentialism and looking for in group status.
ok, options.
- Review of 108 ai alignment plans
- write-up of Beyond Distribution—planned benchmark for alignment evals beyond a models distribution, send to the quant who just joined the team who wants to make it
- get familiar with the TPUs I just got access to
- run hhh and it’s variants, testing the idea behind Beyond Distribution, maybe make a guide on itr—
continue improving site design
- fill out the form i said i was going to fill out and send today
- make progress on cross coders—would prob need to get familiar with those tpus
- writeup of ai-plans, the goal, the team, what we’re doing, what we’ve done, etc
- writeup of the karma/voting system
- the video on how to do backprop by hand
- tutorial on how to train an sae
think Beyond Distribution writeup. he’s waiting and i feel bad.
btw, thoughts on this for ‘the alignment problem’?
”A robust, generalizable, scalable, method to make an AI model which will do set [A] of things as much as it can and not do set [B] of things as much as it can, where you can freely change [A] and [B]”
Freely changing an AGIs goals is corrigibility, which is a huge advantage if you can get it. See Max Harms’ corrigibility sequence and my “instruction-following AGI is easier....”
The question is how a reliably get such a thing. Goalcrafting is one part of the problem, and I agree that those are good goals; the other and larger part is technical alignment, getting those desired goals to really work that way in the particular first AGI we get.
Yup, those are hard. Was just thinking of a definition for the alignment problem, since I’ve not really seen any good ones.
I’d say you’re addressing the question of goalcrafting or selecting alignment targets.
I think you’ve got the right answer for technical alignment goals; but the question remains of what human would control that AGI. See my “if we solve alignment, do we all die anyway” for the problems with that scenario.
Spoiler alert; we do all die anyway if really selfish people get control of AGIs. And selfish people tend to work harder at getting power.
But I do think your goal defintion is a good alignment target for the technical work. I don’t think there’s a better one. I do prefer instruction following or corriginlilty by the definitions in the posts I linked above because they’re less rigid, but they’re both very similar to your definition.
I pretty much agree. I prefer rigid definitions because they’re less ambiguous to test and more robust to deception. And this field has a lot of deception.
This is a great set of replies to an AI post, on a quality level I didn’t think I’d see on bluesky https://bsky.app/profile/steveklabnik.com/post/3lqaqe6uc3c2u
We’ve run two 150+ Alignment Evaluations Hackathons, that were 1 week long. Multiple teams continuing their work and submitting to NeurIPS. Had multiple quants, Wall Street ML researcher, an AMD engineer, PhDs, etc taking part.
Hosting a Research Fellowship soon, on the Hard Part of AI Alignment. Actually directly trying to get values into the model in a way that will robustly scale to an AGI that does things that we want and not things we don’t want.
I’ve read 120+ Alignment Plans—the vast majority don’t even try to solve the hard part of alignment, let alone doing a decent job of it.
I’m confident I can get 200+ signups, with 50+ talented people working on solving the hard part of AI Alignment.
Would like help with funding for this.
Funding would go towards wages for myself and other organizers.
Organizers include Ana, currently working at ML4Good, whose thesis was in preference optimization—Ana is a great communicator and helped run the second hackthon. And multiple other talented people.
What info other than this is needed for a funding application? This and a call should really be enough info, imo.
in general, when it comes to things which are the ‘hard part of alignment’, is the crux
```
a flawless method of ensuring the AI system is pointed at and will always continue to be pointed at good things
```
?
the key part being flawless—and that seeming to need a mathematical proof?
Thoughts on this?
### Limitations of HHH and other Static Dataset benchmarks
A Static Dataset is a dataset which will not grow or change—it will remain the same. Static dataset type benchmarks are inherently limited in what information they will tell us about a model. This is especially the case when we care about AI Alignment and want to measure how ‘aligned’ the AI is.
### Purpose of AI Alignment Benchmarks
When measuring AI Alignment, our aim is to find out exactly how close the model is to being the ultimate ‘aligned’ model that we’re seeking—a model whose preferences are compatible with ours, in a way that will empower humanity, not harm or disempower it.
### Difficulties of Designing AI Alignment Benchmarks
What preferences those are, could be a significant part of the alignment problem. This means that we will need to frequently make sure we know what preferences we’re trying to measure for and re-determine if these are the correct ones to be aiming for.
### Key Properties of Aligned Models
These preferences must be both robustly and faithfully held by the model:
Robustness:
- They will be preserved over unlimited iterations of the model, without deterioration or deprioritization.
- They will be robust to external attacks, manipulations, damage, etc of the model.
Faithfulness:
- The model ‘believes in’, ‘values’ or ‘holds to be true and important’ the preferences that we care about .
- It doesn’t just store the preferences as information of equal priority to any other piece of information, e.g. how many cats are in Paris—but it holds them as its own, actual preferences.
Comment on the Google Doc here: https://docs.google.com/document/d/1PHUqFN9E62_mF2J5KjcfBK7-GwKT97iu2Cuc7B4Or2w/edit?usp=sharing
This is for the AI Alignment Evals Hackathon: https://lu.ma/xjkxqcya by AI-Plans
I’m looking for feedback on the hackathon page
mind telling me what you think?
https://docs.google.com/document/d/1Wf9vju3TIEaqQwXzmPY—R0z41SMcRjAFyn9iq9r-ag/edit?usp=sharing
https://kkumar97.blogspot.com/2025/01/pain-of-writing.html
I’d like some feedback on my theory of impact for my currently chosen research path
**End goal**: Reduce x-risk from AI and risk of human disempowerment.
for x-risk:
- solving AI alignment—very important,
- knowing exactly how well we’re doing in alignment, exactly how close we are to solving it, how much is left, etc seems important.
- how well different methods work,
- which companies are making progress in this, which aren’t, which are acting like they’re making progress vs actually making progress, etc
—put all on a graph, see who’s actually making the line go up
- Also, a way that others can use to measure how good their alignment method/idea is, easily
so there’s actually a target and a progress bar for alignment—seems like it’d make alignment research a lot easier and improve the funding space—and the space as a whole. Improving the quality and quantity of research.
- Currently, it’s mostly a mixture of vibe checks, occasional benchmarks that test a few models, jailbreaks, etc
- all almost exclusively on the end models as a whole—which have many, many differences that could be contributing to the differences in the different ’alignment measurements’
by having a method that keeps things controlled as much as possible and just purely measures the different post training methods, this seems like a much better way to know how we’re doing in alignment
and how to prioritize research, funding, governence, etc
On Goodharting the Line—will also make it modular, so that people can add their own benchmarks, and highlight people who redteam different alignment benchmarks.
What is the proposed research path and its theory of impact? It’s not clear from reading your note / generally seems too abstract to really offer any feedback
I think this is a really good opportunity to work on a topic you might not normally work on, with people you might not normally work with, and have a big impact: https://lu.ma/sjd7r89v
I’m running the event because I think this is something really valuable and underdone.
give better names to actual formal math things, jesus christ.
I’m finally reading The Sequences and it screams midwittery to me, I’m sorry.
Compare this:
to Jaynes:
Jaynes is better organized, more respectful to the reader, more respectful to the work he’s building on and more useful
The Sequences highly praise Jaynes and recommend reading his work directly.
The Sequences aren’t trying to be a replacement, they’re trying to be a pop sci intro to the style of thinking. An easier on-ramp. If Jaynes already seems exciting and comprehensible to you, read that instead of the Sequences on probability.
Fair enough. Personally, so far, I’ve found Jaynes more comprehensible than The Sequences.
I think most people with a natural inclination towards math probably would feel likewise.