I agree that the ultimate goal is to understand the weights. It seems pretty unclear whether trying to understand the activations is a useful stepping stone towards that, and it’s hard to be sure how relevant theoretical toy examples are to that question.
Joseph Miller
Ilya Sutskever had two armed bodyguards with him at NeurIPS.
Some people are asking for a source on this. I’m pretty sure I’ve heard it from multiple people who were there in person but I can’t find a written source. Can anyone confirm or deny?
Well, it seems quite important whether the DROS registration could possibly have been staged.
That would be difficult. To purchase a gun in California you have to provide photo ID[1], proof of address[2] and a thumbprint[3]. Also, it looks like the payment must be trackable[4] and gun stores have to keep video surveillance footage for up to a year.[5]
My guess is that the police haven’t actually investigated this as a potential homicide, but if they did, there should be very strong evidence that Balaji bought a gun. A very sophisticated actor could potentially fake this evidence, but it seems challenging (I can’t find any historical examples of this happening). It would probably be easier to corrupt the investigation. Or the perpetrators might just hope that there would be no investigation.
There is a 10-day waiting period to purchase guns in California[5], so Balaji would probably have started planning his suicide before his hiking trip (I doubt someone like him would own a gun for recreational purposes?).
Is the interview with the NYT going to be published?
I think it’s this piece that was published before his death.
Is any of the police behavior actually out of the ordinary?
Epistemic status: highly uncertain; these are my impressions from a few minutes of searching with LLMs.
It’s fairly common for victims’ families to contest official suicide rulings. In cases with lots of public attention, police generally try to justify their conclusions. So we might expect the police to publicly state if there is footage of Balaji purchasing the gun shortly before his death. This could still happen with more time or public pressure.
land in space will be less valuable than land on earth until humans settle outside of earth (which I don’t believe will happen in the next few decades).
Why would it take so long? Is this assuming no ASI?
Wow that’s great, thanks. @L Rudolf L you should link this in this post.
As in, this is also what the police say?
Yes, edited to clarify. The police say there was no evidence of foul play. All parties agree he died in his bathroom of a gunshot wound.
Did the police find a gun in the apartment? Was it a gun Suchir had previously purchased himself according to records? Seems like relevant info.
The only source I can find on this is Webb, so take with a grain of salt. But yes, they found a gun in the apartment. According to Webb, the DROS registration information was on top of the gun case[1] in the apartment, so presumably there was a record of him purchasing the gun (Webb conjectures that this was staged). We don’t know what type of gun it was[2] and Webb claims it’s unusual for police not to release this info in a suicide case.
This is an attempt to compile all publicly available primary evidence relating to the recent death of Suchir Balaji, an OpenAI whistleblower.
This is a tragic loss and I feel very sorry for the parents. The rest of this piece will be unemotive as it is important to establish the nature of this death as objectively as possible.
I was prompted to look at this by a surprising conversation I had IRL which suggested there was credible evidence that it was not suicide. The undisputed facts of the case are that he died of a gunshot wound in his bathroom sometime around November 26, 2024. The police say it was a suicide with no evidence of foul play.
Most of the evidence we have comes from the parents and George Webb. Webb describes himself as an investigative journalist, but I would classify him as more of a conspiracy theorist, based on a quick scan of some of his older videos. I think many of the specific factual claims he has made about this case are true, though I generally doubt his interpretations.
Webb seems to have made contact with the parents early on and went with them when they first visited Balaji’s apartment. He has since published videos from the scene of the death, against the wishes of the parents[1] and as a result the parents have now unendorsed Webb.[2]
List of evidence:
He didn’t leave a suicide note.[3]
The cause of death was decided by the authorities in 14 (or 40, unclear) minutes.[4]
The parents arranged a private autopsy which “made their suspicions stronger”.[5]
The parents say “there are a lot of facts that are very disturbing for us and we cannot share at the moment but when we do a PR all of that will come out.”[6]
The parents say “his computer has been deleted, his desktop has been messed up”.[7]
It was his birthday and he bought a bike in the week of his death.[10]
He said he didn’t want to work and he was going to take a gap year, “leaving the AI industry and getting into machine learning and neuroscience” but also he was planning to start his own company and was reaching out to VCs for seed funding.[11]
He had just interviewed with the New York Times and he was supposed to do further interviews in the days after his death.[12]
According to the parents and Webb, there are signs of foul play at the scene of death:
There are several areas with blood [Confirmed from pictures], suggesting to Webb and the parents that he was trying to crawl out of the bathroom.[13][14]
Webb says the body had bleeding from the genitals.[15] I’m not aware of a better source for this claim, so right now I think it is probably false.
The trash can in the bathroom was knocked over.[13][16] [Confirmed from pictures].
A floss pick is on the floor.[13][17] [Confirmed from pictures]. Webb interprets this as being dropped at the time of death, suggesting that Balaji was caught by surprise.
The path of the bullet through the head missed the brain. I’m not sure what the primary source for this is, but I don’t see why Webb would invent it, so I think it’s true. Webb takes this as evidence that the gun was fired during a struggle rather than at the considered pace of a suicide.[18]
The bullet did not go all the way through the head, suggesting a lower-caliber, quieter gun.[19]
According to Webb and the parents, the drawers of the apartment were ransacked, the cupboards were thrown open.[8][20] From the pictures this looks false, although the apartment is very messy and his hiking backpacks are strewn around with much of their contents on the table (he had recently returned from a hiking trip).
The blood on the sink looks different, suggesting to Webb that it came from a different part of the body.[21] This is not obvious to me from the pictures but not implausible and the main pool of blood looks surprisingly dark.
There is a half-eaten meal at the desk in the apartment. [Confirmed from pictures].
There is a tuft of Balaji’s hair, soaked in blood, under the bathroom door. [Confirmed that’s what it looks like in the pictures], again suggesting to Webb a violent struggle.
According to the parents, he had a USB thumb drive which is now missing, containing important evidence for an upcoming court case about OpenAI’s use of copyrighted data.[8][22]
People that spoke to him around the time of his death report him to have been in high spirits and making plans for the near future.[23]
George Webb claims there were security cameras working on all floors except the floor on which he lived.[24] This appears to conflict with the parents’ claim that the police said no one came in or out (see below), but may refer to different cameras, as the parents also mention that the murderer could have come through an entrance other than the main one.[25]
The parents say “OpenAI has deleted the copyright data that was evidence that was given to the discovery for the [New York Times] lawsuit. They deleted the data and now my son is also gone, so now they’re all set for winning the lawsuit… It’s also said that my son had the documentation to prove the copyright violation. His statement, his testimony would have turned the AI industry upside down...”[26]
Looking into the details of this, OpenAI did delete some data but this wasn’t a permanent deletion of any of the primary sources and I think it was probably an accident and not significant to the outcome of the case.
I don’t see any strong reason to believe Balaji had secret evidence that would have been critical to the outcome of the case.
Evidence against:
One reason the authorities gave for declaring it a suicide was that CCTV footage showed that no one else came in or out of the apartment.[27]
In high school Balaji won a $100,000 prize for a computer science competition. His parents didn’t find out until they saw the news online, suggesting he may not have been very open with them.[28]
My interpretations:
If we interpret the apartment as simply messy (as it looks to me), rather than ransacked, then we can probably discount the knocked-over trash can, the floss pick on the floor and the half-eaten meal. We can also probably discard the hypothesis of someone trying to locate a USB drive with secret information, which raises more questions than it answers (why didn’t he reveal this information before? why didn’t he back up this crucial data anywhere?).
In my uninformed view, it doesn’t look like the pictures of the scene of death strongly suggest a struggle between murderer and victim, although I don’t know how to explain the tuft of hair.
It seems unlikely that OpenAI or some other actor had a motive to murder a whistleblower. The most plausible motive to me is sending a warning to other potential whistleblowers, but this isn’t very compelling.
There’s no smoking gun, and the parents (understandably) do not look like they are thinking very systematically about establishing a case for foul play. This is notable because their claim of foul play is the main factor that has led credible people to take this hypothesis seriously.
Balaji appeared from the outside to be a happy and highly successful person with important plans in the next few days. It is surprising that someone like that would commit suicide.
Overall my conclusion is that this was a suicide, with roughly 96% confidence. This is a slight downward update from the 98% I assigned when I first heard about it, and overall quite concerning.
I encourage people to trade on this related prediction market and report further evidence.
Useful sources:
[1] I’m not linking to this evidence here, in the spirit of respecting the wishes of the parents, but this is an important source that informed my understanding of the situation.
[2]
[3] Source: Poornima Ramarao (11:22)
[4] Source: Poornima Ramarao (12:38)
[5] Source: Poornima Ramarao (13:02)
[6] Source: Poornima Ramarao (15:47)
[7] Source: Poornima Ramarao (16:36)
[8] Source: George Webb + Poornima Ramarao (1:45)
[9] Source: George Webb (9:56)
[10] Source: (23:27)
[11] Source: (8:02)
[12] Source: (26:00)
[13] Source: George Webb + Poornima Ramarao (0:35)
[14] Source: George Webb (6:53)
[15] Source: George Webb (3:38)
[16] Source: George Webb (5:44)
[17] Source: George Webb (5:46)
[18] Source: George Webb (0:05)
[19] Source: George Webb (6:23)
[20] Source: George Webb (9:12)
[21] Source: Poornima Ramarao (1:18)
[22] Source: George Webb (9:45)
[23] Source: Poornima Ramarao (4:30)
[24] Source: George Webb (9:30)
[25] Source: Poornima Ramarao (2:40)
[26] Source: Poornima Ramarao (4:14)
[27] Source: Poornima Ramarao (12:42)
[28] Source: Ramamurthy (17:37)
[29] Source: George Webb (13:29)
[30] Source: George Webb (5:43)
Has someone made an ebook that I can easily download onto my kindle?
I’m unclear if a good ebook should include all the pictures from the original version.
LLMs can pick up a much broader class of typos than spelling mistakes.
For example, in this comment I wrote “Don’t push the frontier of regulations” when from context I clearly meant to say “Don’t push the frontier of capabilities”. I think an LLM could have caught that.
LessWrong LLM feature idea: Typo checker
It’s becoming a habit for me to run anything I write through an LLM to check for mistakes before I send it off.
I think the hardest part of implementing this feature well would be getting it to comment only on things that are definitely mistakes / typos. I don’t want a general LLM writing-feedback tool built into LessWrong.
The ideal version of Anthropic would
Make substantial progress on technical AI safety
Use its voice to make people take AI risk more seriously
Support AI safety regulation
Not substantially accelerate the AI arms race
In practice I think Anthropic has
Made a little progress on technical AI safety
Used its voice to make people take AI risk less seriously[1]
Obstructed AI safety regulation
Substantially accelerated the AI arms race
What I would do differently:
Do better alignment research, idk this is hard.
Communicate in a manner that is consistent with the apparent belief of Anthropic leadership that alignment may be hard and x-risk is >10% probable. Their communications strongly signal “this is a Serious Issue, like climate change, and we will talk lots about it and make gestures towards fixing the problem but none of us are actually worried about it, and you shouldn’t be either. When we have to make a hard trade-off between safety and the bottom line, we will follow the money every time.”
Lobby politicians to regulate AI. When a good regulation like SB-1047 is proposed, support it.
Don’t push the frontier of capabilities. Obviously this is basically saying that Anthropic should stop making money and therefore stop existing. The more nuanced version is that, for Anthropic to justify its existence, each push of the capabilities frontier should be earned by substantial progress on the other three points.
[1] My understanding is that a significant aim of your recent research is to test models’ alignment so that people will take AI risk more seriously when things start to heat up. This seems good, but I expect the net effect of Anthropic is still to make people take alignment less seriously due to the company’s public communications.
The ARENA curriculum is very good.
It does seem pretty suspicious.
I’m like 98% confident this was not foul play, partly because I doubt whatever evidence he had would be that important to the court case, and obviously his death is going to draw far more attention to his views.
However, 98% is still quite worrying and I wish I could be >99% confident. I will be interested to see if there is further evidence. Given OpenAI’s very shady behavior with the secret non-disparagement agreements that came out a few months ago, it doesn’t seem completely impossible that they might do this (but still very, very unlikely imo).
The probability of a 26 year old dying of suicide in any given month (within the month of being named the key witness in the OpenAI copyright case, right before deposition) is roughly 1 in 100,000
This prior is a useful starting point, but you’ve definitely got to account for the stress of leaving OpenAI and going through a lawsuit.
(I downvoted this post for combative tone.)
One of the striking parts is that it sounds like all the pretraining people are optimistic
What’s the source for this?
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
I started working on PhD applications about 12 days ago. I expect to have fairly polished applications for the first deadline on December 1, despite not working on this full time. So I think it’s quite possible to do applications for the December 15 deadlines. You would need to contact your referees (and potential supervisors for UK universities) in the next couple of days.
There are two types of people in this world.
There are people who treat the lock on a public bathroom as a tool for communicating occupancy and a safeguard against accidental attempts to enter when the room is unavailable. For these people the standard protocol is to discern the likely state of engagement of the inner room and then tentatively proceed inside if they detect no signs of human activity.
And there are people who view the lock on a public bathroom as a physical barricade with which to temporarily defend possessed territory. They start by giving the door a hearty push to test the tensile strength of the barrier. On meeting resistance they engage with full force, wringing the handle up and down and slamming into the door with their full body weight. Only once their attempts are thwarted do they reluctantly retreat to find another stall.
Tarbell Fellowship at PPF
I think you’ve massively underrated this. My impression is that Tarbell has had significant effect on the general AI discourse, by allowing a number of articles to be written in mainstream outlets.
karma should also transfer automatically
Is that rolling up two things into one, or is that just beta-coherence?