Former community director EA Netherlands. Now disabled by long covid, ME/CFS. Worried about AGI & US democracy
Siebe
This is very interesting, and I had a recent thought that’s very similar:
This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?
The way I imagine this to work is basically that an AI agent would develop really strong intuitions that “that’s just what ASIs do”. It might prevent it from properly modelling other agents that aren’t trained on this, but it’s not obvious to me that that would happen, or that it would be such a decisively bad thing that it outweighs the positives.
I imagine that the ratio of descriptions of desirable vs. descriptions of undesirable behavior would matter, and perhaps an ideal approach would both (massively) increase the amount of descriptions of desirable behavior as well as filter out the descriptions of unwanted behavior?
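To make the idea concrete, here is a minimal sketch of what such a data-curation step could look like. Everything in it is a hypothetical placeholder rather than an existing pipeline: `classify_asi_behavior` stands in for whatever classifier or heuristic would label documents, and `DESIRABLE_UPSAMPLE` is an assumed ratio.

```python
# Hypothetical sketch only: upweight documents describing desirable ASI behavior
# and drop documents describing undesirable behavior before pretraining.
# `classify_asi_behavior` and DESIRABLE_UPSAMPLE are assumed placeholders.
from typing import Iterable, Iterator

DESIRABLE_UPSAMPLE = 20  # assumed ratio: copies kept of each desirable story


def classify_asi_behavior(doc: str) -> str:
    """Placeholder: return 'desirable', 'undesirable', or 'neutral'.

    In practice this would be a trained classifier or a keyword heuristic.
    """
    raise NotImplementedError


def curate(corpus: Iterable[str]) -> Iterator[str]:
    """Yield a curated pretraining stream with a shifted behavior ratio."""
    for doc in corpus:
        label = classify_asi_behavior(doc)
        if label == "undesirable":
            continue  # filter out descriptions of unwanted ASI behavior
        yield doc
        if label == "desirable":
            # crude upsampling: repeat desirable stories to shift the ratio
            for _ in range(DESIRABLE_UPSAMPLE - 1):
                yield doc
```

Whether simple repetition or generating fresh synthetic stories is the better way to shift the ratio is exactly the kind of question a small-scale research project could test.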
Looks like Evan Hubinger has done some very similar research just recently: https://www.lesswrong.com/posts/qXYLvjGL9QvD3aFSW/training-on-documents-about-reward-hacking-induces-reward
I think it might make sense to do it as a research project first? Though you would need to be able to train a model from scratch.
I think you should publicly commit to:
full transparency about any funding from for-profit organizations, including nonprofit organizations affiliated with for-profits
no access to the benchmarks for any company
no NDAs around this stuff
If you currently have any of these arrangements for the computer-use benchmark in development, you should seriously try to get out of those contractual obligations.
Ideally, you commit to these in a legally binding way, which would make it non-negotiable in any negotiation, and make you more credible to outsiders.
I don’t think that all media produced by people concerned about AI risk needs to mention that AI risk is a big deal; that just seems annoying and preachy. I see Epoch’s impact story as informing people of where AI is likely to go and what’s likely to happen, and this works fine even if they don’t explicitly discuss AI risk.
I don’t think that every podcast episode should mention AI risk, but it would be pretty weird in my eyes to never mention it. Listeners would understandably infer that “these well-informed people apparently don’t really worry much, maybe I shouldn’t worry much either”. I think rationalists easily underestimate how much other people’s beliefs depend on what the people around them & their authority figures believe.
I think they have a strong platform to discuss risks occasionally. It also simply feels part of “where AI is likely to go and what’s likely to happen”.
This is a really good comment. A few thoughts:
- Deployment had a couple of benefits: real-world use gives a lot of feedback on strengths, weaknesses, and jailbreaks. It also generates media/hype that’s good for attracting further investors (assuming OpenAI will want more investment in the future?).
- The approach you describe is not only useful for solving more difficult questions; it’s probably also better at doing more complex tasks, which in my opinion is a trickier issue to solve. According to Flo Crivello:
We’re starting to switch all our agentic steps that used to cause issues to o1 and observing our agents becoming basically flawless overnight https://x.com/Altimor/status/1875277220136284207
So this approach can generate data on complex sequential tasks and lead to better performance on increasingly longer tasks.
- I didn’t read the post, but just FYI: an automated AI R&D system already exists, and it’s open-source: https://github.com/ShengranHu/ADAS/
I wrote the following comment about my safety concerns and notified Haize, Apollo, METR, and GovAI, but only Haize replied: https://github.com/ShengranHu/ADAS/issues/16#issuecomment-2354703344
This Washington Post article supports the ‘Scheming Sam’ Hypothesis: anonymous reports, mostly from his time at Y Combinator.
Meta’s actions seem unrelated?
Just coming to this now, after Altman’s firing (which seems unrelated?)
“At age 5, she began waking up in the middle of the night, needing to take a bath to calm her anxiety. By 6, she thought about suicide, though she didn’t know the word.”
To me, this adds a lot of validity to the whole story and I haven’t seen these points made:
- Becoming suicidal at such an early age isn’t normal, and very likely has a strong environmental cause (like being abused, or losing a loved one)
- The bathing to relieve anxiety is typical sexual-trauma behavior (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3577979/)
Of course, we don’t know for sure that she told the truth about this starting at that age, but we definitely can’t dismiss it.
On the recovered memories: I listen to a lot of podcasts where people talk about their own trauma and healing (with respected therapists). It’s very common in those that people start realizing in adulthood that something was wrong in their childhood, and increasingly figure out why they’ve always felt so ‘off’.
On the shadowbanning & hacking: This part feels more tenuous to me, especially the shadowbanning. But I don’t think this disqualifies the rest of the story. She’s had a really hard life and surely would have trust issues, and her brother is a powerful man.
Except that herd immunity isn’t really a permanent thing; it’s only temporary.
I had not seen it, because I don’t read this forum these days. I can’t reply in too much detail, but here are some points:
I think it’s a decent attempt, but a little biased towards the “statistically clever” estimate. I do agree that many studies are pretty poorly done. However, I’ve seen good ones that include controls, confirm infection via PCR, are large, and have pre-pandemic health data. This was in a Dutch presentation of a dataset though, and not clearly reported for some reason. (This is the project, but their data is not publicly available: https://www.lifelines.nl/researcher/explore-lifelines/covid-data).
It is really difficult to get a proper control group, because both PCR tests and antibody tests have significant false negative rates.
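As a toy illustration of why this matters (the numbers below are assumptions, not from any study): even a modest false-negative rate puts genuinely infected people into the “uninfected” control group, which drags the measured long COVID excess toward zero.

```python
# Toy numbers, purely illustrative: how test false negatives contaminate the control group.
true_infected_rate = 0.30       # assumed share of the cohort actually infected
false_negative_rate = 0.30      # assumed share of infections the tests miss
long_covid_given_infection = 0.10
background_symptom_rate = 0.02  # similar symptoms from other causes

# The "control" group = true negatives + infected people the tests missed.
missed = true_infected_rate * false_negative_rate
true_negatives = 1 - true_infected_rate
control_rate = (
    missed * (long_covid_given_infection + background_symptom_rate)
    + true_negatives * background_symptom_rate
) / (missed + true_negatives)

infected_rate = long_covid_given_infection + background_symptom_rate
print(f"measured excess: {infected_rate - control_rate:.1%}")  # ~8.9%
print(f"true excess:     {long_covid_given_infection:.1%}")    # 10.0%
```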
Furthermore, Zvi asserts that self-reports lead to an overestimate because they are inaccurate. I agree that self-reports are inaccurate, but there will definitely be people with long COVID who think it’s something else (e.g. burnout), so this can really go both ways.
In addition, we have biological data with a control group and prepandemic data: https://www.nature.com/articles/s41586-022-04569-5 There were many significant differences in the brain scans of these groups. I can’t do the digging to translate those data into frequency estimates though.
I also think that, to outsiders, long COVID symptoms sound vague: fatigue, brain fog, etc. In fact, there are a lot of clear symptoms, such as orthostatic intolerance, post-exertional symptom exacerbation, heart palpitations, muscle tremors, and oxygen saturation drops.
Lastly, I think we should be careful about assessing future risk based on past risk: variants change, vaccine protection changes, and, as I write above, there’s some initial data suggesting reinfections are worse due to a weakened immune system.
Yes, vaccine injury is actually rather common. I’ve seen a lot of very credible case reports of either symptoms starting after vaccination (in people who had already been infected) or, more often, existing symptoms worsening. Top long COVID researchers also find these reports credible.
I don’t think the data for keto is that strong. Plenty of people with long COVID are trying it with not amazing results.
The 15% is an upper estimate covering people who reported ‘some loss’ of health, so not everyone in that group would be severely disabled.
Unfortunately, the data isn’t great, and I can’t produce a robust estimate right now.
Uhm, no? I’m quoting you on the middle category, which overlaps with the long category.
Also, there’s no need to speculate, because there have been studies linking severity and viral load to increased risk of long COVID. https://www.cell.com/cell/fulltext/S0092-8674(22)00072-1
You have far more faith in the rationality of government decision making during novel crises than I do.
Healthcare workers with long covid can often barely work, or not at all.
Lowering infection rates, remaining able to work, and not needing to make high demands on the healthcare system seems much better for the economy. This is not an infohazard at all.
Awesome in-depth response! Yes, I was hoping this post would serve as an initial alarm bell prompting people to look further, rather than as definitive advice based on a comprehensive literature review.
I can’t respond to everything, at least not all at once, but here are a few points:
The categories of ‘at least 12 weeks’ and ‘at least 1 year’ do overlap, right?
I think the different waves may have had different underreporting factors, with the least underreporting during Delta, so we can’t take those rates at face value; I prefer using estimated cases whenever possible.
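A toy example of that last point (all numbers are made up, not real data): if two waves had different underreporting factors, rates computed per reported case can look identical even though rates per estimated true case differ a lot.

```python
# Made-up numbers, only to show why wave-specific underreporting factors matter.
waves = {
    # wave: (reported_cases, assumed_underreporting_factor, long_covid_cases)
    "Alpha": (1_000_000, 4.0, 120_000),
    "Delta": (1_500_000, 2.0, 180_000),
}

for wave, (reported, factor, long_covid) in waves.items():
    estimated = reported * factor  # estimated true infections
    print(f"{wave}: {long_covid / reported:.0%} per reported case, "
          f"{long_covid / estimated:.1%} per estimated case")
# Both waves show 12% per reported case, but 3.0% (Alpha) vs 6.0% (Delta)
# per estimated case; the face-value rates hide the difference.
```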
What about whistle-blowing and anonymous leaking? Seems like it would go well together with concrete evidence of risk.