markov

Karma: 386

Understanding Benchmarks and motivating Evaluations

markov and Charbel-Raphaël

Feb 6, 2025, 1:32 AM

9 points

0 comments, 11 min read

(ai-safety-atlas.com)

markov Mar 11, 2024, 1:17 PM
3 points
0
in reply to: tanagrabeast’s comment on: AI Safety 101 : Capabilities—Human Level AI, What? How? and When?
Thanks for the comment! I’ll add a sentence or a footnote for both loss and weights in the sections you mentioned. As for forecasting in section 5.2, that claim imagines something like this:
Different copies of a model can share parameter updates. For instance, ChatGPT could be deployed to millions of users, learn something from each interaction, and then propagate gradient updates to a central server where they are averaged together and applied to all copies of the model. (source)
This is slightly different from what is happening currently. As far as I know, models do not undergo online learning. They are trained, deployed, and occasionally fine-tuned, but that’s it; they do not learn continuously through interaction. By “learning” I mean weight updates: the weights stay fixed after deployment, though a model can still acquire more information through context injection.
It is not clear yet whether parallel learning would be efficient, or whether the data gathered through interactions can be used efficiently in parallel. But given that we already use batch learning during training, it does seem possible.
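To make the quoted scenario concrete, here is a minimal sketch of copies sharing parameter updates by averaging their gradients on a central server. It is a toy under stated assumptions, not how any deployed model works: the function names `average_updates` and `apply_update`, the learning rate, and the gradient values are all hypothetical.

```python
from typing import List


def average_updates(updates: List[List[float]]) -> List[float]:
    """Element-wise mean of the gradient vectors reported by each copy."""
    if not updates:
        raise ValueError("need at least one update")
    n = len(updates)
    return [sum(values) / n for values in zip(*updates)]


def apply_update(weights: List[float], avg_grad: List[float], lr: float) -> List[float]:
    """One plain gradient-descent step using the averaged gradient."""
    return [w - lr * g for w, g in zip(weights, avg_grad)]


# Hypothetical setup: three deployed copies share the same two weights,
# and each reports a gradient computed from its own interactions.
shared_weights = [1.0, -2.0]
local_grads = [[0.5, 0.0], [0.0, 1.0], [1.0, 0.5]]

avg = average_updates(local_grads)          # the central server averages
shared_weights = apply_update(shared_weights, avg, lr=0.5)
print(avg, shared_weights)
```

This is the same averaging that mini-batch training already performs over examples; the open question above is whether interaction data gathered by separate copies would be useful enough to justify doing it after deployment.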
Alternatively, shared knowledge could take the form of a central, ever-growing vector database such as Pinecone. In that case, compound AI systems could simply learn to store and query the database efficiently, and inject the query results into their expanding context windows, simulating improved world models.
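A rough sketch of that store-query-inject loop, using an in-memory toy rather than a real vector database. Everything here is an assumption for illustration: `ToyVectorStore` is a hypothetical class (it is not the Pinecone API), and the three-dimensional embeddings are hand-written stand-ins for the output of an embedding model.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a shared vector database."""

    def __init__(self) -> None:
        self._items: List[Tuple[List[float], str]] = []

    def add(self, embedding: List[float], text: str) -> None:
        self._items.append((embedding, text))

    def query(self, embedding: List[float], k: int = 1) -> List[str]:
        # Rank stored texts by similarity to the query embedding.
        ranked = sorted(self._items, key=lambda item: cosine(embedding, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add([1.0, 0.0, 0.0], "fact A")
store.add([0.0, 1.0, 0.0], "fact B")
store.add([0.0, 0.0, 1.0], "fact C")

# Any copy of the model can retrieve what other copies stored,
# then inject the hits into its own context window.
hits = store.query([0.9, 0.1, 0.0], k=2)
prompt = "Context:\n" + "\n".join(hits) + "\n\nQuestion: ..."
print(prompt)
```

No weights change in this setup; the shared knowledge lives entirely in the database and reaches the model only through the context window.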

markov Mar 8, 2024, 1:28 PM
1 point
0
in reply to: Nathan Helm-Burger’s comment on: AI Safety 101 : Capabilities—Human Level AI, What? How? and When?
Fixed. Thanks :)

AI Safety 101 : Capabilities—Human Level AI, What? How? and When?

markov and Charbel-Raphaël

Mar 7, 2024, 5:29 PM

46 points

8 comments, 54 min read

markov Dec 24, 2023, 1:47 PM
1 point
0
in reply to: Gurkenglas’s comment on: AI Safety Chatbot
Thanks for the feedback on how to parse out feedback :)
We do have logs for everything, but as Zack pointed out, we don’t currently have processes in place to automatically recover from the logs the specific inputs that were meant as feedback.

AI Safety Chatbot

markov and Robert Miles

Dec 21, 2023, 2:06 PM

61 points

11 comments, 4 min read

markov Oct 31, 2023, 8:54 PM
1 point
0
in reply to: momom2’s comment on: AI Safety 101 : AGI
Newcomers to the AI Safety arguments might be under the impression that there will be discrete cutoffs, i.e. either we have HLAI or we don’t. The point of (t,n) AGI is to give a picture of what a continuous increase in capabilities looks like. It is also slightly more formal than the simple “words based” definitions of AGI. If you know of a more precise mathematical formulation of the notions of general intelligence and superintelligence, I would love it if you could point me towards it so that I can include it in the post.
As for Four Background Claims, I included it to give an intuition for why general intelligence is important, and for why, even if future systems are intelligent, they will not by default care about our goals, or even pursue them in the way their designers intended.

markov Oct 19, 2023, 8:52 AM
2 points
0
in reply to: mic’s comment on: AI Safety 101 : AGI
I think the point of Bio Anchors was to give a big upper bound, not to say exactly when it will happen. At least that is how I perceive it. People at a 101 level probably still have the impression that highly capable AI is multiple decades, if not centuries, away. I include Bio Anchors here to point out that we quite likely have until 2048 at most. From that upper bound we can then scale back further.
There is also the recent Open Philanthropy report that extends Bio Anchors: What a compute-centric framework says about takeoff speeds (https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/). A comment under meta-notes mentions that I plan to include updates to timelines and takeoff in a future draft based on this report.

markov Oct 19, 2023, 8:40 AM
3 points
0
in reply to: TurnTrout’s comment on: AI Safety 101 : Reward Misspecification
Thanks for the feedback. I actually had an entire subsection in an earlier draft that covered “Reward is not the optimization target”. I decided to move it to the upcoming chapter 3, which covers optimization, goal misgeneralization, and inner alignment. I thought it would fit better as an intro section there, since it ties the content back to the previous chapter while also differentiating rewards from objectives. This flows well into differentiating which goals the system is actually pursuing.

AI Safety 101 : Reward Misspecification

markov Oct 18, 2023, 8:39 PM

32 points

4 comments, 31 min read

Is AI Safety dropping the ball on privacy?

markov Sep 13, 2023, 1:07 PM

50 points

17 comments, 7 min read

Stampy’s AI Safety Info—New Distillations #4 [July 2023]

markov Aug 16, 2023, 7:03 PM

22 points

10 comments, 1 min read

(aisafety.info)

Stampy’s AI Safety Info—New Distillations #3 [May 2023]

markov Jun 6, 2023, 2:18 PM

16 points

0 comments, 2 min read

(aisafety.info)

Stampy’s AI Safety Info—New Distillations #2 [April 2023]

markov May 9, 2023, 1:31 PM

25 points

1 comment, 1 min read

(aisafety.info)

Stampy’s AI Safety Info—New Distillations #1 [March 2023]

markov Apr 7, 2023, 11:06 AM

42 points

0 comments, 2 min read

(aisafety.info)

markov Mar 23, 2023, 1:45 AM
3 points
2
in reply to: mruwnik’s comment on: Is AI Safety dropping the ball on privacy?
Tor is way too slow, and Google hates serving content to Tor users. I2P might be faster than Tor, but its current adoption is way too low. Additionally, it doesn’t help that identity persistence is a regulatory requirement in most jurisdictions, because it helps with traceability against identity theft, financial theft, fraud, etc. Cookie cleaning means people have to log in every time, which for most of them is too annoying.
I acknowledge that there are technical ways to poison existing data. The core problem, though, is finding things that both normal people and technically adept people (alignment researchers, engineers, ...) would actually be willing to do.
The general vibe I see right now is a shrug of the shoulders: they already know, so I might as well make my life convenient and continue giving them everything...
Honestly, I don’t really even think it should be the responsibility of the average consumer to have to think about this at all. Should it be your responsibility to check every part of the engine in your car when you want to drive to make sure it is not going to blow up and kill you? Of course not, that responsibility should be on the manufacturer. Similarly, the responsibility for mitigating the adverse effects of data gathering should be on the developing companies not the consumers.

markov Mar 20, 2023, 9:53 PM
3 points
0
in reply to: AnnoyedReader’s comment on: Is AI Safety dropping the ball on privacy?
I understand your original comment a lot better now. My understanding of what you said is that the open source intelligence anyone provides through their public persona already reveals more than enough information to be damaging. The little that is sent over encrypted channels is just the cherry on top. So the only real way to avoid manipulation is to first hope that you have not been a very engaged member of the internet for the last decade, and also to communicate primarily over private channels.
I suppose I just underestimated how much people actually post stuff online publicly.
My first instinct was identity isolation. That was something I was going to suggest while writing the original post as well. Practicing identity isolation would mean that even if you post anything publicly, the data is isolated to that identity. Every website and every app is either compartmentalized or on a completely different identity. Honestly, though, that requires almost perfect OPSEC to not be fingerprintable. Besides that, it is also way too inconvenient for people to not just use the same email and phone number, or just log in with Google everywhere. So even though I would like to suggest it, no one would actually do it. And as you already mentioned, most normal people have just been providing boatloads of free OSINT for the last few decades anyway...
Thinking about it even more, a training dataset covering the entire public internet is basically the centralized database of your data that I am worried about anyway. As you mentioned, AIs can find feature representations that we as humans can’t. So even if you have been doing identity isolation, LLMs (or whatever future model) would still be able to fingerprint you as long as you have been posting enough stuff online. And not posting online is not something that most people are willing to do. Even if they are, they have already given away the game with what they have already posted. So in a majority of cases, identity isolation doesn’t help with this particular problem of AI manipulation either...
I have always tried to hold the position that even if it might be possible for other people (or AIs) to do something you don’t like (violate privacy/manipulate you), that doesn’t mean you should give up or that you have to make it easy for them. But off the top of my head, without thinking about this some more I can’t really come up with any good solution for people who have been publicly publishing info on the internet for a while. Thank you for giving me food for thought.

markov Mar 20, 2023, 9:40 AM
1 point
1
in reply to: AnnoyedReader’s comment on: Is AI Safety dropping the ball on privacy?
I am trying to be as realistic as I can while realizing that privacy is inversely proportional to convenience.
So no, of course you should not stop making LessWrong posts.
The main things I suggested were removing the ability to use data, by favoring E2EE, and removing the ability to hoard data, by favoring decentralized (or local) storage and computation.
As an example, just favor E2EE services for collaborating instead of Drive, Dropbox, or an office suite if you have the ability to do so. I agree that this doesn’t solve the problem, but at least it gets people accustomed to thinking about using privacy-focused alternatives. So it is one step.
Another example would be using an OS which has no telemetry and gives you root access, both on your computer and on your smartphone.
There is a different class of suggestions that fall under digital hygiene in general, but as mentioned in the ‘what is this post about’ section, that is not what this post is about. I am also intentionally avoiding naming the alternative services because I didn’t want this post to come across as a shill.
Also, this is all a question of timelines. If people think that AGI/ASI will rear its head within the next few years or the next decade, I would agree that there might be bigger fires to put out.

markov Mar 19, 2023, 11:10 AM
3 points
1
in reply to: Gyrodiot’s comment on: Is AI Safety dropping the ball on privacy?
I did consider the distinction between a model of humans vs. a model of you personally. But I can’t really see any realistic way of stopping the models from having better models of humans in general over time. So yeah, I agree with you that the small pockets of sanity are currently the best we can hope for. I wrote this post mainly to spread that pocket of sanity from infosec to the alignment space, because I would consider the minds of alignment researchers to be critical assets.
As to why predictive models of humans in general seem unstoppable: I thought it might be too much to ask people to not even provide anonymized data, because a lot of good capabilities are enabled by it (e.g. better medical diagnoses). Even if withholding it were not too heavy a capability loss, most people would still provide data because they simply don’t care or remain unaware. That is why I used the wording “stem the flow of data and delay timelines” instead of “stop the flow”.

markov Mar 19, 2023, 7:48 AM
4 points
0
in reply to: gwern’s comment on: Is AI Safety dropping the ball on privacy?
Thanks for pointing that out! It’s embarrassing that I made a mistake, but it’s also relieving in some sense to learn that the impacts were not as I had thought them to be.
I hope this error doesn’t serve to invalidate the entire post. I don’t really know what the post-publishing editing etiquette is, but I don’t want to keep anything in the post that might serve as misinformation so I’ll edit this line out.
Please let me know if there are any other flaws you find and I’ll get them fixed.