I am excited about this. I’ve also recently been interested in ideas like nudging researchers to write 1-5 page research agendas, then collecting them and advertising the collection.
Possible formats:
A huge Google Doc (maybe based on this post); anyone can comment; there are one or more maintainers; maintainers approve ~all suggestions by researchers about their own research topics and consider suggestions from random people.
A directory of Google Docs on particular agendas; each individual doc is owned by a relevant researcher, who is responsible for maintaining it; some maintainer-of-the-whole-project occasionally nudges researchers to update their docs and reassigns a topic to someone else if necessary. Random people can make suggestions too.
(Alex, I think we can do much better than the best textbooks format in terms of organization, readability, and keeping up to date.)
I am interested in helping make something like this happen. Or if it doesn’t happen soon I might try to do it (but I’m not taking responsibility for making this happen). Very interested in suggestions.
(One particular kind-of-suggestion: is there a taxonomy/tree of alignment research directions you like, other than the one in this post? (Note to self: taxonomies have to focus on either methodology or theory of change… probably organize by theory of change and don’t hesitate to point to the same directions/methodologies/artifacts in multiple places.))

There’s also a much harder and less impartial option: an extremely opinionated survey that picks one lens to view the entire field and then describes every agenda through that lens, in terms of which particular cruxes/assumptions each agenda runs with. This would necessarily require the authors of the survey to deeply understand all the agendas they’re covering, and inevitably some agendas would receive much more coverage than others.
That makes it much harder than just stapling together a bunch of people’s descriptions of their own research agendas, and it will never be “the” alignment survey because of its opinionatedness. I still think this would have a lot of value, though: it would make it much easier to translate ideas between different lenses and notice commonalities, and would help with figuring out which cruxes need to be resolved for people to agree.
Relatedly, I don’t think alignment currently lacks different lenses (which is not to say that the different lenses are meaningfully decorrelated). What alignment lacks is convergence between people with different lenses. Some of this is because many cruxes are very hard to resolve experimentally today. But even despite that, I think we should be able to do much better than we currently do: often it’s not even clear what the cruxes are between different views, or whether two people are thinking about the same thing when they make claims in different language.
I strongly agree that this would be valuable; if not for the existence of this shallow review, I’d consider doing it myself, just so I’d have it as a reference.
FWIW, I think “deep” reviews serve a very different purpose from shallow reviews, so I don’t think you should let the existence of shallow reviews stop you from doing a deep one.
I’ve written up an opinionated take on someone else’s technical alignment agenda about three times, and each of those took me something like 100 hours. That was just to clearly state why I disagreed with it; forget about resolving our differences :)
Even that is putting it a bit too lightly.
I.e., is there even a single, bona fide, novel proof at all? One proven mathematically, or otherwise demonstrated with 100% certainty, across the last 10+ years?
Or is it all just ‘lenses’, subjective views, probabilistic analysis, etc.?
LessWrong does have a relatively fully featured wiki system. Not sure how good of a fit it is, but like, everyone can create tags and edit them and there are edit histories and comment sections for tags and so on.
We’ve been considering adding the ability for people to also add generic wiki pages, though how to make them visible and allocate attention to them has been a bit unclear.
Maybe an opt-in/opt-out “novice mode” which turns, say, the first appearance of a niche LW term in every post into a link to that term’s LW wiki page? It could be toggled in the settings, and be either on by default (with a notification on how to turn it off) or offered during sign-up, or something along those lines.
Alternatively, a button for each post which fetches the list of idiosyncratic LW terms mentioned in it, and links to their LW wiki pages?
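A minimal sketch of how the first-occurrence linking (and the “button” variant) could work, assuming a hypothetical NICHE_TERMS mapping from jargon to wiki URLs. The terms, URLs, and function names here are illustrative, not the actual LW tag database or codebase:

```python
import re

# Hypothetical mapping of niche LW terms to their wiki pages; in practice
# this would be pulled from the site's tag/wiki database.
NICHE_TERMS = {
    "mesa-optimizer": "https://www.lesswrong.com/tag/mesa-optimization",
    "corrigibility": "https://www.lesswrong.com/tag/corrigibility",
}

def link_first_occurrences(post_html: str) -> str:
    """Wrap only the first appearance of each niche term in a wiki link.

    A real implementation would also need to skip text that is already
    inside a link or an HTML attribute; this sketch ignores that.
    """
    for term, url in NICHE_TERMS.items():
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        # count=1 ensures only the first occurrence gets linked.
        post_html = pattern.sub(
            lambda m, url=url: f'<a href="{url}">{m.group(0)}</a>',
            post_html,
            count=1,
        )
    return post_html

def list_niche_terms(post_text: str) -> dict[str, str]:
    """For the 'button' variant: return the niche terms a post mentions."""
    return {
        term: url
        for term, url in NICHE_TERMS.items()
        if re.search(re.escape(term), post_text, re.IGNORECASE)
    }
```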
I’ve earlier suggested a principled taxonomy of AI safety work with two dimensions:
System level:
monolithic AI system
human—AI pair
AI group/org: CoEm, debate systems
large-scale hybrid (humans and AIs) society and economy
AI lab, not to be confused with an “AI org” above: an AI lab is an org composed of humans and increasingly of AIs that creates advanced AI systems. See Hendrycks et al.’s discussion of organisational risks.
Methodological time:
design time: basic research, math, science of agency (cognition, DL, games, cooperation, organisations), algorithms
manufacturing/training time: RLHF, curriculums, mech interp, ontology/representations engineering, evals, training-time probes and anomaly detection
deployment/operations time: architecture to prevent LLM misuse or jailbreaking, monitoring, weights security
evolutionary time: economic and societal incentives, effects of AI on society and psychology, governance.
So, this taxonomy is a 5x4 matrix, almost all slots of which are interesting, and some of which are severely under-explored.
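A minimal sketch of the 5x4 grid as a data structure, just to make the enumeration concrete. The axis labels come from the list above; the few filled cells are illustrative guesses, since the list above assigns items to only one axis at a time:

```python
# The two axes of the proposed taxonomy (labels from the list above).
SYSTEM_LEVELS = [
    "monolithic AI system",
    "human-AI pair",
    "AI group/org",
    "hybrid society and economy",
    "AI lab",
]
METHODOLOGICAL_TIMES = [
    "design time",
    "manufacturing/training time",
    "deployment/operations time",
    "evolutionary time",
]

# Illustrative cell placements only: the original list ties methods to a
# methodological time (or a system level), not to a specific cell.
EXAMPLE_CELLS = {
    ("monolithic AI system", "manufacturing/training time"): ["RLHF", "mech interp", "evals"],
    ("AI group/org", "design time"): ["CoEm", "debate systems"],
    ("hybrid society and economy", "evolutionary time"): ["governance", "economic incentives"],
}

# Enumerate all 20 slots and show which ones have example work attached.
for level in SYSTEM_LEVELS:
    for time in METHODOLOGICAL_TIMES:
        entries = EXAMPLE_CELLS.get((level, time), [])
        status = ", ".join(entries) if entries else "(open / possibly under-explored)"
        print(f"{level:30} x {time:30} -> {status}")
```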
Hi, we’ve already made a site which does this!