These are reasonable concerns. I’m glad to see list points 2, 3, and 4. Playing defense to the exclusion of playing offense is a way to lose. We are here to win. That requires courage and discernment in the face of danger. Avoiding all downsides is a pretty sure way to reduce your odds of winning. There are many posts here about weighing the downsides of sharing infohazards against the upsides of progressing on alignment; sorry, I don’t have refs at the top of my head. I will say that I, and I think the community on average, would say it’s highly unlikely that existentially dangerous AI exists now, or that future AI will become more dangerous by reading particularly clever human ideas.
So, if your ideas have a potentially important upside and no obvious large downside, please share them.
Some of the points made in posts about sharing ideas: it’s unlikely they’re as dangerous as you think; others have probably had those ideas, and maybe tested them if they’re that good. And you need to weigh the upside against the downside.
Also, if you have a little time, searching LessWrong for similar ideas will be fun and fascinating.
Different personalities will tend to misweight in opposite directions: pessimists/anxious people will overweight potential downsides, while optimists/enthusiastic people will overweight potential upsides. Doing a good job of weighing them is complex. But it probably doesn’t matter much unless you’ve done a bunch of research to establish that your new idea is really both new and potentially very powerful. Lots of ideas (but not nearly all!) have been thought of and explored, and there are lots of reasons that powerful-seeming ideas wind up not being that important or new.
So, if your ideas have a potentially important upside and no obvious large downside, please share them.
What would be some examples of an obviously large downside? Something that comes to mind is anything that tips the current scales in a bad way, like a novel research result that directs researchers toward more rapid capabilities gains without a commensurate increase in alignment. Anything else?