Hello! I work at Lightcone and like LessWrong :-). I have made some confidentiality agreements I can’t leak much metadata about (like who they are with). I have made no non-disparagement agreements.
kave
I’m curious about Claude’s coauthorship of this post. How did it help?
The crucial point is: when the untrusted model gets to step 6, it sees R in the history instead of A, so it knows its attempt was replaced.
Why does it know its attempt was replaced? Because it can guess it wouldn’t have produced R? If it can tell that, how come it can’t guess it specifically would have produced A?
I thought this title meant the post would be making a case from conservative (i.e. minimal) assumptions.
Maybe change the title to “making a politically conservative case for alignment” or something?
I wonder what the lifetime spend on dating apps is. I expect that for most people who ever pay, it’s >$100.
I think the credit assignment is legit hard, rather than just being a case of bad norms. Do you disagree?
I would guess they tried it because they hoped it would be competitive with their other product, and sunset it because that didn’t happen with the amount of energy they wanted to allocate to the bet. There may also have been an element of updating on how much focus their core product needed.
I’ve only skimmed the retrospective just now, but it seems mostly to be detailing problems that stymied their ability to find traction.
It’s possible no one tried literally “recreate OkC”, but I think dating startups are very oversubscribed by founders, relative to interest from VCs [1] [2] [3] (and I think VCs are mostly correct that they won’t make money [4] [5]).
(Edit: I want to note that those are things I found after a bit of googling to see if my sense of the consensus was borne out; they are meant in the spirit of “several samples of weak evidence”)
I don’t particularly believe you that OkC solves dating for a significant fraction of people. IIRC, a previous time we talked about this, @romeostevensit suggested you had not sufficiently internalised the OkCupid blog findings about how much people prioritised physical attraction.
You mention manifold.love, but also mention it’s in maintenance mode – I think because the type of business you want people to build does not in fact work.
I think it’s fine to lament our lack of good mechanisms for public good provision, and to claim our society is failing at that. But I think you’re trying to draw an update that’s something like “tech startups should be doing an unbiased search through viable, valuable businesses, but they’re clearly not”, or maybe “tech startups are supposed to be able to solve a large fraction of our problems, but if they can’t solve this, then that’s not true” – and I don’t think either of these conclusions seems that licensed from the dating data point.
Yes, though I’m not confident.
I saw this poll and thought to myself “gosh, politics, religion and cultural opinions sure are areas where I actively try to be non-heroic, as they aren’t where I wish to spend my energy”.
They load it in as a web font (i.e. you load Calibri from their server when you load that search page). We don’t do that on LessWrong.
Yeah, that’s a Google Easter egg. You can also try “Comic Sans” or “Trebuchet MS”.
One sad thing about older versions of Gill Sans: capital I, lowercase l, and the digit 1 all look the same. Nova at least distinguishes the 1.
IMO, we should probably move towards system fonts, though I would like to choose something that preserves character a little more.
I don’t think we’ve changed how often we use serifs vs sans serifs. Is there anything particular you’re thinking of?
@gwern I think it prolly makes sense for me to assign this post to your account? Let me know if you’re OK with that.
Gwern: Why So Few Matt Levines?
For me, Dark Forest Theory reads strongly as “everyone is hiding, (because) everyone is hunting”, rather than just “everyone is hiding”.
From the related book Elephant in the Brain:
Here is the thesis we’ll be exploring in this book: We, human beings, are a species that’s not only capable of acting on hidden motives—we’re designed to do it. Our brains are built to act in our self-interest while at the same time trying hard not to appear selfish in front of other people. And in order to throw them off the trail, our brains often keep “us,” our conscious minds, in the dark. The less we know of our own ugly motives, the easier it is to hide them from others.
I think Steve Hsu has written some about the evidence for additivity on his blog (Information Processing). He also talks about it a bit in section 3.1 of this paper.
It seems like there’s a general principle here, that it’s hard to use pure empiricism to bound behaviour over large input and action spaces. You either need to design the behaviour, or understand it mechanistically.
I don’t think this distinction is robust enough to provide much of a defensive property. I think it’s probably not that hard to think “I probably would have tried something in direction X, or direction Y”, and then gather lots of bits about how well the clusters X and Y work.