how about a robots.txt?
Yeah, that’s a strong option, which is why I went around checking + linking all the robots.txt files for the websites I listed above :)
In my other post I discuss the tradeoffs of the different approaches; one in particular is that it would be somewhat clumsy to implement post-by-post filters via robots.txt, whereas user-agent filtering handles that just fine.
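
To make that concrete, here's a rough sketch of what per-post user-agent filtering can look like, using only the Python standard library. The crawler names and blocked paths are placeholders, not an exhaustive list, and this isn't the exact setup from my post:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Illustrative placeholders: a few known AI-crawler user agents and a
    # per-post deny list. robots.txt would need one Disallow line per post
    # to express the same thing.
    AI_SCRAPER_AGENTS = ("GPTBot", "CCBot", "Google-Extended")
    BLOCKED_POSTS = {"/posts/private-musings", "/posts/unfinished-draft"}

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            ua = self.headers.get("User-Agent", "")
            # Deny only the listed posts, and only to the listed crawlers.
            if self.path in BLOCKED_POSTS and any(bot in ua for bot in AI_SCRAPER_AGENTS):
                self.send_error(403, "Not available to training crawlers")
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"post content here\n")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), Handler).serve_forever()

In practice you'd do this in your web server or CDN config rather than in application code, but the point is the same: the per-post logic lives in one conditional instead of a sprawling robots.txt.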
Given the ambiguity about whether GitHub trains models on private repos, I wonder if there's demand for someone to host a public GitLab (or similar) instance that forbids training models on hosted repos, and takes appropriate countermeasures against training-data scrapers accessing its public content.