Yep, those are extremely different drugs with very different effects. They do in fact take a while and the effects can be much more subtle.
evand
having read a few books about ADHD, my best guess is that I have (and had) a moderate case of it.
when I’m behind the wheel of a car and X = “the stoplight in front of me” and Y = “that new Indian restaurant”, it is bad
Which is one of the reasons why I don’t drive.
If your ADHD is interfering with your driving, that does not sound like a moderate case!
But option #3 is much better: take medication for six weeks and see what happens.
My expectation is that you will likely get a lot of information from trying the meds for 6 days, quite possibly even from 2-3 days or just 1; 6 weeks sounds like a very long experiment. What 6 weeks is enough time for is testing out a few doses and types (time release vs not, for example) and forming opinions, and possibly figuring out whether you want to take it every day or only some days, and maybe even how to tell which days are which.
All of which is to say: yes, perform cheap experiments! They’re great! This one is probably far faster (if not much cheaper in dollar terms) than you’re predicting.
Bitcoin Hivemind (nee Truthcoin) is the authority on doing this in a truly decentralized fashion. The original whitepaper is well worth a read. The fundamental insight: it’s easier to coordinate on the truth than on something else; incentivizing defection to the truth works well.
Judges have reputation. If you judge against the consensus, you lose reputation, and the other (consensus) judges gain it. The amount you lose depends on the consensus: being the lone dissenter has a very small cost (mistakes have costs, but small ones), but being part of a near-majority is very costly. So if your conspiracy is almost but not quite winning, the gain from defecting against the conspiracy is very high.
Adaptations to a non-blockchain context should be fairly easy. In that case, you have a central algorithmic resolver that tracks reputation, but defers to its users for resolution of a particular market (and applies the reputation algorithm).
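For intuition, here's a toy version of that incentive in Python. This is not Hivemind's actual resolution algorithm (which works over many markets at once with an SVD-based consensus); it's just a minimal sketch of the "dissenters pay the consensus, scaled by how close the vote was" property:

```python
def update_reputation(reputation, votes):
    """Toy reputation update for one binary market resolution.

    reputation: dict judge -> positive reputation weight
    votes:      dict judge -> True/False vote on the outcome

    Consensus is the reputation-weighted majority. Dissenters pay a penalty
    proportional to the size of the dissenting faction (a lone dissenter
    loses little; a near-majority loses a lot), and the penalty pool is
    redistributed to the consensus voters.
    """
    total = sum(reputation.values())
    yes_weight = sum(reputation[j] for j, v in votes.items() if v)
    consensus = yes_weight >= total / 2

    dissenters = [j for j, v in votes.items() if v != consensus]
    dissent_share = sum(reputation[j] for j in dissenters) / total  # in [0, 0.5]

    new_rep = dict(reputation)
    pool = 0.0
    for j in dissenters:
        penalty = reputation[j] * 2 * dissent_share  # scales with faction size
        new_rep[j] -= penalty
        pool += penalty

    winners = [j for j in votes if j not in dissenters]
    winner_weight = sum(reputation[j] for j in winners)
    for j in winners:
        new_rep[j] += pool * reputation[j] / winner_weight
    return new_rep


# A lone dissenter barely loses; a 4-of-10 conspiracy loses a lot, so the
# member who defects to the truth first comes out well ahead.
rep = {f"judge{i}": 1.0 for i in range(10)}
print(update_reputation(rep, {j: (j != "judge0") for j in rep}))
```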
Thank you!
I went ahead and created the 2024 version of one of the questions. If you’re looking for high-liquidity questions to include, which seems like a good way to avoid false alarms / pranks, this one seems like a good inclusion.
There are a bunch of lower-liquidity questions; including a mix of those with some majority-rule type logic might or might not be worth it.
Thank you! Much to think about, but later...
If there are a large number of true-but-not-publicly-proven statements, does that impose a large computational cost on the market making mechanism?
I expect that the computers running this system might have to be fairly beefy, but they’re only checking proofs.
They’re not, though. They’re making markets on all the interrelated statements. How do they know when they’re done exhausting the standing limit orders and AMM liquidity pools? My working assumption is that this is equivalent to a full Bayesian network and explodes exponentially for all the same reasons. In practice it’s not maximally intractable, but you don’t avoid the exponential explosion either—it’s just slower than the theoretical worst case.
If every new order placed has to be checked against the limit orders on every existing market, you have a problem.
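To make the explosion concrete, here's a minimal brute-force sketch (hypothetical propositions, nothing from the post itself): even just enumerating the possible worlds the market maker has to stay consistent over grows as 2^n in the number of atomic statements, before you get to matching limit orders or updating AMM pools across them.

```python
from itertools import product

def consistent_worlds(atoms, constraints):
    """Enumerate the truth assignments over `atoms` satisfying every proven constraint.

    atoms:       list of atomic proposition names
    constraints: list of functions world -> bool (statements already proven)

    Any market maker that wants its quotes on interrelated statements to stay
    arbitrage-free is implicitly reasoning over this set, which starts from
    2**len(atoms) candidates.
    """
    worlds = []
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(c(world) for c in constraints):
            worlds.append(world)
    return worlds


atoms = ["A", "B", "C"]
proven = [lambda w: (not w["A"]) or w["B"]]   # a published proof of "A implies B"
print(len(consistent_worlds(atoms, proven)))  # 6 of the 8 worlds survive
```

SAT solvers and pruning buy you better constants and better typical cases, but not an escape from the worst case.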
For thickly traded propositions, I can make money by investing in a proposition first, then publishing a proof. That sends the price to $1 and I can make money off the difference. Usually, it would be more lucrative to keep my proof secret, though.
The problem I’m imagining comes when the market is trading at 0.999, but life would really be simplified for the market maker if the price were actually, provably, 1. Then it could stop tracking that price as something interesting, and stop worrying about the combinatorial explosion.
So you’d really like to find a world where once everyone has bothered to run the SAT-solver trick and figure out what route someone is minting free shares through, that just becomes common knowledge and everyone’s computational costs stop growing exponentially in that particular direction. And furthermore, the first person to figure out the exact route is actually rewarded for publishing it, rather than being able to extract money at slowly declining rates of return.
In other words: at what point does a random observer start turning “probably true, the market said so” into “definitely true, I can download the Coq proof”? And after that point, is the market maker still pretending to be ignorant?
This is very neat work, thank you. One of those delightful things that seems obvious in retrospect, but that I’ve never seen expressed like this before. A few questions, or maybe implementation details that aren’t obvious:
For complicated proofs, the fully formally verified statement all the way back to axioms might be very long. In practice, do we end up with markets for all of those? Do they each need liquidity from an automated market maker? Presumably not if you’re starting from axioms and building a full proof, and that applies to implications and conjunctions and so on as well, because the market doesn’t need to keep tracking things that are proven. However:
First Alice, who can prove A, produces many, many shares of ¬A for free. This is doable when you have a proof of A, by starting from a bunch of free shares and using equivalent exchange. She sells these for $0.2 each to Bob, pure profit.
In order for this to work, the market must be willing to maintain a price for these shares in the face of a proof that they’re worth $0. Presumably the proof is not yet public, and if Alice has secret knowledge she can sell with a profit-maximizing strategy.
She could simply not provide the proof to the exchange, generating A and ¬A pairs and selling only the latter, equivalent to just investing in A, but that requires capital. It’s far more interesting if she can do it without tying up the capital.
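To put rough numbers on that difference (the $0.2 price is from the example above; the $1 cost to mint a complete share pair is an assumption for illustration):

```python
# Alice knows a still-secret proof of A. Two ways to monetize it:
n = 1_000            # number of ¬A shares sold to Bob
price_not_a = 0.20   # price Bob pays per ¬A share (from the example)
pair_cost = 1.00     # assumed cost to mint a complete (A, ¬A) pair

# Strategy 1: use the proof, via equivalent exchange, to mint ¬A shares for free
# and sell them.
capital_1 = 0.0
profit_1 = n * price_not_a                       # ~$200 immediately, nothing at risk

# Strategy 2: keep the proof away from the exchange, mint (A, ¬A) pairs, sell only
# the ¬A halves, and hold the A halves until the proof is published and they pay $1.
capital_2 = n * (pair_cost - price_not_a)        # ~$800 tied up in A shares meanwhile
profit_2 = n * (1.00 - pair_cost + price_not_a)  # same ~$200, but only at resolution

print(profit_1, capital_1)
print(profit_2, capital_2)
```

Same profit either way; the difference is entirely in how much capital sits locked up waiting for the proof to surface.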
So how does the market work for shares of proven things, and how does the proof eventually become public? Is there any way to incentivize publishing proofs, or do we simply get a weird world where everyone is pretty sure some things are true but the only “proof” is the market price?
If there are a large number of true-but-not-publicly-proven statements, does that impose a large computational cost on the market making mechanism?
Second question: how does this work in different axiom systems? Do we need separate markets, or can they be tied together well? How does the market deal with “provable from ZFC but not Peano”? “Theorem X implies corollary Y” is a thing we can prove, and if there’s a price on shares of “Theorem X” then that makes perfect sense, but does it make sense to put a “price” on the “truth” of the ZFC axioms?
Presumably if we have a functional market that distinguishes Peano proofs from ZFC proofs, we’d like to distinguish more axiom sets. What happens if someone sets up an inconsistent axiom set, and that inconsistency is found? Presumably all dependent markets become a mess and there’s a race to the exits that extracts all the liquidity from the AMMs; that seems basically fine. But can that be contained to only those markets, without causing weird problems in Peano-only markets?
Probably some of this would be clearer if I knew a bit more about modern proof formalisms.
My background: educated amateur. I can design simple to not-quite-simple analog circuits and have taken ordinary but fiddly material property measurements with electronics test equipment and gotten industrially-useful results.
One person alleges an online rumor that poorly connected electrical leads can produce the same graph. Is that a conventional view?
I’m not seeing it. With a bad enough setup, poor technique can do almost anything. I’m not seeing the authors as that awful, though. I don’t think they’re immune from mistakes, but I give low odds on the arbitrarily-awful end of mistakes.
You can model electrical mistakes as some mix of resistors and switches. Fiddly loose contacts are switches, actuated by forces. Those can be magnetic, thermal expansion, unknown gremlins, etc. So “critical magnetic field” could be “magnetic field adequate to move the thing”. Ditto temperature. But managing both problems at the same time in a way that looks like a plausible superconductor critical curve is… weird. The gremlins could be anything, but gremlins highly correlated with interesting properties demand explanation.
Materials with grains can have conducting and not-conducting regions. Those would likely have different thermal expansion behaviors. Complex oxides with grain boundaries are ripe for diode-like behavior. So you could have a fairly complex circuit with fairly complex temperature dependence.
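Here's a minimal sketch of that resistor-and-switch picture. All component values and thresholds are invented; the point is only that a single thermally or magnetically actuated loose contact buys you one sharp step, not a self-consistent family of curves.

```python
def apparent_resistance(T_kelvin, B_tesla,
                        r_good_path=0.02,    # ohms: the "real" series resistance (assumed)
                        r_open_contact=5.0,  # ohms: added when the loose contact opens (assumed)
                        T_switch=100.0,      # K: thermal-expansion threshold (assumed)
                        B_switch=0.3):       # T: field needed to physically move the contact (assumed)
    """Toy gremlin model: a loose contact acts as a switch actuated by temperature
    and magnetic field. Below both thresholds it stays closed, and the sample
    shows a sharp apparent "transition" to low resistance."""
    contact_open = (T_kelvin > T_switch) or (B_tesla > B_switch)
    return r_good_path + (r_open_contact if contact_open else 0.0)


for T in (80, 95, 105, 120):
    print(T, apparent_resistance(T, B_tesla=0.0))  # steps from 0.02 to 5.02 ohms near 100 K
```

Getting the full measured family of current-temperature-field curves out of a model like this takes a stack of such switches with coincidentally coordinated thresholds, which is the fine-tuning problem discussed below.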
I think this piece basically comes down to two things:
1. Can you get this level of complex behavior out of a simple model? One curve I’d believe, but the multiple curves, with the relationship between temperature and critical current, don’t seem right. The kind of mistake needed to produce this seems complicated, with a very low base rate.
2. Did they manage to demonstrate resistivity low enough to rule out simple conduction in the zero-voltage regime? (For example, resistivity lower than copper’s by an order of magnitude.) The papers are remarkably short on details to this effect. They claim yes (copper is ~1.7e-6 ohm*cm; the 3-author paper claims < 1e-10 ohm*cm for the thin-film sample), but details are in short supply. Using a four-point probe to measure the resistivity of bulk copper is remarkably challenging; if you want good data on copper, you measure thin films or long thin wires. I’d love to see more here; rough numbers are sketched below.
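Here are those rough numbers, with a sample geometry assumed purely for illustration (not taken from the papers):

```python
# What a four-point measurement has to resolve, for an assumed thin-film geometry.
rho_copper  = 1.7e-8   # ohm*m  (1.7e-6 ohm*cm)
rho_claimed = 1e-12    # ohm*m  (1e-10 ohm*cm, the 3-author paper's claimed upper bound)

length    = 1e-3       # m: voltage-probe spacing (assumed)
width     = 1e-3       # m: sample width (assumed)
thickness = 1e-6       # m: film thickness (assumed)
current   = 1e-3       # A: measurement current (assumed)

area = width * thickness
for label, rho in [("copper", rho_copper), ("claimed", rho_claimed)]:
    resistance = rho * length / area
    voltage = resistance * current
    print(f"{label}: R = {resistance:.1e} ohm, V = {voltage:.1e} V")

# copper : R ~ 2e-2 ohm -> ~20 microvolts at 1 mA (easy)
# claimed: R ~ 1e-6 ohm -> ~1 nanovolt at 1 mA (genuinely hard)
```

So showing "an order of magnitude better than copper" is feasible but fiddly (a couple of microvolts), while actually confirming the claimed 1e-10 ohm*cm bound needs nanovolt-class resolution, and that's exactly the detail the papers don't show.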
If the noise floor doesn’t rule out copper, you can get the curves with adequately well chosen thermal and magnetic switches from loose contacts. But there are enough graphs that those errors have to be remarkably precisely targeted, if the graphs aren’t fraud.
Another thing I’d love to see on this front: multiple graphs of the same sort from the same sample (take it apart and put it back together), from different locations on the sample, from multiple samples. Bad measurement setups don’t repeat cleanly.
My question for the NO side: what does the schematic of the bad measurement look like? Where do you put the diodes? How do you manage the sharp transition out of the zero-resistance regime without arbitrarily-fine-tuned switches?
Do any other results from the 6-person or journal-submitted LK papers stand out as having the property, “This is either superconductivity or fraud?”
The field-cooled vs zero-field-cooled magnetization graph (1d in the 3-author paper, 4a in the 6-author paper). I’m far less confident in this than the above; I understand the physics much less well. I mostly mention it because it seems under-discussed from what I’ve seen on twitter and such. This is an extremely specific form of thermal/magnetic hysteresis that I don’t know of an alternate explanation for. I suspect this says more about my ignorance than anything else, but I’m surprised I haven’t seen a proposed explanation from the NO camp.
The comparison between the calculations saying igniting the atmosphere was impossible and the catastrophic mistake on Castle Bravo is apposite, as the initial calculations for both were done by the same people at the same gathering!
One out of two isn’t bad, right?
https://twitter.com/tobyordoxford/status/1659659089658388545
Of course a superintelligence could read your keys off your computer’s power light, if it found it worthwhile. Most of the time it would not need to; it would find easier ways to do whatever humans do by pressing keys. Or it would make the human press the keys.
FYI, the referenced thing is not about what keys are being pressed on a keyboard; it’s about extracting the secret keys used for encryption or authentication. You’re using the wrong meaning of “keys”.
If you think the true likelihood is 10%, and are being offered odds of 50:1 on the bet, then the Kelly Criterion suggests you should bet about 8% of your bankroll. For various reasons (mostly human fallibility and an asymmetry in the Kelly utility curve), lots of people recommend betting at fractions of the Kelly amount. So someone in the position you suggest might reasonably wish to bet something like $2-5k per $100k of bankroll. That strategy, your proposed credences, and the behavior observed so far would imply a bankroll of a few hundred thousand dollars. That’s not trivial, but also far from implausible in this community.
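The arithmetic, for anyone who wants to check it (the standard Kelly formula, with the 10% credence and 50:1 odds from the hypothetical above):

```python
def kelly_fraction(p, b):
    """Kelly fraction for a binary bet: p = probability of winning,
    b = net odds received (win b per 1 staked). Negative means don't bet."""
    return (b * p - (1 - p)) / b

p, b = 0.10, 50.0
f = kelly_fraction(p, b)                     # (50*0.1 - 0.9)/50 = 0.082
print(f"full Kelly: {f:.1%} of bankroll")    # ~8.2%

bankroll = 100_000
for frac in (0.25, 0.5):                     # common fractional-Kelly choices
    print(frac, round(frac * f * bankroll))  # ~$2,050 and ~$4,100 per $100k
```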
I’d also guess that the proper accounting of the spending here is partly a bet made for positive expected value, and partly some sort of marketing spend, pushing for higher credibility of their idea. I’m not sure of the exact mechanism or goal, and this is not a confident prediction, but it has that feel to it.
“Now” is the time at which you can make interventions. Subjective experience lines up with that because it can’t be causally compatible with being in the future, and it maximizes the info available to make the decision with. Or rather, it approximately maximizes it, subject to processing constraints: things get weird if you start really trying to ask whether “now” is “now” or “100 ms ago”.
That’s sort of an answer that seems like it depends on a concept of free will, though. To which my personal favorite response is… how good is your understanding of counterfactuals? Have you written a program that tries to play a two-player game, like checkers or go? If you have, you’ll discover that your program is completely deterministic, yet has concepts like “now” and “if I choose X instead of Y” and they all just work.
Build an intuitive understanding of how that program works, and how it has both a self-model and understanding of counterfactuals while being deterministic in a very limited domain, and you’ll be well under way to dissolving this confusion. (Or at least, I’ve spent a bunch of hours on such programs and I find the analogy super useful; YMMV and I’m probably typical-minding too much here.)
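If you want the concrete version of "a deterministic program that reasons about counterfactuals", here's a minimal negamax sketch for a toy game. I'm using a Nim variant as a stand-in because it fits in a few lines; checkers or go search has exactly the same structure.

```python
def best_move(stones, take_options=(1, 2, 3)):
    """Deterministic game search for a toy Nim variant: players alternately take
    1-3 stones, and whoever takes the last stone wins.

    The "counterfactual" lives in the loop below: the program evaluates
    "if I choose X instead of Y, what happens?" for every legal move,
    even though only one move will ever actually be played."""
    def value(remaining):
        # Value of the position for the player about to move.
        if remaining == 0:
            return -1  # no stones left: the previous player took the last one, we lost
        return max(-value(remaining - t) for t in take_options if t <= remaining)

    moves = [t for t in take_options if t <= stones]
    return max(moves, key=lambda t: -value(stones - t))


print(best_move(10))  # deterministically "chooses" 2, leaving a multiple of 4
```

Nothing in there is random or un-caused, and yet "if I choose X instead of Y" is a perfectly meaningful, load-bearing computation inside it.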
My concern with conflating those two definitions of alignment is largely with the degree of reliability that’s relevant.
The definition “does what the developer wanted” seems like it could cash out as something like “x% of the responses are good”. So, if 99.7% of responses are “good”, it’s “99.7% aligned”. You could even strengthen that as something like “99.7% aligned against adversarial prompting”.
On the other hand, from a safety perspective, the relevant metric is something more like “probabilistic confidence that it’s aligned against any input”. So “99.7% aligned” means something more like “99.7% chance that it will always be safe, regardless of who provides the inputs, how many inputs they provide, and how adversarial they are”.
In the former case, that sounds like a horrifyingly low number. What do you mean we only get to ask the AI 300 things in total before everyone dies? How is that possibly a good situation to be in? But in the latter case, I would roll those dice in a heartbeat if I could be convinced the odds were justified.
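The gap between those two readings is easy to put numbers on. A toy calculation, assuming (for the first reading) that failures are independent per response:

```python
per_response_good = 0.997

# Reading 1: each response is independently "good" with probability 99.7%.
expected_queries_to_first_failure = 1 / (1 - per_response_good)
print(expected_queries_to_first_failure)   # ~333 queries: the "about 300 things" above

p_no_failure_in_a_million_queries = per_response_good ** 1_000_000
print(p_no_failure_in_a_million_queries)   # numerically indistinguishable from 0

# Reading 2: a single 99.7% chance that the system is safe against any inputs, ever.
p_everyone_survives = 0.997                # one draw total, not one per query
print(p_everyone_survives)
```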
So anyway, I still object to using the “alignment” term to cover both situations.
If there are reasons to refuse bets in general, that apply to the LessWrong community in aggregate, something has gone horribly horribly wrong.
No one is requiring you personally to participate, and I doubt anyone here is going to judge you for reluctance to engage in bets with people from the Internet who you don’t know. Certainly I wouldn’t. But if no one took up this bet, it would have a meaningful impact on my view of the community as a whole.
I don’t know how it prevents us from dying either! I don’t have a plan that accomplishes that; I don’t think anyone else does either. If I did, I promise I’d be trying to explain it.
That said, I think there are pieces of plans that might help buy time, or might combine with other pieces to do something more useful. For example, we could implement regulations that take effect above a certain model size or training effort. Or that prevent putting too many flops worth of compute in one tightly-coupled cluster.
One problem with implementing those regulations is that there’s disagreement about whether they would help. But that’s not the only problem. Other problems are things like: how hard would they be to comply with and audit compliance with? Is compliance even possible in an open-source setting? Will those open questions get used as excuses to oppose them by people who actually object for other reasons?
And then there’s the policy question of how we move from the no-regulations world of today to a world with useful regulations, assuming that’s a useful move. So the question I’m trying to attack is: what’s the next step in that plan? Maybe we don’t know because we don’t know what the complete plan is or whether the later steps can work at all, but are there things that look likely to be useful next steps that we can implement today?
One set of answers to that starts with voluntary compliance. Signing an open letter creates common knowledge that people think there’s a problem. Widespread voluntary compliance provides common knowledge that people agree on a next step. But before the former can happen, someone has to write the letter and circulate it and coordinate getting signatures. And before the latter can happen, someone has to write the tools.
So a solutionism-focused approach, as called for by the post I’m replying to, is to ask what the next step is. And when the answer isn’t yet actionable, break that down further until it is. My suggestion was intended to be one small step of many, and one that I haven’t seen discussed much as a useful next step.
I think neither. Or rather, I support it, but that’s not quite what I had in mind with the above comment, unless there’s specific stuff they’re doing that I’m not aware of. (Which is entirely possible; I’m following this work only loosely, and not in detail. If I’m missing something, I would be very grateful for more specific links to stuff I should be reading. Git links to usable software packages would be great.)
What I’m looking for mostly, at the moment, is software tools that could be put to use: a library, a tutorial, a guide for how to incorporate that library into your training run, and, as a result, better compliance with voluntary reporting. What I’ve seen so far is mostly high-effort investigative reports and red-teaming efforts.
Best practices around how to evaluate models and high-effort things you can do while making them are also great. But I’m specifically looking for tools that enable low effort compliance and reporting options while people are doing the same stuff they otherwise would be. I think that would complement the suggestions for high-effort best practices.
The output I’d like to see is things like machine-parseable quantification of flops used to generate a model, such that a derivative model would specify both total and marginal flops used to create it.
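As a sketch of what "machine-parseable" could mean here (a hypothetical schema I'm making up for illustration, not an existing standard or library):

```python
import hashlib, json
from dataclasses import dataclass, field, asdict

@dataclass
class ComputeReport:
    """Hypothetical machine-parseable training-compute report for one model."""
    model_name: str
    marginal_flops: float                 # compute spent on this training run alone
    parent_digests: list[str] = field(default_factory=list)  # reports this run builds on
    parent_total_flops: float = 0.0       # sum of total_flops over those parents

    @property
    def total_flops(self) -> float:
        return self.marginal_flops + self.parent_total_flops

    def digest(self) -> str:
        """Content hash over this report plus its parents' digests, so a chain of
        derivative models is tamper-evident (a Merkle-style dependency record)."""
        payload = {**asdict(self), "total_flops": self.total_flops}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


base = ComputeReport("base-model", marginal_flops=1e24)
finetune = ComputeReport("base-model-ft", marginal_flops=5e21,
                         parent_digests=[base.digest()],
                         parent_total_flops=base.total_flops)
print(finetune.total_flops, finetune.digest())
```

The point isn't this particular schema; it's that emitting something like it should be one import and a couple of lines in a training script, not a bespoke compliance project.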
One thing I’d like to see more of: attempts at voluntary compliance with proposed plans, and libraries and tools to support that.
I’ve seen suggestions to limit the compute power used on large training runs. Sounds great; might or might not be the answer, but if folks want to give it a try, let’s help them. Where are the libraries that make it super easy to report the compute power used on a training run? To show a Merkle tree of what other models or input data that training run depends on? (Or, if extinction risk isn’t your highest priority, to report which media by which people got incorporated, and what licenses it was used under?) How do those libraries support reporting by open-source efforts, and incremental reporting?
What if the plan is alarm bells and shutdowns of concerning training runs? Or you’re worried about model exfiltration by spies or rogue employees? Are there tools that make it easy to report what steps you’re taking to prevent that? That make it easy to provide good security against those threat models? Where’s the best practices guide?
We don’t have a complete answer. But we have some partial answers, or steps that might move in the right direction. And right now actually taking those next steps, for marginal people kinda on the fence about how to trade capabilities progress against security and alignment work, looks like it’s hard. Or at least harder than I can imagine it being.
(On a related note, I think the intersection of security and alignment is a fruitful area to apply more effort.)
Aren’t the other used cars available nearby, and the potential other buyers should you walk away, relevant to that negotiation?
This was fantastic; thank you! I still haven’t quite figured it all out; I’ll definitely have to watch it a second time (or at least some parts of it).
I think some sort of improved interface for your math annotations and diagrams would be a big benefit, whether that’s a drawing tablet or typing out some LaTeX or something else.
I think the section on induction heads and how they work could have used a bit more depth. Maybe a couple more examples, maybe some additional demos of how to play around with PySvelte, maybe something else. That’s the section I had the most trouble following.
You mentioned a couple additional papers in the video; having links in the description would be handy. I suspect I can find them easily enough as it is, though.
Yes, if Omega accurately simulates me and wants me to be wrong, Omega wins. But why do I need to get the answer exactly “right”? What does it matter if I’m slightly off?
This would be a (very slightly) more interesting problem if Omega were offering a bet or a reward and my goal were to maximize reward or utility or whatever. It sure looks like, for this setup combined with a non-adversarial reward schedule, I can get arbitrarily close to maximizing the reward.
Have you considered looking at the old Foresight Exchange / Ideosphere data? It should be available, and it’s old enough that there might be a useful number of long-term forecasts there.
http://www.ideosphere.com/