Related: 16 types of useful predictions
damiensnyder
it’s not
I don’t understand why “scheming AI exfiltrates itself” and “scheming AI hacks its datacenter” are considered to be rogue deployments. Wouldn’t the unmonitored deployments in those cases be catastrophes themselves, caused by a monitored, not-yet-roguely-deployed AI?
Thanks for writing this! I have a few questions, to make sure I’m understanding the architecture correctly:
So, essentially, the above pulls a transformation of the current state, fed through an action-conditioned dynamics network, close to a transformation from the future state.
Is this transformation an image augmentation, like you mention regarding SimSiam? Is the transformation the same for both states? And is the dynamics network also trained in this step?
Is the representation network trained in the same operation that trains the state-value and action-value functions? If not, what is to stop the representation function from being extremely similar regardless of the actual state?
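To make the questions above concrete, here is a minimal sketch of the kind of consistency loss I have in mind. It is my own toy reconstruction, not the post's code: the linear "networks", the weights `REPR_W` and `DYN_W`, and the function names are all hypothetical, and I use negative cosine similarity because that is what SimSiam uses.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def consistency_loss(predicted_next, target_next):
    # Negative cosine similarity: the minimum, -1.0, means perfect alignment.
    return -cosine_similarity(predicted_next, target_next)

def linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Hypothetical toy weights standing in for the real networks.
REPR_W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
DYN_W = [[0.9, 0.0, 0.1], [0.0, 0.9, 0.1]]

def represent(obs):
    return linear(REPR_W, obs)

def dynamics(latent, action):
    # Action-conditioned: the action is appended to the latent state.
    return linear(DYN_W, latent + [float(action)])

obs_now, action, obs_next = [1.0, 0.0, 2.0], 1, [0.5, 1.0, 1.5]
loss = consistency_loss(dynamics(represent(obs_now), action), represent(obs_next))

# The collapse worry: an encoder that ignores its input also minimizes this loss.
def collapsed_represent(obs):
    return [1.0, 1.0]

def collapsed_dynamics(latent, action):
    return latent

collapsed_loss = consistency_loss(
    collapsed_dynamics(collapsed_represent(obs_now), action),
    collapsed_represent(obs_next),
)
```

In this toy version the collapsed encoder reaches the lowest possible loss, which is exactly the failure I am asking about. In SimSiam, the stop-gradient on the target branch is what is reported to prevent it.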
The third technique, the one where older episodes have fewer timesteps before using the state-value function to truncate the rewards, makes me wonder if there has been any research where the final policy “grades” the quality of previous actions. It seems like that could shed some light on why this sort of technique works.
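As I understand the third technique, it amounts to something like the sketch below: sum a few real rewards, then bootstrap from the value estimate, with a shorter horizon for older data. The linear schedule in `horizon_for_age` and all parameter names are my own assumptions for illustration, not taken from the post.

```python
def n_step_target(rewards, values, start, horizon, gamma=0.997):
    """Sum `horizon` discounted rewards from `start`, then bootstrap from `values`."""
    end = min(start + horizon, len(rewards))
    target = sum(gamma ** (i - start) * rewards[i] for i in range(start, end))
    if end < len(values):
        # Truncate the rollout here and trust the current value estimate instead.
        target += gamma ** (end - start) * values[end]
    return target

def horizon_for_age(age, max_horizon=5, max_age=100):
    """Hypothetical schedule: older episodes use fewer real rewards, never fewer than 1."""
    fraction_fresh = 1.0 - min(age, max_age) / max_age
    return max(1, int(max_horizon * fraction_fresh))

rewards = [1.0, 1.0, 1.0, 1.0]
values = [0.0, 0.0, 8.0, 0.0, 0.0]  # one value estimate per state, including the last
fresh_target = n_step_target(rewards, values, start=0, horizon=horizon_for_age(0), gamma=0.5)
stale_target = n_step_target(rewards, values, start=0, horizon=2, gamma=0.5)
```

With a horizon of 2, the stale target uses two real rewards and then leans on `values[2]`, so stale data contributes less of its (possibly outdated) behavior to the target.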
This one may backfire....
If you read “Preventable disease kills thousands daily” five days in a row, why would you buy the newspaper on the 6th day?
You don’t :) I write:
The problem with such a newspaper is that it would go out of business. After all, if the headline has been the same for the last month, even if it is the most important action item in the world, people will stop learning anything from it.
However, by bringing up that extreme example, I approach the question of whether it makes sense to move on from news stories just because they are no longer novel—after all, the problem has not gone away by the time you stop printing about it. It’s not clear that repeating important information would create more political action on it, but such a strategy is worth pondering (in my opinion).
As far as the headlines go, printing the first headline in your post every day would be highly misleading. You can argue that Flint’s water is not clean, but that doesn’t change the fact that it’s massively cleaner than it was two decades ago. A newspaper that just reports “it’s not clean” in the same way every year would do a massive disservice to its readers.
When I found that article, I also found several newer articles about how Flint’s water is clean now. The headline I chose was from 2019. I just chose Flint because it was the only event I could think of where I remembered seeing headlines like that.
If someone shared a bad article with me so I would contribute to their refund, I think I would not like them as much afterward :P
One way to keep people from sharing bad content is to display the proportion of previous viewers who paid to the author. This would be a useful way for readers to find good content, too. But the big problem I see is that, unless a reader is scrupulously honest, their payment decision is fairly arbitrary, which might lead them to refund every article (while expecting the same in return).
In my experience, I don’t usually get to choose. I am ineffective and distractable when I am unmotivated, so the vast majority of worthwhile work occurs when I am motivated. Over time this has led me to “ride the tide” of motivation when it is present, and not to force it when it is not. For externally imposed work, pushing work off to deadlines has caused much less frustration than attempting to start early and work steadily. If you are not constrained by motivation, it seems like working slowly and steadily would be preferable for large projects, because it would not be as likely to burn you out. It would also be preferable for jobs, because employers and customers prefer steady, predictable progress. For small projects, and projects that benefit more from “inspiration”, it is possible that short bursts would be preferable instead, because there is less risk of losing the “spark” before finishing.
I may have been unclear in my post, because I agree with a lot of your viewpoints.
If you want to pay your sportsballers a bajillion dollars that money has to come from somewhere. In general, the market has decided that ads are the optimal way of paying for things that people don’t want to pay directly for.
I dedicated only about a sentence and a half to this, because I think it deserves a separate post, but I don’t want to pay my sportsballers a bajillion dollars. I view the fact that people don’t want to pay for entertainment as an indictment of entertainment. Without advertising, the entertainment industry would be much smaller, but it would still be able to present high-quality products. This could mean less expensive sports leagues that don’t have 30 teams and don’t pay every player millions of dollars; movies without expensive special effects, expensive actors, and expensive marketing budgets; and news that doesn’t pay writers that readers wouldn’t pay to read.
The problem I have with the claim that businesses would get by fine without ads is that it is demonstrably untrue. If you look up the biggest companies you will already know their names, their logos, their slogans, any music or sounds they use, etc. You buy from these companies all the time. Advertising works. Today you can quantify the effect of advertising better than ever. Anything on the internet can be tracked, and can be A/B tested.
This was poorly phrased in my post. The specific businesses in existence, specifically the biggest players in the market, stand to lose a lot without advertising. Their brands are often the source of their profitability. But I don’t believe businesses as a whole would suddenly become unprofitable. The biggest shoe companies would lose some cachet and market share, but it would still be profitable to sell shoes. And while lots of shareholders have an interest in dominant brands staying dominant, I do not view it as a net loss to the economy for some companies to lose market share while others take their place.
the branding potentially has some value to the consumers. It gives suppliers more to lose.
Thanks for bringing this up. I wrote this post largely as a way to gather feedback on my perceptions of the role of advertising, and this is a good example of something I wouldn’t have otherwise thought of.
After thinking about it a little more, though, I’m still not convinced advertising solves this problem. For example, even if no detergent brands were well-known, I would buy detergent without worrying about its quality. The store that sells it to me has its own reputation at stake. Even a store that doesn’t advertise has a brand, because of its physical presence and the ability for word-of-mouth to spread in its local area.
Online, there is more reason to stick with established brands. There is less accountability for the platform that directs you to low-quality brands, and scams or low-quality products are thus more common. Trust systems like reviews try to alleviate this, but they are quite often gamed. But in a world without advertising, e-commerce platforms would still have to avoid a reputation for hosting scams or low-quality sellers. I doubt this hypothetical world would have worse product quality across the board, due to the need for platforms to protect their own reputations. (Perhaps quality would even be higher, because brands would have to lean more on their quality and less on their marketing to gain popularity, though I would hardly guarantee it.)
Thanks for this. Some aspects of your story were very similar to my own life in a way that made the message stand out more, specifically the desire to impress people instead of asking for attention. In general, asking for things triggers an aversion in me: if I rely on others to get something, I must not be good enough to do it myself, and I didn’t really earn it. This causes me to do everything myself when I don’t need to. It also causes me to never ask for things if I might not need them, might not get them, or might not deserve them.
The idea that some “freeloader starter pack” from 2011 embedded this idea in my brain, or even the attitudes of a couple mentors, doesn’t seem right, though. A more resonant explanation to me is a self-narrative that, broadly, “I am good at things.” (This seems vaguely like your own narrative that you would make your dreams happen?) This explains why I hang on the opinion of others particularly about skills where I’m uncertain of my own ability. For example, if someone told me I’m no good at math, I would disbelieve them; no good at juggling, and I would agree; but if they told me I’m no good at acting, it would shake me a lot more.
I have only a broad overview of AI, AI risk, and the topics surrounding it. I’ve encountered the idea that superintelligent AI picking an optimal future is likely to be lured by “siren worlds,” which are bad but are optimized to seem optimal. I vaguely grasped why this might happen, but I didn’t give the theory much credence. (Mainly I thought it was less likely that a bad world could seem optimal than that a good world could seem optimal.) However, I just discovered this comment:
This optimistic conjecture could be tested by looking to see what image maximally triggers a ML classifier. Does the perfect cat, the most cat-like cat according to ML actually look like a cat to us humans? If so, then by analogy the perfect utopia according to ML would also be pretty good. If not...
I have seen images that are “maximum possible score for cat,” and they don’t look like cats. Usually they look like mutant cats with three faces and five eyes. This question and the “siren world” concept seem to relate to each other somewhat. Is the connection between image classifiers and siren worlds:
evidential (i.e., image classifiers should strengthen my belief that superintelligent AI would be lured by siren worlds);
not evidential, but useful as an analogy, because the scenarios share similar causes;
useful as an analogy, though the technical concerns of the two scenarios are essentially unrelated; or
not applicable?
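For anyone unfamiliar with how those “maximum possible score” images arise, here is a toy sketch of the idea: gradient ascent on the input rather than on the weights. The four-pixel “image”, the linear scorer, and `CAT_WEIGHTS` are made up for illustration; real classifiers are nonlinear and far larger.

```python
def score(weights, image):
    # Toy linear "cat score"; a hypothetical stand-in for a real classifier.
    return sum(w * p for w, p in zip(weights, image))

def maximize_score(weights, image, step=0.1, iterations=100):
    """Projected gradient ascent on the pixels, keeping each pixel in [0, 1]."""
    image = list(image)
    for _ in range(iterations):
        # For a linear score, the gradient with respect to each pixel is its weight.
        image = [min(1.0, max(0.0, p + step * w)) for p, w in zip(image, weights)]
    return image

CAT_WEIGHTS = [2.0, -1.0, 0.5, -3.0]
typical_cat = [0.7, 0.4, 0.6, 0.2]
extreme_cat = maximize_score(CAT_WEIGHTS, typical_cat)
```

The optimizer pushes every pixel to an extreme value, so the top-scoring input looks nothing like the typical example it started from, even though the scorer rates it far higher.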
I’m confused by the examples in this post. I don’t dispute that Costco and Berkshire Hathaway are successful, but they are more stable than high-upside. Also, China and Facebook hardly seem to be failing on the whole. The situations are also different between the successful and the upside decay examples, because both Facebook and China are titans compared to Costco. The thesis of this post seems to be that unvirtue causes a loss of weak ties, which decreases upside, which causes failure. That a lack of virtue loses weak ties seems obvious, and I accept it, and I accept that a loss of weak ties or public favor damages prospects in general. But I don’t see the justification for why losing weak ties takes a cut specifically out of the upside. Even the symptoms China and Facebook experience from losing weak ties don’t seem to be lack of upside, but rather clear downside (pushback from geopolitical rivals, loss of strong employees).
doesn’t add up to 100%—typo?